Web Scraping Netflix with Python

Scrapingdog
5 min read, Nov 8, 2022

As we all know, Netflix is an OTT platform where you can watch unlimited shows and movies. Literally UNLIMITED! You can scrape Netflix to collect episode names, cast, ratings, similar shows, plan pricing, and more. Using this data you can analyze what users are watching these days, which helps with sentiment analysis too.

I will be using Python for scraping Netflix. I am assuming you have already installed Python on your computer. OK, I think we have discussed enough. Let’s begin with scraping now!

Scrape Netflix

To begin with, we will create a folder and install all the libraries we might need during the course of this tutorial.

For now, we will install two libraries:

  1. Requests will help us make HTTP requests to Netflix.com.
  2. BeautifulSoup will help us parse the HTML into a tree for smooth data extraction.
>> mkdir netflix
>> pip install requests
>> pip install beautifulsoup4

Inside this folder, create a Python file where we will write our code. We will scrape the Netflix title page at https://www.netflix.com/in/title/80057281. Our data of interest will be:

  1. Name of the show
  2. Number of seasons
  3. What the show is about
  4. Episode names
  5. Episode overview
  6. Genre
  7. Show category
  8. Social media links
  9. Cast

I know it’s a long list of data, but in the end you will have ready-made code for scraping any title page on Netflix, not just this one.

Let’s find the location of each of these elements.

The title is stored under the h1 tag of class title-title.

The number of seasons is stored under the span tag of the duration class.

The about section is stored under the div tag with the class hook-text.

The episode title is stored under the h3 tag with the class episode-title.

The episode overview is stored under the p tag with the class episode-synopsis.

The genre is stored under the span tag with the class item-genres.

The category of the show is stored under the span tag with the class item-mood-tag.

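Social media links can be found under a tags with the class name social-link. Each one also carries a data-uia attribute such as social-link-facebook, social-link-twitter, or social-link-instagram, which is what our code uses to tell them apart.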

The cast is stored under the span tag with class item-cast.
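Since Netflix can change these class names at any time, it can help to keep the locations above in one place. The following is a minimal sketch that is not part of the original code: the SELECTORS map and the first_text helper are hypothetical names, and the sample HTML is made up to show that a missing element yields None instead of an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical map of the element locations described above.
# Netflix may change these class names at any time.
SELECTORS = {
    "name": ("h1", "title-title"),
    "seasons": ("span", "duration"),
    "about": ("div", "hook-text"),
    "episode-title": ("h3", "episode-title"),
    "genre": ("span", "item-genres"),
    "mood": ("span", "item-mood-tag"),
    "cast": ("span", "item-cast"),
}


def first_text(soup, field):
    """Return the stripped text of the first element for `field`, or None if absent."""
    tag, css_class = SELECTORS[field]
    element = soup.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else None


# Made-up markup, only to demonstrate the helper.
sample = '<h1 class="title-title"> Example Show </h1>'
soup = BeautifulSoup(sample, "html.parser")
print(first_text(soup, "name"))   # Example Show
print(first_text(soup, "about"))  # None
```

If a field suddenly returns None on a real page, that is usually the first sign the markup has changed.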

Let’s start by making a normal GET request to the target webpage and see what happens.

import requests
from bs4 import BeautifulSoup
target_url="https://www.netflix.com/in/title/80057281"
resp = requests.get(target_url)
print(resp.status_code)

If you get 200, the request succeeded and resp.text holds the HTML of our target page. Now, let’s extract information from this HTML using BeautifulSoup (BS4).
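A plain requests.get can hang without a timeout, and a non-200 response would otherwise flow silently into BeautifulSoup. Here is a minimal sketch that fails loudly instead. The function names are hypothetical, and the browser-like User-Agent header is an assumption: some sites serve different HTML, or block the request, without one.

```python
import requests

# Assumption: an illustrative browser-like User-Agent string.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def html_or_raise(resp):
    """Return the response body if the status is 200, otherwise raise an error."""
    if resp.status_code != 200:
        raise RuntimeError(f"Unexpected status code {resp.status_code}")
    return resp.text


def fetch_html(url, timeout=10):
    """GET `url` with a timeout (in seconds) and return its HTML."""
    resp = requests.get(url, headers=HEADERS, timeout=timeout)
    return html_or_raise(resp)


# Usage (makes a real network request):
# html = fetch_html("https://www.netflix.com/in/title/80057281")
```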

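soup=BeautifulSoup(resp.text, 'html.parser')
l=list()  # final results list
o={}      # show-level details: name, seasons, about, social links
e={}      # one episode (title and description)
d={}      # one genre
m={}      # one mood/category tag
c={}      # one cast member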

Let’s extract the data properties one by one, using the HTML locations we identified above.

o["name"]=soup.find("h1", {"class":"title-title"}).text
o["seasons"] = soup.find("span", {"class":"duration"}).text
o["about"] = soup.find("div", {"class":"hook-text"}).text
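The seasons value comes back as display text. If you need an actual number, a small regex helper can pull it out. This is a minimal sketch assuming the duration span holds text like "3 Seasons"; the helper name and the extra season_count key are hypothetical.

```python
import re


def season_count(duration_text):
    """Return the first integer in text such as "3 Seasons", or None if there is none."""
    match = re.search(r"\d+", duration_text or "")
    return int(match.group()) if match else None


print(season_count("3 Seasons"))       # 3
print(season_count("Limited Series"))  # None

# With the scraper above (hypothetical extra key):
# o["season_count"] = season_count(o["seasons"])
```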

Now, let’s extract the episode details.

episodes = soup.find("ol",{"class":"episodes-container"}).find_all("li")
for i in range(0,len(episodes)):
    e["episode-title"]=episodes[i].find("h3",{"class":"episode-title"}).text
    e["episode-description"]=episodes[i].find("p",{"class":"epsiode-synopsis"}).text
    l.append(e)
    e={}

All the episode data is inside an ol tag with the class episodes-container. So we first find that ol tag and then all the li tags inside it. Then we use a for loop to extract each episode’s title and description, appending one dictionary per episode to l.
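The loop above raises an AttributeError as soon as one li lacks a title or synopsis, and the range(len(...)) index is not really needed. Here is a minimal sketch of the same logic as a function that iterates over the tags directly and tolerates missing pieces. The function name is hypothetical, and because the text and the code disagree on the synopsis class name (episode-synopsis vs. epsiode-synopsis), the sketch accepts both.

```python
from bs4 import BeautifulSoup


def parse_episodes(soup):
    """Return one dict per <li> in the episodes-container list (empty list if missing)."""
    container = soup.find("ol", class_="episodes-container")
    if container is None:
        return []
    episodes = []
    for li in container.find_all("li"):
        title = li.find("h3", class_="episode-title")
        # Passing a list matches an element carrying either class name.
        synopsis = li.find("p", class_=["episode-synopsis", "epsiode-synopsis"])
        episodes.append({
            "episode-title": title.get_text(strip=True) if title is not None else None,
            "episode-description": synopsis.get_text(strip=True) if synopsis is not None else None,
        })
    return episodes


# Made-up markup: the second episode has no synopsis.
sample = """
<ol class="episodes-container">
  <li><h3 class="episode-title">1. Pilot</h3><p class="epsiode-synopsis">It begins.</p></li>
  <li><h3 class="episode-title">2. Next</h3></li>
</ol>
"""
print(parse_episodes(BeautifulSoup(sample, "html.parser")))
```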

Let’s extract the genres now.

genres = soup.find_all("span",{"class":"item-genres"})
for x in range(0,len(genres)):
    d["genre"]=genres[x].text.replace(",","")
    l.append(d)
    d={}

Genres are stored under the class item-genres. Again, we use a for loop to extract all of them, removing the commas that separate them.
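If you would rather have a plain list of strings than one {"genre": ...} dictionary per item, a list comprehension does the same work in one line. This is a minimal sketch with a hypothetical helper name and made-up sample markup; the same call works for the item-mood-tag and item-cast spans.

```python
from bs4 import BeautifulSoup


def text_list(soup, tag, css_class):
    """Collect the text of every matching element, dropping separator commas and whitespace."""
    return [el.get_text().replace(",", "").strip()
            for el in soup.find_all(tag, class_=css_class)]


sample = '<span class="item-genres">Drama,</span> <span class="item-genres">Thriller</span>'
genres = text_list(BeautifulSoup(sample, "html.parser"), "span", "item-genres")
print(genres)  # ['Drama', 'Thriller']
```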

Let’s extract the rest of the data properties with a similar technique.

mood = soup.find_all("span",{"class":"item-mood-tag"})
for y in range(0,len(mood)):
    m["mood"]=mood[y].text.replace(",","")
    l.append(m)
    m={}
o["facebook"]=soup.find("a",{"data-uia":"social-link-facebook"}).get("href")
o["twitter"]=soup.find("a",{"data-uia":"social-link-twitter"}).get("href")
o["instagram"]=soup.find("a",{"data-uia":"social-link-instagram"}).get("href")
cast=soup.find_all("span",{"class":"item-cast"})
for t in range(0,len(cast)):
    c["cast"]=cast[t].text
    l.append(c)
    c={}
l.append(o)
print(l)
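Calling .get("href") directly on soup.find(...) raises an AttributeError when a show has no link for one of the platforms, because find() returns None. Here is a minimal sketch that loops over the data-uia values used above and records None for missing links. The function name is hypothetical and the sample markup is made up.

```python
from bs4 import BeautifulSoup


def social_links(soup, platforms=("facebook", "twitter", "instagram")):
    """Map each platform to its href, or None if the page has no such link."""
    links = {}
    for platform in platforms:
        anchor = soup.find("a", attrs={"data-uia": f"social-link-{platform}"})
        links[platform] = anchor.get("href") if anchor is not None else None
    return links


sample = '<a data-uia="social-link-facebook" href="https://www.facebook.com/example">fb</a>'
print(social_links(BeautifulSoup(sample, "html.parser")))
# {'facebook': 'https://www.facebook.com/example', 'twitter': None, 'instagram': None}
```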

We have now scraped all the data we listed from the Netflix page.
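Printing l is fine for a quick check, but you will usually want to keep the results. Here is a minimal sketch that writes the list to a JSON file; the function name and the netflix.json filename are hypothetical choices.

```python
import json


def save_results(results, path="netflix.json"):
    """Write the scraped list (the `l` list from the code above) to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English titles readable in the file
        json.dump(results, f, ensure_ascii=False, indent=2)


# Usage, after running the scraper:
# save_results(l)
```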

Complete Code

With this code we have scraped the name, number of seasons, what the show is about, episode details, cast, genres, mood, social links, etc. With just a few more changes to this code, you can extract a lot more data from Netflix.

import requests
from bs4 import BeautifulSoup

l=list()
o={}
e={}
d={}
m={}
c={}

target_url="https://www.netflix.com/in/title/80057281"
resp = requests.get(target_url)
soup = BeautifulSoup(resp.text, 'html.parser')

o["name"]=soup.find("h1", {"class":"title-title"}).text
o["seasons"] = soup.find("span", {"class":"duration"}).text
o["about"] = soup.find("div", {"class":"hook-text"}).text

episodes = soup.find("ol",{"class":"episodes-container"}).find_all("li")
for i in range(0,len(episodes)):
    e["episode-title"]=episodes[i].find("h3",{"class":"episode-title"}).text
    e["episode-description"]=episodes[i].find("p",{"class":"epsiode-synopsis"}).text
    l.append(e)
    e={}

genres = soup.find_all("span",{"class":"item-genres"})
for x in range(0,len(genres)):
    d["genre"]=genres[x].text.replace(",","")
    l.append(d)
    d={}

mood = soup.find_all("span",{"class":"item-mood-tag"})
for y in range(0,len(mood)):
    m["mood"]=mood[y].text.replace(",","")
    l.append(m)
    m={}

o["facebook"]=soup.find("a",{"data-uia":"social-link-facebook"}).get("href")
o["twitter"]=soup.find("a",{"data-uia":"social-link-twitter"}).get("href")
o["instagram"]=soup.find("a",{"data-uia":"social-link-instagram"}).get("href")

cast=soup.find_all("span",{"class":"item-cast"})
for t in range(0,len(cast)):
    c["cast"]=cast[t].text
    l.append(c)
    c={}

l.append(o)
print(l)

Conclusion

This was just a quick way to scrape a complete Netflix title page. By changing the show’s title ID in the URL, you can scrape almost any show on Netflix; you just need the IDs of those shows. Instead of BS4, you can also use XPath to query the HTML tree for data extraction. You can use a web scraping API to extract data from Netflix at scale without getting blocked.
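Since only the title ID changes between pages, you can build the URLs from a list of IDs and fetch them one by one. This is a minimal sketch under a few assumptions: the function names are hypothetical, the list of IDs is something you supply yourself, and the 2-second delay between requests is an arbitrary, polite default rather than a known Netflix limit.

```python
import time

import requests

BASE_URL = "https://www.netflix.com/in/title/"


def title_url(title_id):
    """Build a title page URL in the same shape as the tutorial's target_url."""
    return f"{BASE_URL}{title_id}"


def fetch_titles(title_ids, delay=2.0):
    """Download each title page, pausing between requests; failed pages map to None."""
    pages = {}
    for title_id in title_ids:
        resp = requests.get(title_url(title_id), timeout=10)
        pages[title_id] = resp.text if resp.status_code == 200 else None
        time.sleep(delay)  # space out requests instead of hammering the site
    return pages


print(title_url(80057281))  # https://www.netflix.com/in/title/80057281
# pages = fetch_titles([80057281])  # makes real network requests
```

Each downloaded page can then go through the same BeautifulSoup extraction shown above.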

I hope you liked this quick tutorial on scraping Netflix. If you did, please share this blog on your social networks. Let us know if you need help with any web scraping needs.

Scrapingdog

I usually talk about web scraping, and yes, web scraping only. You can find a web scraping API at www.scrapingdog.com