Scraping Yelp Data using Python (A Comprehensive Guide)

Scrapingdog
Jan 17, 2024

In this tutorial, we will scrape Yelp by building a Yelp scraper in Python, and use it to extract valuable information from Yelp’s large database of business listings.

Whether you’re a budding data scientist, a curious programmer, or a business analyst seeking novel ways to obtain data, this guide will help you unravel the potential of web scraping Yelp.

From collecting customer reviews to analyzing business ratings, the opportunities are vast. So, let’s embark on this journey, turning unstructured data into meaningful insights, one scrape at a time.

We will start with plain HTTP requests and then, to keep things simple and avoid getting blocked, switch to Scrapingdog’s scraping API.

Why Scrape Yelp Data?

Yelp is an American company that publishes crowd-sourced reviews of businesses. It is one of the largest business directories on the Internet.

Scraping Yelp data with your own Yelp scraper gives you access to a large amount of information about businesses and trends. You can use this data to improve your product, or show it to your free users to help convert them into paying customers.

Since Yelp is a business directory, many of the businesses listed on it may be in your target market. Scraping Yelp lets you extract information such as business names, contact details, locations, and industries, which helps you build qualified lead lists much faster.

Requirements For Scraping Yelp Data

Generally, web scraping is divided into two parts:

  1. Fetching data by making an HTTP request
  2. Extracting important data by parsing the HTML DOM
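These two parts map naturally onto two functions. Below is a minimal sketch, assuming requests and beautifulsoup4 are installed; the function names fetch_html and parse_heading are hypothetical and not part of either library. Keeping the parsing step separate from the fetching step lets you test your parser offline on saved or hand-written HTML.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Part 1: fetch the raw HTML with an HTTP GET request."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def parse_heading(html: str) -> Optional[str]:
    """Part 2: parse the HTML DOM and extract one data point (the first <h1>)."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    return heading.get_text(strip=True) if heading else None


# Part 2 can be exercised offline on a hand-written HTML snippet.
sample_html = "<html><body><h1> Example Restaurant </h1></body></html>"
print(parse_heading(sample_html))  # Example Restaurant
```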

Libraries & Tools

  1. Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  2. Requests allows you to send HTTP requests very easily.
  3. A web scraping API (Scrapingdog, in this tutorial) fetches the HTML code of the target URL for you.

Setup

Our setup is pretty simple: create a folder and install BeautifulSoup and requests. To create the folder and install the libraries, run the commands below. I assume that you already have Python 3.x installed.

mkdir scraper
cd scraper
pip install beautifulsoup4
pip install requests

Now, create a file inside that folder with any name you like. I am using scraping.py.

from bs4 import BeautifulSoup
import requests

Let’s Start Scraping Yelp Data for a Restaurant

We are going to scrape data from the Yelp page of Sushi Yasaka, a restaurant in New York: https://www.yelp.com/biz/sushi-yasaka-new-york

We will extract the following information from our target page.

  1. Name of the Restaurant
  2. Address of the Restaurant
  3. Rating
  4. Phone number

Fetching the raw HTML

Now that we have all the ingredients for the scraper, we make a GET request to the target URL with the requests library to get the raw HTML.

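from bs4 import BeautifulSoup
import requests

l={}  # holds the fields of the current business
u=[]  # collects one dict per scraped business

# Download the raw HTML of the target Yelp page as a string
r = requests.get('https://www.yelp.com/biz/sushi-yasaka-new-york').text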

This gives you the raw HTML of the target URL as a string.
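A bare requests.get() call sends the library's default User-Agent and does not check whether the request succeeded, so if Yelp answers with an error or block page you would silently parse that page instead. The sketch below adds a timeout, a status check, and browser-like headers. The header values are an illustrative assumption, not something Yelp documents, and they may not be enough to avoid blocking on their own (the Scrapingdog section later in this tutorial addresses that). The name get_page is hypothetical.

```python
import requests

YELP_URL = "https://www.yelp.com/biz/sushi-yasaka-new-york"

# Illustrative browser-like headers (an assumption, not a documented Yelp requirement).
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def get_page(url: str) -> str:
    """Download a page, failing loudly on HTTP errors instead of returning them."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text


# Inspect exactly what would be sent, without touching the network.
prepared = requests.Request("GET", YELP_URL, headers=HEADERS).prepare()
print(prepared.method, prepared.url)
```

Building a PreparedRequest this way is a convenient check that your headers are attached before you start hitting the real site; calling get_page(YELP_URL) performs the actual download.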

Parsing the raw HTML

Now we will use BS4 to extract the information we need. But before this, we have to find the DOM location of each data element. We will take advantage of Chrome developer tools to find the location.

Let’s start with the name first.

The name is inside the h1 tag with the class css-1se8maq.

Similarly, the address is inside the p tag with the class css-qyp8bo.

The star rating is stored in the aria-label attribute of the div tag with the class css-1v6kfrx.

The phone number is inside the second div tag with the class css-djo2w.
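The css-… class names above look auto-generated, so they are likely to change when Yelp updates its front end. Also, the HTML that requests downloads can differ from the DOM that Chrome shows after running JavaScript. Before parsing, it can therefore help to check which selectors actually match the downloaded HTML. This is a minimal sketch: the helper name missing_selectors is hypothetical, and the selectors are CSS equivalents of the tag/class pairs listed above, used with BeautifulSoup's select_one().

```python
from typing import Dict, List

from bs4 import BeautifulSoup

# CSS-selector equivalents of the tag/class pairs found with Chrome DevTools.
SELECTORS: Dict[str, str] = {
    "name": "h1.css-1se8maq",
    "address": "p.css-qyp8bo",
    "stars": "div.css-1v6kfrx",
    "phone": "div.css-djo2w",
}


def missing_selectors(html: str, selectors: Dict[str, str]) -> List[str]:
    """Return the field names whose selector matches nothing in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [field for field, css in selectors.items() if soup.select_one(css) is None]


# Offline check against a trimmed-down, hand-written snippet.
sample_html = (
    '<h1 class="css-1se8maq">Sushi Yasaka</h1>'
    '<p class="css-qyp8bo">251 W 72nd St New York, NY 10023</p>'
)
print(missing_selectors(sample_html, SELECTORS))  # ['stars', 'phone']
```

Running this against the real page (for example, missing_selectors(r, SELECTORS)) and getting a non-empty list is a quick hint that the class names need to be looked up again in DevTools.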

Now, we have the location of each data point we want to extract from the target page. Let’s now use BS4 to parse this information.

soup = BeautifulSoup(r,'html.parser')

This creates a BeautifulSoup object that we can search for the elements located above.

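# Each lookup falls back to None if the element is missing: find() returns
# None (so .text raises AttributeError), and find_all() may return fewer
# than two elements (so [1] raises IndexError).
try:
    l["name"]=soup.find("h1",{"class":"css-1se8maq"}).text
except AttributeError:
    l["name"]=None
try:
    l["address"]=soup.find("p",{"class":"css-qyp8bo"}).text
except AttributeError:
    l["address"]=None
try:
    # The rating is kept in the aria-label attribute, e.g. "4.2 star rating"
    l["stars"]=soup.find("div",{"class":"css-1v6kfrx"}).get('aria-label')
except AttributeError:
    l["stars"]=None
try:
    # The second css-djo2w div holds the phone number, prefixed with "Phone number"
    l["phone"]=soup.find_all("div",{"class":"css-djo2w"})[1].text.replace("Phone number","")
except IndexError:
    l["phone"]=None

u.append(l)
l={}
print({"data":u})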
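The four try/except blocks above repeat the same pattern. One way to reduce the repetition is a small helper that returns the text (or an attribute) of the n-th matching element, or None when it is missing. This is a sketch, not part of BeautifulSoup: the helper name extract and its parameters are hypothetical, and it checks the match count instead of catching exceptions.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract(soup: BeautifulSoup, tag: str, css_class: str,
            index: int = 0, attr: Optional[str] = None) -> Optional[str]:
    """Return the text (or attribute) of the index-th tag with css_class, or None."""
    matches = soup.find_all(tag, {"class": css_class})
    if index >= len(matches):
        return None  # element missing: same result as the try/except fallback
    element = matches[index]
    return element.get(attr) if attr else element.text


html = (
    '<h1 class="css-1se8maq">Sushi Yasaka</h1>'
    '<div class="css-1v6kfrx" aria-label="4.2 star rating"></div>'
)
soup = BeautifulSoup(html, "html.parser")
record = {
    "name": extract(soup, "h1", "css-1se8maq"),
    "stars": extract(soup, "div", "css-1v6kfrx", attr="aria-label"),
    "phone": extract(soup, "div", "css-djo2w", index=1),  # absent in this snippet
}
print(record)
```

With a helper like this, adding a new field is one line instead of four, and every field shares the same "missing means None" behavior.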

Once you run the above code, you will get this output in your console:

{'data': [{'name': 'Sushi Yasaka', 'address': '251 W 72nd St New York, NY 10023', 'stars': '4.2 star rating', 'phone': '(212) 496-8460'}]}
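The stars value comes back as text ('4.2 star rating'). If you want to sort or average ratings, you need it as a number. Below is a small sketch that pulls the number out with a regular expression; it assumes the label keeps the "N star rating" shape seen in the output above, and the function name rating_to_float is hypothetical.

```python
import re
from typing import Optional


def rating_to_float(label: Optional[str]) -> Optional[float]:
    """Turn an aria-label such as '4.2 star rating' into 4.2 (None if no number)."""
    if not label:
        return None
    match = re.search(r"\d+(?:\.\d+)?", label)  # first integer or decimal number
    return float(match.group()) if match else None


print(rating_to_float("4.2 star rating"))  # 4.2
print(rating_to_float(None))  # None
```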

There you go!

We now have the Yelp data ready to manipulate, or to store somewhere such as MongoDB. But that is out of the scope of this tutorial.
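If you only need to keep the results on disk rather than in a database, the standard library's json module is enough. This is a minimal sketch; the helper name save_results and the file name yelp_data.json are arbitrary choices, and the record is the one printed above.

```python
import json
from pathlib import Path


def save_results(results: dict, path: str) -> None:
    """Write the scraped results to a UTF-8 JSON file, pretty-printed."""
    Path(path).write_text(
        json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8"
    )


data = {"data": [{
    "name": "Sushi Yasaka",
    "address": "251 W 72nd St New York, NY 10023",
    "stars": "4.2 star rating",
    "phone": "(212) 496-8460",
}]}
save_results(data, "yelp_data.json")
```

ensure_ascii=False keeps any non-ASCII characters in business names readable in the file instead of escaping them.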

Complete Code

You can also scrape other information, such as reviews or website addresses, from the raw HTML we downloaded in the first step. For now, though, the complete code looks like this.

from bs4 import BeautifulSoup
import requests

l={}  # fields of the current business
u=[]  # list of scraped businesses

# Download the raw HTML of the target page
r = requests.get('https://www.yelp.com/biz/sushi-yasaka-new-york').text

soup = BeautifulSoup(r,'html.parser')

# Extract each field, falling back to None if the element is missing
try:
    l["name"]=soup.find("h1",{"class":"css-1se8maq"}).text
except AttributeError:
    l["name"]=None
try:
    l["address"]=soup.find("p",{"class":"css-qyp8bo"}).text
except AttributeError:
    l["address"]=None
try:
    l["stars"]=soup.find("div",{"class":"css-1v6kfrx"}).get('aria-label')
except AttributeError:
    l["stars"]=None
try:
    l["phone"]=soup.find_all("div",{"class":"css-djo2w"})[1].text.replace("Phone number","")
except IndexError:
    l["phone"]=None

u.append(l)
l={}
print({"data":u})

How to scrape Yelp without getting blocked?

Scrapingdog’s web scraping API can help you extract data from Yelp at scale without getting blocked. You only have to pass the target URL, and Scrapingdog returns the page’s HTML while handling the blocking for you.

Once you sign up, you will get an API key on your dashboard.

Use this API key in the code below.

from bs4 import BeautifulSoup
import requests

l={}
u=[]

# Route the request through Scrapingdog. Passing the target URL via params
# lets requests URL-encode it correctly.
r = requests.get(
    'https://api.scrapingdog.com/scrape',
    params={'api_key': 'Your-API-key', 'url': 'https://www.yelp.com/biz/sushi-yasaka-new-york', 'dynamic': 'false'},
).text

soup = BeautifulSoup(r,'html.parser')

try:
    l["name"]=soup.find("h1",{"class":"css-1se8maq"}).text
except AttributeError:
    l["name"]=None
try:
    l["address"]=soup.find("p",{"class":"css-qyp8bo"}).text
except AttributeError:
    l["address"]=None
try:
    l["stars"]=soup.find("div",{"class":"css-1v6kfrx"}).get('aria-label')
except AttributeError:
    l["stars"]=None
try:
    l["phone"]=soup.find_all("div",{"class":"css-djo2w"})[1].text.replace("Phone number","")
except IndexError:
    l["phone"]=None

u.append(l)
l={}
print({"data":u})

As you can see, the code is the same except for the URL we request. With Scrapingdog handling the requests, you can scrape Yelp data at scale.
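To scrape more than one business, you can loop over a list of Yelp URLs and send each one through the same Scrapingdog endpoint and parameters used in the code above. This is a minimal sketch under several assumptions: the helper names are hypothetical, the one-second delay is an arbitrary politeness choice rather than a documented Scrapingdog requirement, and only the name field is parsed to keep it short. Passing the fetch function in as an argument lets you test the loop without a network call or an API key.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional

import requests
from bs4 import BeautifulSoup

SCRAPINGDOG_ENDPOINT = "https://api.scrapingdog.com/scrape"


def fetch_via_scrapingdog(url: str, api_key: str) -> str:
    """Fetch a page through Scrapingdog with the parameters used in this tutorial."""
    response = requests.get(
        SCRAPINGDOG_ENDPOINT,
        params={"api_key": api_key, "url": url, "dynamic": "false"},
        timeout=60,
    )
    response.raise_for_status()
    return response.text


def parse_name(html: str) -> Optional[str]:
    """Extract the business name using the h1 class from this tutorial."""
    heading = BeautifulSoup(html, "html.parser").find("h1", {"class": "css-1se8maq"})
    return heading.text if heading else None


def scrape_many(urls: Iterable[str], fetch: Callable[[str], str],
                delay: float = 1.0) -> List[Dict[str, Optional[str]]]:
    """Scrape each URL in turn, recording request errors instead of stopping the run."""
    results = []
    for url in urls:
        try:
            html = fetch(url)
        except requests.RequestException as exc:
            results.append({"url": url, "name": None, "error": str(exc)})
            continue
        results.append({"url": url, "name": parse_name(html), "error": None})
        time.sleep(delay)  # pause between requests
    return results


# Real usage (requires a Scrapingdog API key):
# from functools import partial
# fetch = partial(fetch_via_scrapingdog, api_key="Your-API-key")
# print(scrape_many(["https://www.yelp.com/biz/sushi-yasaka-new-york"], fetch))
```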

I usually write about web scraping, and only web scraping. You can find a web scraping API at www.scrapingdog.com