Sentiment Analysis Using Python

Scrapingdog
7 min read · Jun 17, 2023


Today, the sentiments of a product's users are highly valuable to any company. Whether expressed in customer feedback or a social media comment, sentiment helps businesses spot their weak areas and make data-driven decisions to improve their products or services.

Many political parties now design their election campaigns around public sentiment expressed in the comment sections of YouTube, Instagram, or Twitter. They can identify trending topics and even detect emerging issues.

Chat logs can reveal how happy or angry your customers are with your support or product. With this, companies can detect frustration or dissatisfaction and prioritize and address customer issues more effectively.

In this article, we are going to scrape reviews of a product from Amazon. We will analyze whether the reviews are happy or sad.

Setting up the prerequisites for scraping Amazon

I hope you have already installed Python 3.x on your machine; if not, please install it from the official Python website. We also need three third-party Python libraries.

  • Requests– Using this library we will make an HTTP connection with the Amazon page. It will help us extract the raw HTML from the target page.
  • BeautifulSoup– This is a powerful data parsing library. Using it we will extract the necessary data from the raw HTML we get via the requests library.
  • vaderSentiment– VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analyzer specifically attuned to sentiments expressed in social media text.

Before we install these libraries we will have to create a dedicated folder for our project. This is where we will keep our Python script.

mkdir amazonscraper

Now inside this folder, we can install the above-mentioned libraries.

pip install beautifulsoup4
pip install requests
pip install vaderSentiment

Create a Python file inside this folder where we will write the code. I am naming this file sentiment.py.

Downloading HTML data from amazon.com

The very first step would be to make a GET request to the target page. For this tutorial, we are going to use this page from Amazon.

To make the GET request we are going to use the requests library of Python.

import requests
from bs4 import BeautifulSoup
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

target_url = 'https://www.amazon.com/Apple-MacBook-Laptop-12%E2%80%91core-19%E2%80%91core/product-reviews/B0BSHF7WHW/ref=cm_cr_dp_d_show_all_btm'

# Browser-like headers so Amazon serves the regular HTML page
headers = {
    "accept-language": "en-US,en;q=0.9",
    "accept-encoding": "gzip, deflate, br",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
}

resp = requests.get(target_url, headers=headers)

print(resp.status_code)
print(resp.content)

Once you run this code, you should see the status code (200 on success) followed by the raw HTML of the page.

We have successfully downloaded the raw HTML from the target page. Now we have to figure out what we need to parse from this data.

What are we going to scrape from Amazon?

It is always good to decide in advance exactly what we need from the page. From this page, we are going to parse out all 10 reviews.

Let’s parse these reviews using BeautifulSoup

Each review text is stored inside a span tag with the attribute data-hook set to review-body.

All these span tags sit inside a div tag with the class reviews-content.

This can be confirmed by inspecting the page in Chrome DevTools.
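The selector pattern can be verified offline on a minimal HTML snippet before running it against the live page (the markup below is a simplified stand-in for Amazon's actual structure, not its real page source):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the review markup described above
html = """
<div class="reviews-content">
  <span data-hook="review-body">Great laptop, very fast.</span>
  <span data-hook="review-body">Battery could be better.</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"class": "reviews-content"})
reviews = container.find_all("span", {"data-hook": "review-body"})

for span in reviews:
    print(span.text.strip())
```

The same two find calls, aimed at the real page, drive the scraper below.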

import requests
from bs4 import BeautifulSoup
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

l = []
o = {}

target_url = 'https://www.amazon.com/Apple-MacBook-Laptop-12%E2%80%91core-19%E2%80%91core/product-reviews/B0BSHF7WHW/ref=cm_cr_dp_d_show_all_btm'

headers = {
    "accept-language": "en-US,en;q=0.9",
    "accept-encoding": "gzip, deflate, br",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
}

resp = requests.get(target_url, headers=headers)

soup = BeautifulSoup(resp.text, 'html.parser')

# The reviews live in the second div with class "reviews-content"
fulldivcontainer = soup.find_all("div", {"class": "reviews-content"})[1]

reviewdata = fulldivcontainer.find_all("span", {"data-hook": "review-body"})

for i in range(0, len(reviewdata)):
    o["review{}".format(i + 1)] = reviewdata[i].text

l.append(o)

print(l)

Let me explain this code by breaking it down for you.

  1. The code imports the necessary libraries:
  • requests for sending HTTP requests to the target URL.
  • BeautifulSoup for parsing HTML and extracting data from it.
  • SentimentIntensityAnalyzer from vaderSentiment.vaderSentiment for sentiment analysis.

2. Two empty data structures are defined:

  • l is an empty list.
  • o is an empty dictionary.

3. The target_url variable holds the URL of the Amazon product reviews page that will be scraped.

4. The headers variable contains a dictionary of HTTP headers. These headers are used in the request to mimic a web browser's user-agent, set the accept language, and specify the accepted response types.

5. The code sends a GET request to the specified target_url using requests.get(). The response is stored in the resp variable.

6. The response content is parsed using BeautifulSoup with the HTML parser specified as 'html.parser'. The parsed result is stored in the soup variable.

7. The code finds the specific div elements with the class name "reviews-content" within the parsed HTML using soup.find_all(). The [1] index is used to select the second matching element since the desired reviews are contained within that particular div element. The result is stored in the fulldivcontainer variable.

8. Within the fulldivcontainer, the code finds all the span elements with the attribute data-hook set to "review-body" using fulldivcontainer.find_all(). The extracted review elements are stored in the reviewdata variable.

9. A loop is set up to iterate over the reviewdata elements. For each element, the review content is extracted using reviewdata[i].text. The review content is assigned as a value to the o dictionary with a key formatted as "review{}".format(i+1).

10. The o dictionary is appended to the l list.

11. Finally, the code prints the l list, which contains all the extracted review data in the form of dictionaries.
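The steps above assume the request succeeds and that the second reviews-content div is present. A slightly more defensive sketch (the extract_reviews helper is hypothetical, not part of the original script) isolates the parsing logic in a function that returns an empty list when the expected structure is missing:

```python
from bs4 import BeautifulSoup

def extract_reviews(html):
    """Parse review texts out of raw HTML.

    Returns [] if the page does not contain the expected
    second "reviews-content" div, instead of raising IndexError.
    """
    soup = BeautifulSoup(html, "html.parser")
    containers = soup.find_all("div", {"class": "reviews-content"})
    if len(containers) < 2:
        return []
    spans = containers[1].find_all("span", {"data-hook": "review-body"})
    return [s.text.strip() for s in spans]

# Demo on a tiny stand-in page with two containers
sample = (
    '<div class="reviews-content"></div>'
    '<div class="reviews-content">'
    '<span data-hook="review-body">Nice machine.</span>'
    '</div>'
)
print(extract_reviews(sample))
```

With a guard like this, a layout change on Amazon's side produces an empty result rather than a crash.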

Let’s do a sentiment analysis of each review

Finally, the time has arrived for us to use the vaderSentiment library.

You can start it by first creating the object of the SentimentIntensityAnalyzer class. Then we can pass each review to the polarity_scores() function of the object.

sentiment = SentimentIntensityAnalyzer()
for x in range(0, 10):
    sent = sentiment.polarity_scores(l[0]["review{}".format(x + 1)])
    print("Sentiment of review {}".format(x + 1))
    print(sent)

  1. The code initializes an instance of the SentimentIntensityAnalyzer class from the vaderSentiment.vaderSentiment module. This analyzer is responsible for determining the sentiment intensity of text.
  2. A loop is set up to iterate from 0 to 9 (inclusive), using the range function range(0, 10). This means the loop will execute 10 times (we have only 10 reviews).
  3. Within each iteration of the loop, the code retrieves the review text from the dictionary stored in the l list using the key "review{}".format(x+1). The x+1 ensures that the review keys start from 1 instead of 0.
  4. The sentiment.polarity_scores() method is called on the review text, which returns a dictionary containing sentiment scores. The sentiment object is an instance of SentimentIntensityAnalyzer initialized earlier.
  5. The sentiment scores dictionary is assigned to the sent variable.
  6. The code then prints the sentiment analysis results for each review. It displays the sentiment analysis score for the respective review by using the format method to incorporate the review number (x+1), and it prints the sentiment scores (sent).
  7. This process repeats for each review, providing sentiment analysis results for all 10 reviews.

Once you run the code, you will see a set of scores printed for each review.

  • 'neg': This represents the negativity score for the given text.
  • 'neu': This represents the neutrality score for the given text.
  • 'pos': This represents the positivity score for the given text.
  • 'compound': This represents the compound score, which is a combination of the above three scores. It represents the overall sentiment polarity of the text.

Out of these 10 reviews, two have negative compound values, which indicates that review3 and review9 are negative while the rest are slightly positive or positive.

Complete Code

import requests
from bs4 import BeautifulSoup
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

l = []
o = {}

target_url = 'https://www.amazon.com/Apple-MacBook-Laptop-12%E2%80%91core-19%E2%80%91core/product-reviews/B0BSHF7WHW/ref=cm_cr_dp_d_show_all_btm'

headers = {
    "accept-language": "en-US,en;q=0.9",
    "accept-encoding": "gzip, deflate, br",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
}

resp = requests.get(target_url, headers=headers)

soup = BeautifulSoup(resp.text, 'html.parser')

# The reviews live in the second div with class "reviews-content"
fulldivcontainer = soup.find_all("div", {"class": "reviews-content"})[1]

reviewdata = fulldivcontainer.find_all("span", {"data-hook": "review-body"})

for i in range(0, len(reviewdata)):
    o["review{}".format(i + 1)] = reviewdata[i].text

l.append(o)

# Score each of the 10 reviews with VADER
sentiment = SentimentIntensityAnalyzer()
for x in range(0, 10):
    sent = sentiment.polarity_scores(l[0]["review{}".format(x + 1)])
    print("Sentiment of review {}".format(x + 1))
    print(sent)

Conclusion

In this tutorial, we saw how data can be scraped and analyzed. With a few Python packages, the task was straightforward to implement.

In a competitive market, it is crucial for companies to keep track of their reviews. One slight misstep can damage their image.

I hope you liked this little blog, and if you did, please share it on your social accounts. You can follow us on Twitter for more such content. Till then, Happy Coding😀!
