Sentiment Analysis Using Python
Today, the sentiments of a product's users are very valuable for any company. Whether it is sentiment expressed in customer feedback or in a social media comment, businesses can find their weak spots and make data-driven decisions to enhance their products or services.
Many political parties now design their election campaigns around public sentiment expressed in the comment sections of YouTube, Instagram, or Twitter. They can identify trending topics and even detect emerging issues.
Chat logs can be used to identify how happy or angry your customers are with your support or product. With this, companies can detect frustration or dissatisfaction and prioritize and address customer issues more effectively.
In this article, we are going to scrape reviews of a product from Amazon. We will then analyze whether each review is positive or negative.
Setting up the prerequisites for scraping Amazon
I hope you have already installed Python 3.x on your machine; if not, please install it from here. We also need three third-party Python libraries.
- Requests – Using this library we will make an HTTP connection with the Amazon page. It will help us extract the raw HTML from the target page.
- BeautifulSoup – This is a powerful data-parsing library. Using it we will extract the necessary data from the raw HTML we get via the requests library.
- vaderSentiment – VADER (Valence Aware Dictionary and sEntiment Reasoner) is a rule- and lexicon-based sentiment analyzer specifically attuned to sentiments expressed on social media.
Before we install these libraries we will have to create a dedicated folder for our project. This is where we will keep our Python script.
mkdir amazonscraper
Now inside this folder, we can install the above-mentioned libraries.
pip install beautifulsoup4
pip install requests
pip install vaderSentiment
Create a Python file inside this folder where we will write the code. I am naming this file sentiment.py.
Downloading HTML data from amazon.com
The very first step would be to make a GET request to the target page. For this tutorial, we are going to use this page from Amazon.
To make the GET request we are going to use the requests library of Python.
import requests
from bs4 import BeautifulSoup
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
target_url = 'https://www.amazon.com/Apple-MacBook-Laptop-12%E2%80%91core-19%E2%80%91core/product-reviews/B0BSHF7WHW/ref=cm_cr_dp_d_show_all_btm'
headers={"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}
resp = requests.get(target_url, verify=False, headers=headers)
print(resp.status_code)
print(resp.content)
Once you run this code, the status code and the raw HTML of the page will be printed.
We have successfully downloaded the raw HTML from the target page. Now, we have to decode what we need to parse from this data.
What are we going to scrape from Amazon?
It is always great to decide in advance what exactly we need from the page. From this page, we are going to parse out all 10 reviews.
Let’s parse these reviews using BeautifulSoup
Each review text is stored inside a span tag with the attribute data-hook set to review-body.
All of these span tags are stored inside a div tag with the class reviews-content. This can easily be verified by inspecting the page with Chrome DevTools.
import requests
from bs4 import BeautifulSoup
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
l=[]
o={}
target_url = 'https://www.amazon.com/Apple-MacBook-Laptop-12%E2%80%91core-19%E2%80%91core/product-reviews/B0BSHF7WHW/ref=cm_cr_dp_d_show_all_btm'
headers={"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}
resp = requests.get(target_url, verify=False, headers=headers)
soup = BeautifulSoup(resp.text, 'html.parser')
fulldivcontainer = soup.find_all("div",{"class":"reviews-content"})[1]
reviewdata = fulldivcontainer.find_all("span",{"data-hook":"review-body"})
for i in range(0, len(reviewdata)):
    o["review{}".format(i + 1)] = reviewdata[i].text
l.append(o)
print(l)
Let me explain this code by breaking it down for you.
1. The code imports the necessary libraries: requests for sending HTTP requests to the target URL, BeautifulSoup for parsing HTML and extracting data from it, and SentimentIntensityAnalyzer from vaderSentiment.vaderSentiment for sentiment analysis.
2. Two empty data structures are defined: l is an empty list and o is an empty dictionary.
3. The target_url variable holds the URL of the Amazon product reviews page that will be scraped.
4. The headers variable contains a dictionary of HTTP headers. These headers are used in the request to mimic a web browser's user agent, set the accepted language, and specify the accepted response types.
5. The code sends a GET request to the specified target_url using requests.get(). The response is stored in the resp variable.
6. The response content is parsed using BeautifulSoup with the parser specified as 'html.parser'. The parsed result is stored in the soup variable.
7. The code finds the div elements with the class name "reviews-content" within the parsed HTML using soup.find_all(). The [1] index selects the second matching element, since the desired reviews are contained within that particular div. The result is stored in the fulldivcontainer variable.
8. Within fulldivcontainer, the code finds all the span elements with the attribute data-hook set to "review-body" using fulldivcontainer.find_all(). The extracted review elements are stored in the reviewdata variable.
9. A loop iterates over the reviewdata elements. For each element, the review text is extracted using reviewdata[i].text and assigned to the o dictionary under a key formatted as "review{}".format(i + 1).
10. The o dictionary is appended to the l list.
11. Finally, the code prints the l list, which contains all the extracted review data in the form of a dictionary.
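The find_all() pattern from steps 7 and 8 can be checked offline against a small hand-written HTML snippet (a hypothetical sample, not real Amazon markup); note that the real page needs the [1] index because it contains more than one matching div:

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the structure described above
sample_html = """
<div class="reviews-content">
  <span data-hook="review-body">Great battery life.</span>
  <span data-hook="review-body">Too expensive for me.</span>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
container = soup.find("div", {"class": "reviews-content"})
reviews = [span.text for span in container.find_all("span", {"data-hook": "review-body"})]
print(reviews)
```

Testing the selectors on a known snippet like this makes it much easier to debug when the live page returns something unexpected.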
Let’s do a sentiment analysis of each review
Finally, the time has arrived for us to use the vaderSentiment library.
You can start by creating an object of the SentimentIntensityAnalyzer class. Then we can pass each review to the polarity_scores() function of that object.
sentiment = SentimentIntensityAnalyzer()
for x in range(0, 10):
    sent = sentiment.polarity_scores(l[0]["review{}".format(x + 1)])
    print("Sentiment of review {}".format(x + 1))
    print(sent)
- The code initializes an instance of the SentimentIntensityAnalyzer class from the vaderSentiment.vaderSentiment module. This analyzer is responsible for determining the sentiment intensity of text.
- A loop is set up to iterate from 0 to 9 (inclusive) using range(0, 10), so it executes once per review (we have only 10 reviews).
- Within each iteration, the code retrieves the review text from the l list using the key "review{}".format(x + 1). The x + 1 ensures that the review keys start from 1 instead of 0.
- The sentiment.polarity_scores() method is called on the review text and returns a dictionary of sentiment scores. The sentiment object is the SentimentIntensityAnalyzer instance initialized earlier.
- The sentiment scores dictionary is assigned to the sent variable.
- The code then prints the sentiment analysis results for each review: the review number (x + 1) followed by the sentiment scores (sent).
- This process repeats for each review, providing sentiment analysis results for all 10 reviews.
Once you run the code, each review is printed with four scores:
- 'neg': the negativity score for the given text.
- 'neu': the neutrality score for the given text.
- 'pos': the positivity score for the given text.
- 'compound': a normalized combination of the above three scores, representing the overall sentiment polarity of the text.
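To turn the compound score into a label, the VADER authors suggest thresholds of 0.05 and -0.05. A small helper along these lines (an illustration, not part of the vaderSentiment API) could classify each review:

```python
def label_from_compound(compound: float) -> str:
    # Thresholds recommended by the VADER authors:
    # >= 0.05 positive, <= -0.05 negative, otherwise neutral
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(label_from_compound(0.82))
print(label_from_compound(-0.31))
```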
Out of these 10 reviews, two have negative compound values, which indicates that review3 and review9 are negative while the rest are slightly positive or positive.
Complete Code
import requests
from bs4 import BeautifulSoup
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
l=[]
o={}
target_url = 'https://www.amazon.com/Apple-MacBook-Laptop-12%E2%80%91core-19%E2%80%91core/product-reviews/B0BSHF7WHW/ref=cm_cr_dp_d_show_all_btm'
headers={"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}
resp = requests.get(target_url, verify=False, headers=headers)
soup = BeautifulSoup(resp.text, 'html.parser')
fulldivcontainer = soup.find_all("div",{"class":"reviews-content"})[1]
reviewdata = fulldivcontainer.find_all("span",{"data-hook":"review-body"})
for i in range(0, len(reviewdata)):
    o["review{}".format(i + 1)] = reviewdata[i].text
l.append(o)
sentiment = SentimentIntensityAnalyzer()
for x in range(0, 10):
    sent = sentiment.polarity_scores(l[0]["review{}".format(x + 1)])
    print("Sentiment of review {}".format(x + 1))
    print(sent)
Conclusion
In this tutorial, we saw how data can be scraped and analyzed. With a handful of Python packages, the task was straightforward to implement.
As economies grow, it becomes crucial for companies to keep track of their reviews. One slight mistake can damage their image.
I hope you liked this little blog, and if you did, do not forget to share it on your social accounts. You can follow us on Twitter for more such content. Till then, Happy Coding😀!