How to Track the Rank of Any Keyword
Keyword rank tracking is very common in the marketing world. Many marketing teams use expensive tools to track where their website ranks for multiple keywords on a regular basis. Since this has to be done daily, it gets quite costly for new businesses and individuals. So, in this post, we will create a crawler that keeps you updated with your latest rank for any particular keyword.
We will create a web scraper for Google search results using Python. I am assuming that you already have Python installed on your computer. Let's begin coding the web scraper.
Let’s code
First, we need to install all the necessary libraries.
Create a folder and then install these libraries.
>> mkdir googlescraper
>> pip install requests
>> pip install beautifulsoup4
Then we will import these libraries into our file. You can name the file googlescraper.py.
import requests
from bs4 import BeautifulSoup
Our target URL will change according to the keyword we want to scrape, but the basic structure of the Google URL will remain the same.
Google URL structure — https://www.google.com/search?q={any keyword or phrase}
For this blog post, our target keyword will be “scrape prices”, and we have to find the rank of the domain blog.christian-schou.dk for this keyword.
So, our target URL will be https://www.google.com/search?q=scrape+prices.
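If your keyword contains spaces or special characters, you can also build this URL programmatically. Here is a small sketch using quote_plus from Python's standard urllib.parse module to encode any keyword:

from urllib.parse import quote_plus

keyword = "scrape prices"  # any keyword or phrase
target_url = "https://www.google.com/search?q=" + quote_plus(keyword)
print(target_url)  # https://www.google.com/search?q=scrape+prices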
Let us first check whether this domain is present in the first 10 results or not.
If you inspect the page source, you can see that result URLs are located inside the class jGGQ5e, and then inside yuRUbf. So, we have to find the a tag inside the class yuRUbf and then read the value of its href attribute.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
    'referer': 'https://www.google.com'
}
target_url = 'https://www.google.com/search?q=scrape+prices'
resp = requests.get(target_url, headers=headers)
print(resp.status_code)
Here we have declared some headers, like a User-Agent and a Referer, to act like a normal browser and not a crawler. Then we declared our target URL and finally made a GET request using the requests library. Once you run this code, you should see 200 in your terminal.
Now, our target is to find our domain. Let’s find it using BS4.
soup = BeautifulSoup(resp.text, 'html.parser')
results = soup.find_all("div", {"class": "jGGQ5e"})
We have used html.parser with BS4 to create a tree from our HTML code. If you print the results list, you will see the HTML of all top 10 results. In this list, we have to search for our link one by one. For that, we are going to use a for loop.
from urllib.parse import urlparse

found = False
position = 0
for x in range(0, len(results)):
    # pull the link out of each result block and keep only the domain
    domain = urlparse(results[x].find("div", {"class": "yuRUbf"}).find("a").get("href")).netloc
    if domain == 'blog.christian-schou.dk':
        found = True
        position = x + 1
        break

if found:
    print("Found at position", position)
else:
    print("not found in top", len(results))
We have used the urlparse library to parse the domain out of each link. Then we try to match it against our target domain. If it matches, we print the position; if no result matches, we print not found.
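For clarity, here is what urlparse does with a typical result link (the link below is just an example for illustration); .netloc gives us only the domain part:

from urllib.parse import urlparse

link = "https://blog.christian-schou.dk/some-article/"  # example link for illustration
print(urlparse(link).netloc)  # blog.christian-schou.dk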
Let us run this code and see what we get.
Well, the request was successful, as I can see a 200, but we could not find the domain in the top 10 results. Let's search for it in the top 20 results. For that, we need to change the target URL and add the parameter &num=20 to our Google URL.
The Google URL will become https://www.google.com/search?q=scrape+prices&num=20
Run the program again and check whether you see this domain or not.
This time I found the domain at the 18th position in the Google search results. So, the rank of this domain for “scrape prices” is 18th in my country. This position will change according to the country, as Google displays different results in different countries.
This is how you can track the rank of any domain for any keyword. If you want to track it for different countries, you can use a Google search result scraper that supports geotargeting.
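As a sketch of how that could look with plain requests, Google's gl query parameter is commonly used to request country-specific results; the country code below is just an example for illustration:

# 'gl' requests results for a specific country; 'us' is just an example code
target_url = 'https://www.google.com/search?q=scrape+prices&num=20&gl=us'
resp = requests.get(target_url, headers=headers)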
Going forward, you can also create an SEO tool just like Ahrefs and Semrush, or you can create a lead-generation tool like Snov.
Complete Code
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
    'referer': 'https://www.google.com'
}
target_url = 'https://www.google.com/search?q=scrape+prices&num=20'
resp = requests.get(target_url, headers=headers)
print(resp.status_code)

soup = BeautifulSoup(resp.text, 'html.parser')
results = soup.find_all("div", {"class": "jGGQ5e"})
# print(results)

found = False
position = 0
for x in range(0, len(results)):
    domain = urlparse(results[x].find("div", {"class": "yuRUbf"}).find("a").get("href")).netloc
    if domain == 'blog.christian-schou.dk':
        found = True
        position = x + 1
        break

if found:
    print("Found at position", position)
else:
    print("not found in top", len(results))
Running the code every 24 hours
Let's say you want to track your position every 24 hours because you are putting lots of effort into marketing and want to see results. You can mail yourself the current position every morning; this will keep you updated.
We will use the schedule library to implement this task.
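schedule is a third-party package, so install it first:

>> pip install schedule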
Complete Code
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
import schedule
import time

def tracker():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
        'referer': 'https://www.google.com'
    }
    target_url = 'https://www.google.com/search?q=scrape+prices&num=20'
    resp = requests.get(target_url, headers=headers)
    print(resp.status_code)

    soup = BeautifulSoup(resp.text, 'html.parser')
    results = soup.find_all("div", {"class": "jGGQ5e"})
    # print(results)

    found = False
    position = 0
    for x in range(0, len(results)):
        position = x + 1
        domain = urlparse(results[x].find("div", {"class": "yuRUbf"}).find("a").get("href")).netloc
        if domain == 'blog.christian-schou.dk':
            found = True
            break

    if found:
        print("Found at position", position)
    else:
        print("not found in top " + str(position) + " results")

if __name__ == "__main__":
    # run every 5 seconds, just to verify that the scheduler works
    schedule.every(5).seconds.do(tracker)
    while True:
        schedule.run_pending()
        time.sleep(1)
Here we are running the schedule every 5 seconds, just to test whether it works for us. Once you run it, you will get results like this.
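Assuming the domain is found, the terminal output of this test run would look roughly like this (your status code and position may differ):

200
Found at position 18
200
Found at position 18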
Now, to run it every day or after every 24 hours you can use:
schedule.every().day.at("12:00").do(tracker)
Now, let us mail ourselves these results to stay updated with the latest position on Google. For this task, we will use the smtplib library, which ships with Python.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
import schedule
import time
import smtplib

def mail(alert_text):
    # Gmail SMTP over STARTTLS; use an app password here, not your normal account password
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.login("from@gmail.com", "xxxx")
    SUBJECT = "Position Alert"
    message = 'From: from@gmail.com\nSubject: {}\n\n{}'.format(SUBJECT, alert_text)
    server.sendmail("from@gmail.com", 'send_to@gmail.com', message)
    server.quit()
    return True

def tracker():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
        'referer': 'https://www.google.com'
    }
    target_url = 'https://www.google.com/search?q=scrape+prices&num=20'
    resp = requests.get(target_url, headers=headers)
    print(resp.status_code)

    soup = BeautifulSoup(resp.text, 'html.parser')
    results = soup.find_all("div", {"class": "jGGQ5e"})
    # print(results)

    found = False
    position = 0
    for x in range(0, len(results)):
        position = x + 1
        domain = urlparse(results[x].find("div", {"class": "yuRUbf"}).find("a").get("href")).netloc
        if domain == 'blog.christian-schou.dk':
            found = True
            break

    if found:
        mail("Found at position " + str(position))
    else:
        mail("not found in top " + str(position) + " results")

if __name__ == "__main__":
    # send the alert once a day at noon
    schedule.every().day.at("12:00").do(tracker)
    while True:
        schedule.run_pending()
        time.sleep(1)
In the mail function, we log in to our Gmail account with the password (for Gmail, this needs to be an app password). Then we declare the subject and the message that will be sent to us. Finally, we use the .sendmail function to send the email alert. This will send an email alert every 24 hours directly to your inbox.
Now, you might be wondering: if we close the terminal, our scheduler will stop working. Yes, you are right, and to tackle this we are going to use nohup.
nohup ignores the hangup signal, so your script keeps running even after you close your terminal session.
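As a minimal sketch, assuming your script is saved as googlescraper.py, you can start it in the background like this; the trailing & detaches it from the terminal, and output is appended to nohup.out:

>> nohup python googlescraper.py &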
I leave this task to you as homework, in the hope that you will learn something new and unique.
Conclusion
In this post, we learned how to create a task that can run at any given interval of time. We used four libraries, namely requests, BS4, schedule, and smtplib, to complete this task. It does not stop here: you can create any type of scheduler, like news updates, stock updates, etc. I am sure Python will make your job fast and simple.
If you have any questions, you can reach me through my contact form.