Scrape Wikipedia using Node.js
Wikipedia is a data-rich website that contains a large amount of information. This data can be used to make informed decisions, or to train bots and neural networks.

In this post, we are going to scrape Wikipedia using Node.js. We are going to target the Coronavirus article from Wikipedia. You can also read Web Scraping with Nodejs if you are a beginner and want to learn how websites can be scraped using Node.js.

Before you start scraping, you should visit the target page and analyze its design thoroughly.
Setting up the prerequisites
Before we start coding, we have to install certain libraries that will be used over the course of this article. I am assuming that you have already installed Node.js on your machine.
Before installing the libraries, let’s create a folder where we will keep our scraping files.
mkdir wikipedia
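Then move into the folder and initialize a package.json (this init step is an extra suggestion of mine; the original only creates the folder).
cd wikipedia
npm init -y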
Now, use npm to install the required libraries.
npm install unirest
npm install cheerio
Unirest: This library will be used to make a GET request to the host website.
Cheerio: This library will be used for parsing the HTML.
Also, create a file where you will write the code.
What are we going to extract?
We are going to extract the titles and their explanations. It is always better to decide what you want to scrape before writing a single line of code.

The title is highlighted in red and the explanation in green.
Scraping Wikipedia
Before we start writing the code, let’s find the location of titles and their explanation inside the DOM.

As you can see, all of these explanations are under the p tag.

All the headings are under the h2 tag.
Let’s code this in Node.js step by step.
1. The first step is to import all the libraries that we installed earlier.
const unirest = require('unirest');
const cheerio = require('cheerio');
This will import unirest and cheerio into our file.
2. We will make a GET request to our target page in order to get its HTML code.
async function wikipediaScraper() {
  let data = await unirest.get("https://en.wikipedia.org/wiki/Coronavirus").header("Accept", "text/html");
}
The unirest.get method will make an HTTP connection to our target URL, and the .header method will set the Accept header to text/html.
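One small improvement you may want (my addition, not part of the original steps) is to check the HTTP status code before parsing, which unirest exposes as response.code:
async function fetchPage(url) {
  // unirest requests are thenable, so await resolves to the response object
  const response = await unirest.get(url).header("Accept", "text/html");
  // response.code holds the HTTP status code (assumption: recent unirest versions)
  if (response.code !== 200) {
    throw new Error("Wikipedia returned status " + response.code);
  }
  return response.body;
}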
3. We will load the raw HTML response using cheerio.
async function wikipediaScraper() {
  let data = await unirest.get("https://en.wikipedia.org/wiki/Coronavirus").header("Accept", "text/html");
  const $ = cheerio.load(data.body);
}
The cheerio.load method will load the HTML data into a Cheerio object. This will help us extract useful information from the raw HTML.
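To see what the loaded $ object gives us, here is a tiny standalone example (a sketch of mine, not from the original post). Cheerio works much like jQuery on the server, and Wikipedia happens to keep its page title in an element with the id firstHeading:
const cheerio = require('cheerio');
// Load a small HTML snippet instead of a full page, just to demonstrate
const $ = cheerio.load("<h1 id='firstHeading'>Coronavirus</h1>");
// Selectors work just like in jQuery
console.log($("#firstHeading").text()); // prints: Coronavirus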
4. Next, we will extract the titles and their explanation paragraphs.
async function wikipediaScraper() {
  let data = await unirest.get("https://en.wikipedia.org/wiki/Coronavirus").header("Accept", "text/html");
  const $ = cheerio.load(data.body);
  $("h2").each(function (i, elem) {
    console.log($(elem).text());
    console.log($(elem).nextUntil("h2").text().trim());
  });
}
The $("h2") selector will select all the h2 tags in the HTML response, and the .each() method lets us iterate over them one by one.
.nextUntil("h2") will collect everything following the current h2 tag until it reaches the next h2 tag, i.e. the content available between two h2 tags.
We have used .text() to get the text content of each heading.
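If you would rather keep the data instead of printing it, a small variation (my sketch, not from the original tutorial) collects each title and explanation into an array of objects:
const sections = [];
$("h2").each(function (i, elem) {
  sections.push({
    title: $(elem).text().trim(),
    explanation: $(elem).nextUntil("h2").text().trim(),
  });
});
// The array can now be saved to a file or a database
console.log(JSON.stringify(sections, null, 2));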
5. Finally, call this async function to execute the script.
async function wikipediaScraper() {
  let data = await unirest.get("https://en.wikipedia.org/wiki/Coronavirus").header("Accept", "text/html");
  const $ = cheerio.load(data.body);
  $("h2").each(function (i, elem) {
    console.log($(elem).text());
    console.log($(elem).nextUntil("h2").text().trim());
  });
}
wikipediaScraper()
When you run the script, it will print each title followed by its paragraphs, one by one.
Complete Code
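Putting all the steps together, the full script looks like this:
const unirest = require('unirest');
const cheerio = require('cheerio');

async function wikipediaScraper() {
  // Fetch the raw HTML of the target Wikipedia article
  let data = await unirest.get("https://en.wikipedia.org/wiki/Coronavirus").header("Accept", "text/html");
  // Parse it into a Cheerio object for jQuery-style querying
  const $ = cheerio.load(data.body);
  // Print each section title and the text between it and the next title
  $("h2").each(function (i, elem) {
    console.log($(elem).text());
    console.log($(elem).nextUntil("h2").text().trim());
  });
}

wikipediaScraper();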
Conclusion
In this article, you learned how to scrape Wikipedia using Node.js. You were introduced to libraries like cheerio and unirest, and finally we wrote code to extract data from Wikipedia.
Now you can make further small changes to this code to extract a bit more data, for example by handling h2 and h3 tags separately, as sketched below.
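Here is a hypothetical sketch of that idea (it is not in the original code); it selects both heading levels and labels each one by checking which selector it matches:
$("h2, h3").each(function (i, elem) {
  // .is() tells us which heading level the current element matched
  const label = $(elem).is("h2") ? "Section" : "Subsection";
  console.log(label + ": " + $(elem).text().trim());
});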
I hope you liked this little tutorial, and if you did, please do not forget to share it with your friends and on your social media.
Additional Resources
Here are a few additional resources that you may find helpful during your web scraping journey: