In this article, we will see how to use C# and HtmlAgilityPack to scrape and extract data from Booking.com property listings.
Prerequisites
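You will need a .NET development environment (for example Visual Studio or the .NET SDK) and the HtmlAgilityPack library.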
Installing HtmlAgilityPack
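Install the HtmlAgilityPack package from NuGet, for example by running dotnet add package HtmlAgilityPack in the project directory.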
Adding Namespaces
Add these namespaces:
using System;
using System.Net;
using HtmlAgilityPack;
Defining the URL
Define the target URL:
string url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2";
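If the destination or dates need to vary, the URL can be assembled from its parts instead of hard-coded. The following is a minimal sketch; the helper name BuildSearchUrl is hypothetical, and it assumes the query parameters seen in the URL above (ss, checkin, checkout, group_adults) are the only ones required.

```csharp
using System;
using System.Globalization;
using System.Net;

// Hypothetical helper: assembles a search URL from the parameters seen in the URL above.
static string BuildSearchUrl(string destination, DateTime checkin, DateTime checkout, int adults)
{
    const string baseUrl = "https://www.booking.com/searchresults.en-gb.html";
    // WebUtility.UrlEncode encodes spaces as '+', matching "New+York" in the original URL.
    string encodedDestination = WebUtility.UrlEncode(destination);
    string checkinText = checkin.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    string checkoutText = checkout.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    return $"{baseUrl}?ss={encodedDestination}&checkin={checkinText}&checkout={checkoutText}&group_adults={adults}";
}

string searchUrl = BuildSearchUrl("New York", new DateTime(2023, 3, 1), new DateTime(2023, 3, 5), 2);
Console.WriteLine(searchUrl);
```

Formatting the dates with the invariant culture keeps the yyyy-MM-dd shape regardless of the machine's regional settings.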
Downloading the Page HTML
Use WebClient to download the page HTML:
WebClient client = new WebClient();
string html = client.DownloadString(url);
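WebClient still works but is marked obsolete in .NET 6 and later, and many sites reject requests that lack a browser-like User-Agent header. As a possible alternative, here is a sketch using HttpClient. The helper names BuildRequest and DownloadHtmlAsync and the User-Agent string are placeholders, not values known to satisfy Booking.com.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: builds a GET request that carries a User-Agent header.
static HttpRequestMessage BuildRequest(string url)
{
    var request = new HttpRequestMessage(HttpMethod.Get, url);
    // Placeholder User-Agent value; substitute one appropriate for your use case.
    request.Headers.UserAgent.ParseAdd("Mozilla/5.0 (compatible; ExampleScraper/1.0)");
    return request;
}

// Hypothetical helper: sends the request and returns the response body as a string.
static async Task<string> DownloadHtmlAsync(HttpClient client, string url)
{
    using HttpRequestMessage request = BuildRequest(url);
    using HttpResponseMessage response = await client.SendAsync(request);
    response.EnsureSuccessStatusCode(); // throws HttpRequestException on a non-success status
    return await response.Content.ReadAsStringAsync();
}

// Only touch the network when a URL is passed on the command line.
if (args.Length > 0)
{
    using var client = new HttpClient();
    string pageHtml = await DownloadHtmlAsync(client, args[0]);
    Console.WriteLine($"Downloaded {pageHtml.Length} characters");
}
```

The returned string can be passed to LoadHtml exactly like the one produced by DownloadString.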
Loading the HTML
Load the HTML into an HtmlDocument object:
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
Selecting Property Cards
Use XPath to select the property cards:
var cards = document.DocumentNode.SelectNodes("//div[@data-testid='property-card']");
Looping Through Cards
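Loop through the cards. SelectNodes returns null rather than an empty collection when nothing matches, so guard against that first:
if (cards == null) return; // nothing matched, e.g. the layout changed or the request was blocked
foreach (var card in cards)
{
    // Extract data from card
}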
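To try selectors without hitting the live site, the same calls can be run against a small inline HTML string. The markup below is a made-up stand-in for a results page, not real Booking.com output; it only reuses the data-testid attribute names from this article.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical sample markup: the second card deliberately has no title element.
string sampleHtml = @"
<div data-testid='property-card'><div data-testid='title'>Hotel A</div></div>
<div data-testid='property-card'><span data-testid='address'>Manhattan</span></div>";

var sampleDocument = new HtmlDocument();
sampleDocument.LoadHtml(sampleHtml);

// SelectNodes returns null (not an empty collection) when nothing matches.
var sampleCards = sampleDocument.DocumentNode.SelectNodes("//div[@data-testid='property-card']");
if (sampleCards == null)
{
    Console.WriteLine("No property cards found.");
    return;
}

foreach (var sampleCard in sampleCards)
{
    // SelectSingleNode returns null when a card lacks the element, so fall back to a marker.
    var sampleTitle = sampleCard.SelectSingleNode(".//div[@data-testid='title']");
    Console.WriteLine(sampleTitle?.InnerText.Trim() ?? "(no title)");
}
```

Running selectors against saved or hand-written markup like this makes it easier to spot a broken XPath expression before sending any requests.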
Extracting Title
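Get the title element and its inner text. SelectSingleNode returns null if the element is missing, so the null-conditional operator guards the access:
var titleElement = card.SelectSingleNode(".//div[@data-testid='title']");
string title = titleElement?.InnerText.Trim() ?? "";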
Extracting Location
Get the location span and its text:
var locationElement = card.SelectSingleNode(".//span[@data-testid='address']");
string location = locationElement?.InnerText.Trim() ?? "";
Extracting Rating
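Get the rating div's aria-label attribute value. The class name here is auto-generated and may change, so verify it against the live page:
var ratingElement = card.SelectSingleNode(".//div[contains(@class, 'e4755bbd60')]");
string rating = ratingElement?.GetAttributeValue("aria-label", "") ?? "";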
Extracting Review Count
Get the review count div's text:
var reviewCountElement = card.SelectSingleNode(".//div[contains(@class, 'abf093bdfe')]");
string reviewCount = reviewCountElement?.InnerText.Trim() ?? "";
Extracting Description
Get the description div's text:
var descriptionElement = card.SelectSingleNode(".//div[contains(@class, 'd7449d770c')]");
string description = descriptionElement?.InnerText.Trim() ?? "";
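InnerText returns the raw text of a node, so values can still contain HTML entities such as &amp; and stray whitespace or line breaks. One way to tidy them is sketched below. The helper names CleanText and ParseReviewCount are hypothetical, and the "1,234 reviews" input format is an assumption about what the review count text looks like.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes HTML entities and collapses whitespace runs into single spaces.
static string CleanText(string raw)
{
    if (string.IsNullOrEmpty(raw)) return "";
    string decoded = WebUtility.HtmlDecode(raw);
    return Regex.Replace(decoded, @"\s+", " ").Trim();
}

// Hypothetical helper: pulls the first integer out of text such as "1,234 reviews" (assumed format).
static int? ParseReviewCount(string raw)
{
    Match match = Regex.Match(CleanText(raw), @"\d[\d,]*");
    if (!match.Success) return null;
    return int.Parse(match.Value.Replace(",", ""), CultureInfo.InvariantCulture);
}

Console.WriteLine(CleanText("  Hotel &amp; Spa \n  Midtown "));
Console.WriteLine(ParseReviewCount("1,234 reviews"));
```

Returning a nullable int keeps a missing or unparseable count distinct from a genuine zero.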
Printing the Data
Print out the extracted information:
Console.WriteLine("Title: " + title);
Console.WriteLine("Location: " + location);
Console.WriteLine("Rating: " + rating);
Console.WriteLine("Review Count: " + reviewCount);
Console.WriteLine("Description: " + description);
And that's how you can scrape data from Booking.com listings using C# and HtmlAgilityPack!
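The same approach carries over to other sites that serve their content as static HTML: change the URL and adjust the XPath selectors to match the target page's markup.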
Full code
using System;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        string url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2";

        // Download the raw HTML of the search results page
        WebClient client = new WebClient();
        string html = client.DownloadString(url);

        // Parse the HTML into a queryable document
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null when nothing matches
        var cards = document.DocumentNode.SelectNodes("//div[@data-testid='property-card']");
        if (cards == null)
        {
            Console.WriteLine("No property cards found.");
            return;
        }

        foreach (var card in cards)
        {
            // SelectSingleNode returns null for a missing element, hence the ?. guards
            var titleElement = card.SelectSingleNode(".//div[@data-testid='title']");
            string title = titleElement?.InnerText.Trim() ?? "";
            var locationElement = card.SelectSingleNode(".//span[@data-testid='address']");
            string location = locationElement?.InnerText.Trim() ?? "";
            var ratingElement = card.SelectSingleNode(".//div[contains(@class, 'e4755bbd60')]");
            string rating = ratingElement?.GetAttributeValue("aria-label", "") ?? "";
            var reviewCountElement = card.SelectSingleNode(".//div[contains(@class, 'abf093bdfe')]");
            string reviewCount = reviewCountElement?.InnerText.Trim() ?? "";
            var descriptionElement = card.SelectSingleNode(".//div[contains(@class, 'd7449d770c')]");
            string description = descriptionElement?.InnerText.Trim() ?? "";

            Console.WriteLine("Title: " + title);
            Console.WriteLine("Location: " + location);
            Console.WriteLine("Rating: " + rating);
            Console.WriteLine("Review Count: " + reviewCount);
            Console.WriteLine("Description: " + description);
        }
    }
}
While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.
Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.
This allows scraping at scale without the headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.
With the power of Proxies API combined with C# libraries like HtmlAgilityPack, you can scrape data at scale without getting blocked.