Scraping Booking.com Property Listings with C# in 2023

In this article, we will see how to use C# and HtmlAgilityPack to scrape and extract data from Booking.com property listings.

Prerequisites

You will need:

Visual Studio and .NET 6 or later

HtmlAgilityPack NuGet package

Installing HtmlAgilityPack

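Install the HtmlAgilityPack NuGet package in your project, either through the NuGet Package Manager in Visual Studio or by running dotnet add package HtmlAgilityPack from the project directory.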

Adding Namespaces

Add these namespaces:

using HtmlAgilityPack;
using System.Net;

Defining the URL

Define the target URL:

string url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2";
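
The URL above hard-codes the destination, dates, and guest count. As a minimal sketch, the same URL could be assembled from variables. The helper name BuildSearchUrl is hypothetical, and the parameter names (ss, checkin, checkout, group_adults) are simply copied from the URL above, assuming Booking.com keeps accepting them.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: assembles the search URL from its parts.
// Parameter names are taken from the example URL, not from any official API.
static string BuildSearchUrl(string destination, DateTime checkIn, DateTime checkOut, int adults)
{
    // Dates use the yyyy-MM-dd form seen in the example URL.
    string checkInText = checkIn.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    string checkOutText = checkOut.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

    return "https://www.booking.com/searchresults.en-gb.html"
        + "?ss=" + Uri.EscapeDataString(destination)
        + "&checkin=" + checkInText
        + "&checkout=" + checkOutText
        + "&group_adults=" + adults.ToString(CultureInfo.InvariantCulture);
}

string searchUrl = BuildSearchUrl("New York", new DateTime(2023, 3, 1), new DateTime(2023, 3, 5), 2);
Console.WriteLine(searchUrl);
```

Uri.EscapeDataString encodes the space as %20 rather than +, which is an equivalent encoding inside a query string.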

Downloading the Page HTML

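Use WebClient to download the page HTML. WebClient still works in .NET 6 and later but is marked obsolete there (compiler warning SYSLIB0014); HttpClient is the recommended replacement for new code: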

WebClient client = new WebClient();
string html = client.DownloadString(url);
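
Some sites respond differently to requests that carry no browser-like headers. The sketch below shows one possible way to do the download with HttpClient while setting a User-Agent and Accept-Language header. The helper name CreateBrowserLikeRequest and the header values are assumptions chosen for illustration, and there is no guarantee that Booking.com accepts them.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: builds a GET request with browser-like headers.
static HttpRequestMessage CreateBrowserLikeRequest(string pageUrl)
{
    var message = new HttpRequestMessage(HttpMethod.Get, pageUrl);
    // Example header values (assumptions); adjust them to your own needs.
    message.Headers.TryAddWithoutValidation("User-Agent",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0 Safari/537.36");
    message.Headers.TryAddWithoutValidation("Accept-Language", "en-GB,en;q=0.9");
    return message;
}

string pageUrl = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2";

using var httpClient = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };
using HttpRequestMessage request = CreateBrowserLikeRequest(pageUrl);

try
{
    using HttpResponseMessage response = await httpClient.SendAsync(request);
    // Throws HttpRequestException on 4xx/5xx so an error page is not parsed as listings.
    response.EnsureSuccessStatusCode();
    string pageHtml = await response.Content.ReadAsStringAsync();
    Console.WriteLine($"Downloaded {pageHtml.Length} characters");
}
catch (Exception ex) when (ex is HttpRequestException || ex is TaskCanceledException)
{
    // Network failures and timeouts end up here.
    Console.WriteLine($"Download failed: {ex.Message}");
}
```

The string read from the response can be passed to LoadHtml in the same way as the string returned by DownloadString.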

Loading the HTML

Load the HTML into an HtmlDocument:

HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);

Selecting Property Cards

Use XPath to select the property cards:

var cards = document.DocumentNode.SelectNodes("//div[@data-testid='property-card']");
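
It can help to try the XPath expression against a small, known piece of markup before running it on the live page. The following sketch does that offline. The sample HTML is invented for illustration and only mimics the data-testid attributes used above; the real page is much larger and may differ.

```csharp
using System;
using HtmlAgilityPack;

// Invented sample markup (an assumption), not copied from Booking.com.
string sampleHtml = @"
<div data-testid='property-card'><div data-testid='title'>Hotel One</div></div>
<div data-testid='property-card'><div data-testid='title'>Hotel Two</div></div>";

var sampleDocument = new HtmlDocument();
sampleDocument.LoadHtml(sampleHtml);

// SelectNodes returns null (not an empty collection) when nothing matches.
var sampleCards = sampleDocument.DocumentNode.SelectNodes("//div[@data-testid='property-card']");
int cardCount = sampleCards?.Count ?? 0;
Console.WriteLine($"Found {cardCount} property cards");

if (sampleCards != null)
{
    foreach (var sampleCard in sampleCards)
    {
        // The leading dot keeps the search inside the current card.
        var titleNode = sampleCard.SelectSingleNode(".//div[@data-testid='title']");
        Console.WriteLine(titleNode?.InnerText ?? "(no title)");
    }
}
```

If the count is zero on the live page, the markup has probably changed or the response was not the expected search results page.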

Looping Through Cards

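Loop through the cards. SelectNodes returns null rather than an empty collection when nothing matches, so check for that first:

if (cards == null) return; // nothing matched, so there is nothing to loop over

foreach (var card in cards)
{
  // Extract data from card
}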

Extracting Title

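Get the title element and its inner text. SelectSingleNode returns null when the element is missing, so the null-conditional operator avoids a NullReferenceException:

var titleElement = card.SelectSingleNode(".//div[@data-testid='title']");
string title = titleElement?.InnerText.Trim() ?? "";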
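
InnerText returns the text as it appears in the markup, so it can still contain HTML entities such as &amp;amp; and stray line breaks. One possible cleanup step is sketched below. The helper name CleanText is hypothetical, and it uses WebUtility.HtmlDecode from System.Net rather than anything specific to HtmlAgilityPack.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes HTML entities and collapses runs of whitespace.
static string CleanText(string raw)
{
    if (string.IsNullOrWhiteSpace(raw)) return "";
    string decoded = WebUtility.HtmlDecode(raw);
    return Regex.Replace(decoded, @"\s+", " ").Trim();
}

Console.WriteLine(CleanText("  Hotel &amp; Suites\n   Midtown "));  // prints "Hotel & Suites Midtown"
```

The same helper could be applied to the location, review count, and description strings.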

Extracting Location

Get the location span and its text:

var locationElement = card.SelectSingleNode(".//span[@data-testid='address']");
string location = locationElement?.InnerText.Trim() ?? "";

Extracting Rating

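Get the rating div's aria-label attribute value. The class name e4755bbd60 appears to be auto-generated and may change between site releases, so verify it against the current page markup:

var ratingElement = card.SelectSingleNode(".//div[contains(@class, 'e4755bbd60')]");
string rating = ratingElement?.GetAttributeValue("aria-label", "") ?? "";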
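
The aria-label value is a piece of text, not a number. If a numeric score is needed, a sketch like the following could extract it. It assumes the label contains the score as its first number, for example "Scored 8.5"; that format is an assumption, and the helper name ParseScore is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first decimal number in the label, or null if there is none.
static double? ParseScore(string label)
{
    Match match = Regex.Match(label ?? "", @"[0-9]+(?:[.,][0-9]+)?");
    if (!match.Success) return null;
    // Normalise a decimal comma to a dot, then parse culture-independently.
    string number = match.Value.Replace(',', '.');
    return double.Parse(number, CultureInfo.InvariantCulture);
}

double? score = ParseScore("Scored 8.5");
Console.WriteLine(score?.ToString(CultureInfo.InvariantCulture) ?? "n/a");  // prints 8.5
```

Returning null instead of 0 keeps "no rating yet" distinguishable from a genuinely low score.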

Extracting Review Count

Get the review count div's text:

var reviewCountElement = card.SelectSingleNode(".//div[contains(@class, 'abf093bdfe')]");
string reviewCount = reviewCountElement?.InnerText.Trim() ?? "";
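
The review count also arrives as text. A minimal sketch for turning it into an integer follows. It assumes the text holds a single number with optional thousands separators, such as "1,234 reviews"; if the element contains other digits as well, this approach would merge them, so check the actual text first. The helper name ParseReviewCount is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: keeps only ASCII digits and parses them as an integer.
static int? ParseReviewCount(string text)
{
    // Drops separators and words, e.g. "1,234 reviews" becomes "1234".
    string digits = Regex.Replace(text ?? "", @"[^0-9]", "");
    if (digits.Length == 0) return null;
    bool parsed = int.TryParse(digits, NumberStyles.None, CultureInfo.InvariantCulture, out int count);
    return parsed ? count : (int?)null;
}

Console.WriteLine(ParseReviewCount("1,234 reviews"));  // prints 1234
```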

Extracting Description

Get the description div's text:

var descriptionElement = card.SelectSingleNode(".//div[contains(@class, 'd7449d770c')]");
string description = descriptionElement?.InnerText.Trim() ?? "";

Printing the Data

Print out the extracted information:

Console.WriteLine("Title: " + title);
Console.WriteLine("Location: " + location);
Console.WriteLine("Rating: " + rating);
Console.WriteLine("Review Count: " + reviewCount);
Console.WriteLine("Description: " + description);
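
Printing to the console is enough for a quick check. To keep the results, the same five fields could instead be written to a CSV file, as in the sketch below. The helper names EscapeCsv and ToCsvLine, the file name listings.csv, and the sample row are all assumptions for illustration; the row is placeholder data, not real scraped output.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: quotes a field when it contains a comma, quote, or line break.
static string EscapeCsv(string field)
{
    field ??= "";
    bool needsQuotes = field.Contains(',') || field.Contains('"') || field.Contains('\n') || field.Contains('\r');
    // Embedded quotes are doubled, following the usual CSV convention.
    return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
}

// Hypothetical helper: joins escaped fields into one CSV line.
static string ToCsvLine(params string[] fields) => string.Join(",", fields.Select(EscapeCsv));

var csvLines = new List<string>
{
    ToCsvLine("Title", "Location", "Rating", "ReviewCount", "Description"),
    // Placeholder row; inside the scraping loop this would use the extracted variables.
    ToCsvLine("Example Hotel", "Manhattan, New York", "Scored 8.5", "1,234 reviews", "Double room"),
};

File.WriteAllLines("listings.csv", csvLines);
Console.WriteLine($"Wrote {csvLines.Count} lines to listings.csv");
```

Quoting matters here because both the location and the review count can contain commas.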

And that's how you can scrape data from Booking.com listings using C# and HtmlAgilityPack!

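The same approach works for other sites that return their content as HTML in the initial response; only the URL and the XPath selectors need to change.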

Full code

using HtmlAgilityPack;
using System.Net;

class Program
{
  static void Main(string[] args)
  {
    string url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2";

    WebClient client = new WebClient();
    string html = client.DownloadString(url);

    HtmlDocument document = new HtmlDocument();
    document.LoadHtml(html);

    var cards = document.DocumentNode.SelectNodes("//div[@data-testid='property-card']");
    if (cards == null) return; // SelectNodes returns null when nothing matches

    foreach (var card in cards)
    {
      // SelectSingleNode returns null for a missing element, hence the ?. and ?? operators
      var titleElement = card.SelectSingleNode(".//div[@data-testid='title']");
      string title = titleElement?.InnerText.Trim() ?? "";

      var locationElement = card.SelectSingleNode(".//span[@data-testid='address']");
      string location = locationElement?.InnerText.Trim() ?? "";

      var ratingElement = card.SelectSingleNode(".//div[contains(@class, 'e4755bbd60')]");
      string rating = ratingElement?.GetAttributeValue("aria-label", "") ?? "";

      var reviewCountElement = card.SelectSingleNode(".//div[contains(@class, 'abf093bdfe')]");
      string reviewCount = reviewCountElement?.InnerText.Trim() ?? "";

      var descriptionElement = card.SelectSingleNode(".//div[contains(@class, 'd7449d770c')]");
      string description = descriptionElement?.InnerText.Trim() ?? "";

      Console.WriteLine("Title: " + title);
      Console.WriteLine("Location: " + location);
      Console.WriteLine("Rating: " + rating);
      Console.WriteLine("Review Count: " + reviewCount);
      Console.WriteLine("Description: " + description);
    }
  }
}

While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.

Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.

This allows scraping at scale without the headache of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.

With Proxies API combined with C# libraries like HtmlAgilityPack, you can scrape data at scale without getting blocked.
