In this article, we will learn how to scrape property listings from Booking.com using Java. We will use the JSoup and Apache HttpClient libraries to fetch the HTML and parse out details such as property name, location, rating, review count, and description.
Prerequisites
To follow along, you will need:
- A recent JDK installed (Java 8 or later)
- Maven for dependency management
- Basic familiarity with Java and HTML
Adding Dependencies
Add JSoup and HttpClient Maven dependencies:
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.13.1</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
Importing Libraries
Import required classes and packages:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
Defining URL
Define target URL:
String url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2";
Setting User Agent
Set user agent string:
String userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36";
Fetching HTML Page
Send HTTP GET request using HttpClient:
CloseableHttpClient client = HttpClients.createDefault();
HttpGet request = new HttpGet(url);
request.setHeader("User-Agent", userAgent);
CloseableHttpResponse response = client.execute(request);
Document html = Jsoup.parse(response.getEntity().getContent(), "UTF-8", url);
We set the User-Agent header so the request resembles a normal browser, then parse the response body into a JSoup Document. Note that the InputStream overload of Jsoup.parse takes the base URI as its third argument, so relative links in the page resolve correctly.
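If you prefer to avoid the Apache dependency, Java 11's built-in java.net.http.HttpClient can fetch the page as well. Here is a minimal sketch (the URL below is shortened, and the send call is wrapped in a try/catch since the request may be blocked or fail offline):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JdkFetch {
    // Build a GET request carrying a browser-like User-Agent header
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        String url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York";
        String ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36";
        HttpRequest request = buildRequest(url, ua);

        try {
            // Send the request and read the body as a String (could be handed to Jsoup.parse)
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status: " + response.statusCode());
        } catch (Exception e) {
            System.out.println("Fetch failed (no network?): " + e.getMessage());
        }
    }
}
```

The body string returned by this client can be passed straight to Jsoup.parse in place of the Apache response stream.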
Parsing HTML
The HTML is loaded into a JSoup Document.
Extracting Cards
Get elements with data-testid attribute:
Elements cards = html.select("div[data-testid=property-card]");
This extracts all the property cards.
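You can sanity-check the attribute selector offline against a small hand-written HTML snippet. The markup below is a simplified stand-in for Booking.com's real page, not its actual structure:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectorDemo {
    public static void main(String[] args) {
        // Simplified stand-in for a Booking.com results page
        String sample = "<div data-testid='property-card'><h3>Hotel A</h3></div>"
                      + "<div data-testid='property-card'><h3>Hotel B</h3></div>"
                      + "<div data-testid='other-widget'>ignore me</div>";

        Document doc = Jsoup.parse(sample);

        // [data-testid=property-card] matches only the two property cards
        Elements cards = doc.select("div[data-testid=property-card]");
        System.out.println("Cards found: " + cards.size()); // prints "Cards found: 2"
    }
}
```

Testing selectors against a saved or hand-written snippet like this is much faster than re-fetching the live page on every change.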
Processing Each Card
Loop through the extracted cards:
for (Element card : cards) {
// Extract data from card
}
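Rather than printing inside the loop, you can collect each card into a small value class. A sketch using the same inline-HTML trick (the Listing record is our own addition, not a JSoup type; requires Java 16+ for records):

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CollectCards {
    // Minimal holder for the fields we scrape per card
    record Listing(String title, String location) {}

    public static void main(String[] args) {
        String sample = "<div data-testid='property-card'><h3>Hotel A</h3>"
                      + "<span data-testid='address'>Manhattan</span></div>";
        Document doc = Jsoup.parse(sample);

        List<Listing> listings = new ArrayList<>();
        for (Element card : doc.select("div[data-testid=property-card]")) {
            listings.add(new Listing(
                    card.select("h3").text(),
                    card.select("span[data-testid=address]").text()));
        }
        System.out.println(listings); // prints "[Listing[title=Hotel A, location=Manhattan]]"
    }
}
```

Collecting into a list keeps extraction separate from output, which makes it easy to later write the results to CSV or JSON instead of the console.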
Inside the loop we can extract the details from each card.
Extracting Title
Get h3 text:
String title = card.select("h3").text();
Extracting Location
Get address span text:
String location = card.select("span[data-testid=address]").text();
Extracting Rating
Get aria-label attribute value:
String rating = card.select("div.e4755bbd60").attr("aria-label");
Here we select by class name. Note that Booking.com's class names such as e4755bbd60 are machine-generated and change frequently, so expect to update these selectors; prefer stable attributes like data-testid where one exists.
Extracting Review Count
Get div text:
String reviewCount = card.select("div.abf093bdfe").text();
Extracting Description
Get description div text:
String description = card.select("div.d7449d770c").text();
Printing Output
Print extracted data:
System.out.println("Title: " + title);
System.out.println("Location: " + location);
System.out.println("Rating: " + rating);
System.out.println("Review Count: " + reviewCount);
System.out.println("Description: " + description);
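On a live page a selector sometimes matches nothing, in which case JSoup's .text() returns an empty string. A small helper of our own (not part of JSoup) keeps the printed output readable:

```java
public class SafeText {
    // Fall back to a placeholder when a selector matched nothing
    static String orDefault(String value, String fallback) {
        return (value == null || value.isBlank()) ? fallback : value.trim();
    }

    public static void main(String[] args) {
        System.out.println("Rating: " + orDefault("", "N/A"));         // prints "Rating: N/A"
        System.out.println("Title: " + orDefault(" Hotel A ", "N/A")); // prints "Title: Hotel A"
    }
}
```

In the loop you would print orDefault(rating, "N/A") instead of the raw value.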
Full Scraper Code
Here is the complete Java code to scrape Booking.com:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
public class BookingScraper {
public static void main(String[] args) throws Exception {
String url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2";
String userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36";
CloseableHttpClient client = HttpClients.createDefault();
HttpGet request = new HttpGet(url);
request.setHeader("User-Agent", userAgent);
CloseableHttpResponse response = client.execute(request);
Document html = Jsoup.parse(response.getEntity().getContent(), "UTF-8", url);
Elements cards = html.select("div[data-testid=property-card]");
for (Element card : cards) {
String title = card.select("h3").text();
String location = card.select("span[data-testid=address]").text();
String rating = card.select("div.e4755bbd60").attr("aria-label");
String reviewCount = card.select("div.abf093bdfe").text();
String description = card.select("div.d7449d770c").text();
System.out.println("Title: " + title);
System.out.println("Location: " + location);
System.out.println("Rating: " + rating);
System.out.println("Review Count: " + reviewCount);
System.out.println("Description: " + description);
}
response.close();
client.close();
}
}
This extracts the key data from Booking.com listings using JSoup and HttpClient in Java. A similar approach works on other sites, though each site will need its own selectors.
While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.
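One simple mitigation on the client side is to space requests out and retry transient failures with a growing delay. Here is a hedged sketch of a generic retry helper (the attempt counts and delays below are arbitrary choices, not Booking.com-specific values):

```java
import java.util.concurrent.Callable;

public class Retry {
    // Retry a task a few times, doubling the pause between attempts
    static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMs) throws Exception {
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky fetch: fails twice, then succeeds
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("blocked");
            return "page html";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "page html after 3 attempts"
    }
}
```

You could wrap the client.execute(request) call from the scraper in withRetry, though backoff alone will not defeat dedicated bot detection.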
Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.
This allows scraping at scale without headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.
With the power of Proxies API combined with Java libraries like JSoup, you can scrape data at scale without getting blocked.