In this article, we will learn how to scrape property listings from Booking.com using Kotlin. We will use Ktor to fetch the HTML content and an HTML parsing library to extract details such as the property name, location, and rating.
Prerequisites
To follow along, you will need:
Adding Dependencies
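We will use Ktor for sending HTTP requests and jsoup for parsing HTML. kotlinx.html is a DSL for generating HTML and does not include a parser; the select, text, and attr calls used later in this article match jsoup's API.
Add them to the dependencies block of your Gradle build file:
dependencies {
    implementation("io.ktor:ktor-client-core:1.2.6")
    implementation("io.ktor:ktor-client-apache:1.2.6")
    implementation("org.jsoup:jsoup:1.15.3")
}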
Importing Libraries
Import the required classes and packages:
import io.ktor.client.*
import io.ktor.client.engine.apache.*
import io.ktor.client.request.*
import org.jsoup.Jsoup
Defining URL
Define the target URL:
val url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2"
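The query string above carries the destination, dates, and guest count. As a rough sketch, the same URL could be assembled from named values instead of being typed by hand. The helper name buildSearchUrl is hypothetical, and only the four parameters that appear in the URL above are assumed.

```kotlin
import java.net.URLEncoder

// Hypothetical helper: builds a search URL from named parameters.
// Only the parameters visible in the article's URL are used here.
fun buildSearchUrl(
    destination: String,
    checkin: String,
    checkout: String,
    adults: Int
): String {
    val params = linkedMapOf(
        "ss" to destination,
        "checkin" to checkin,
        "checkout" to checkout,
        "group_adults" to adults.toString()
    )
    // URLEncoder applies form encoding, so a space becomes '+'.
    val query = params.entries.joinToString("&") { (key, value) ->
        "$key=${URLEncoder.encode(value, "UTF-8")}"
    }
    return "https://www.booking.com/searchresults.en-gb.html?$query"
}

fun main() {
    println(buildSearchUrl("New York", "2023-03-01", "2023-03-05", 2))
}
```

Building the query this way keeps destinations with spaces or special characters correctly encoded.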
Setting User Agent
Set the user agent string:
val userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
Fetching HTML Page
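Use the Ktor client to send a GET request:
val client = HttpClient(Apache) {
    engine {
        customizeClient {
            // customizeClient is specific to the Apache engine
            setUserAgent(userAgent)
        }
    }
}
// get<String> (Ktor 1.x) is a suspend function, so call it from a coroutine
val html = kotlinx.coroutines.runBlocking { client.get<String>(url) }
This configures the client with the user agent, sends the request, and returns the response body as a string.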
Parsing HTML
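Parse the HTML into a document. kotlinx.html only generates HTML and has no parser, so this step uses jsoup (org.jsoup.Jsoup), whose API matches the select, text, and attr calls used below:
val doc = Jsoup.parse(html)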
Extracting Cards
Get the elements whose data-testid attribute is property-card:
val cards = doc.getElementsByAttributeValue("data-testid", "property-card")
This returns one element per property card on the page.
Processing Each Card
Loop through the extracted cards:
for (card in cards) {
// Extract data from card
}
Inside the loop we can extract details from each card.
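To try the card loop without hitting the live site, here is a minimal sketch that runs the same kind of selectors against a small, made-up HTML sample. It assumes jsoup (org.jsoup:jsoup) is on the classpath; the sample markup and the function name extractTitlesAndLocations are hypothetical and only imitate the attributes used in this article.

```kotlin
import org.jsoup.Jsoup

// Made-up markup that imitates the attributes used in this article.
val sampleHtml = """
    <div data-testid="property-card">
      <h3>Sample Hotel</h3>
      <span data-testid="address">Manhattan, New York</span>
    </div>
    <div data-testid="property-card">
      <h3>Another Stay</h3>
    </div>
""".trimIndent()

// Hypothetical helper: returns (title, location) for each card.
fun extractTitlesAndLocations(html: String): List<Pair<String, String?>> {
    val doc = Jsoup.parse(html)
    return doc.getElementsByAttributeValue("data-testid", "property-card").map { card ->
        val title = card.select("h3").text()
        // text() returns "" when nothing matches, so map blanks to null.
        val location = card.select("span[data-testid=address]").text().ifBlank { null }
        title to location
    }
}

fun main() {
    for ((title, location) in extractTitlesAndLocations(sampleHtml)) {
        println("$title | ${location ?: "(no address found)"}")
    }
}
```

Mapping empty matches to null makes a missing field explicit instead of silently printing an empty string.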
Extracting Title
Get the text of the h3 heading:
val title = card.select("h3").text()
Extracting Location
Get address span text:
val location = card.select("span[data-testid=address]").text()
Extracting Rating
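Get the aria-label attribute of the rating div:
val rating = card.select("div.e4755bbd60").attr("aria-label")
This selects the div by class name. Class names such as e4755bbd60 look auto-generated and may change when the site is updated, so verify them against the current page source before running the script.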
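The aria-label comes back as free text, so a numeric score has to be pulled out of it. The sketch below assumes the label contains a number such as "Scored 8.4"; that wording is an illustrative guess rather than confirmed page output, and parseRating is a hypothetical helper.

```kotlin
// Hypothetical helper: pulls the first decimal number out of a label.
// The "Scored 8.4" wording is an assumed example, not confirmed page output.
fun parseRating(ariaLabel: String): Double? {
    val match = Regex("""\d+(?:[.,]\d+)?""").find(ariaLabel) ?: return null
    // Accept a decimal comma as well as a decimal point.
    return match.value.replace(',', '.').toDoubleOrNull()
}

fun main() {
    println(parseRating("Scored 8.4")) // prints 8.4
    println(parseRating(""))           // prints null
}
```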
Extracting Review Count
Get the text of the review count div:
val reviewCount = card.select("div.abf093bdfe").text()
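The review count is also returned as text. A minimal sketch for turning it into an integer follows, assuming the text looks something like "1,234 reviews"; that format is an assumption, and parseReviewCount is a hypothetical helper.

```kotlin
// Hypothetical helper: converts text such as "1,234 reviews" to 1234.
// The input format is assumed for illustration.
fun parseReviewCount(text: String): Int? {
    // Match the first run of digits, allowing ',', '.', or spaces as group separators.
    val match = Regex("""\d[\d,.\s]*""").find(text) ?: return null
    return match.value.filter { it.isDigit() }.toIntOrNull()
}

fun main() {
    println(parseReviewCount("1,234 reviews")) // prints 1234
    println(parseReviewCount("No reviews"))    // prints null
}
```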
Extracting Description
Get description div text:
val description = card.select("div.d7449d770c").text()
Printing Output
Print the extracted data:
println("Title: $title")
println("Location: $location")
println("Rating: $rating")
println("Review Count: $reviewCount")
println("Description: $description")
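Printing is fine for a quick check, but the five fields can also be grouped into a single value and written out as a row. This is only a sketch: the PropertyListing class, the csvField and toCsvRow helpers, and the sample values are all hypothetical.

```kotlin
// Hypothetical container for the five fields extracted above.
data class PropertyListing(
    val title: String,
    val location: String,
    val rating: String,
    val reviewCount: String,
    val description: String
)

// Quote a value for CSV: wrap it in quotes and double any embedded quotes.
fun csvField(value: String): String =
    "\"" + value.replace("\"", "\"\"") + "\""

fun toCsvRow(p: PropertyListing): String =
    listOf(p.title, p.location, p.rating, p.reviewCount, p.description)
        .joinToString(",") { csvField(it) }

fun main() {
    // Sample values are made up for illustration.
    val sample = PropertyListing(
        "Hotel \"Example\"", "Manhattan, New York",
        "Scored 8.4", "1,234 reviews", "Sample description"
    )
    println(toCsvRow(sample))
}
```

Quoting every field keeps commas inside addresses or descriptions from splitting a row.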
Full Script
Here is the complete Kotlin scraping script:
import io.ktor.client.*
import io.ktor.client.engine.apache.*
import io.ktor.client.request.*
import org.jsoup.Jsoup

fun main() {
    val url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2"
    val userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"

    // The Apache engine exposes customizeClient for setting the user agent
    val client = HttpClient(Apache) {
        engine {
            customizeClient {
                setUserAgent(userAgent)
            }
        }
    }

    // get<String> (Ktor 1.x) is a suspend function, so run it in a coroutine
    val html = kotlinx.coroutines.runBlocking { client.get<String>(url) }
    client.close()

    // kotlinx.html cannot parse HTML, so jsoup does the parsing
    val doc = Jsoup.parse(html)
    val cards = doc.getElementsByAttributeValue("data-testid", "property-card")

    for (card in cards) {
        val title = card.select("h3").text()
        val location = card.select("span[data-testid=address]").text()
        // Class-based selectors may need updating if the site's markup changes
        val rating = card.select("div.e4755bbd60").attr("aria-label")
        val reviewCount = card.select("div.abf093bdfe").text()
        val description = card.select("div.d7449d770c").text()

        println("Title: $title")
        println("Location: $location")
        println("Rating: $rating")
        println("Review Count: $reviewCount")
        println("Description: $description")
    }
}
This extracts key data from Booking.com listings using Kotlin. The same approach can be applied to other sites that serve their content as static HTML, once the selectors are adjusted to match each site's markup.
While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.
Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.
This allows scraping at scale without the headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.
With the power of Proxies API combined with Kotlin libraries for HTTP requests and HTML parsing, you can scrape data at scale without getting blocked.