In this article, we will learn how to scrape property listings from Booking.com using Ruby. We will use the Nokogiri and OpenURI libraries to fetch the HTML content and then extract key information like property name, location, ratings, etc.
Prerequisites
To follow along, you will need a working Ruby installation and the Nokogiri gem, which we install in the next step.
Installing Dependencies
We need to install the Nokogiri gem. OpenURI is part of Ruby's standard library, so it needs no separate installation:
gem install nokogiri
This will download and install the latest version.
Requiring Libraries
At the top of your Ruby script, require the libraries:
require 'nokogiri'
require 'open-uri'
Defining the Target URL
Let's define the URL we want to scrape:
url = "https://www.booking.com/searchresults.html?ss=New+York&..."
We won't paste the full URL here.
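Rather than writing the query string by hand, one option is to assemble it from a hash of parameters. The sketch below assumes the parameter names that appear in the complete script (ss, checkin, checkout, group_adults); build_search_url is a hypothetical helper, not part of any library.

```ruby
require 'uri'

# Hypothetical helper: build a search URL from a hash of query parameters.
def build_search_url(params)
  uri = URI.parse('https://www.booking.com/searchresults.en-gb.html')
  # encode_www_form escapes each value and joins the pairs with "&"
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

search_url = build_search_url(
  ss: 'New York',
  checkin: '2023-03-01',
  checkout: '2023-03-05',
  group_adults: 2
)
puts search_url
```

Keeping the parameters in a hash makes it easy to change the city or dates without worrying about escaping.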
Setting User Agent
We need to set a valid user agent header:
headers = { 'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...' }
This will make the request appear to come from a real browser.
Fetching the HTML Page
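We can use URI.open from OpenURI to request the page and pass the response to Nokogiri. On Ruby 3.0 and later, a plain open call no longer fetches URLs, so URI.open is required:
html = Nokogiri::HTML(URI.open(url, headers))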
This makes the request and parses the response into a Nokogiri document.
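Requests can fail, for example when the site returns an error status or the connection drops. As a rough sketch, the fetch could be wrapped so that failures return nil instead of crashing the script. fetch_html is a hypothetical helper name, and the set of rescued errors is an assumption about which failures are worth catching.

```ruby
require 'open-uri'
require 'timeout'

# Hypothetical helper: return the page body, or nil when the request fails.
def fetch_html(url, headers = {})
  URI.open(url, headers, &:read)
rescue OpenURI::HTTPError => e
  # Raised by OpenURI for non-success HTTP statuses such as 403 or 404
  warn "HTTP error: #{e.message}"
  nil
rescue SocketError, SystemCallError, Timeout::Error => e
  # DNS failures, refused connections, and timeouts
  warn "Network error: #{e.class}"
  nil
end
```

The returned string can then be passed to Nokogiri::HTML once you have checked that it is not nil.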
Extracting Property Cards
The property cards have a data-testid attribute set to property-card, so we can select them with an attribute selector:
property_cards = html.search('div[data-testid="property-card"]')
This finds all matching div elements and returns them as a node set.
Looping Through Cards
We can iterate through the cards:
property_cards.each do |card|
  # Extract data from card
end
Inside this loop we will extract information from each card node.
Extracting Title
To get the title, we search for the div whose data-testid attribute is title:
title = card.at('div[data-testid="title"]')&.text
We grab the text contents if the element is found.
Extracting Location
Similarly, the address is under a span whose data-testid attribute is address:
location = card.at('span[data-testid="address"]')&.text
The pattern is the same for other fields.
Extracting Rating
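The star rating is exposed through the aria-label attribute of a div with the class e4755bbd60:
rating = card.at('div.e4755bbd60')&.[]('aria-label')
Here we get the aria-label attribute instead of the element's text. The &. operator keeps the script from raising an error when a card has no rating element.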
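The attribute value is a string, so a numeric rating needs one more step. A minimal sketch, assuming the label contains the score as its first number, as in "4 out of 5"; the exact wording Booking.com uses is not confirmed here, and parse_rating is a hypothetical helper.

```ruby
# Hypothetical helper: pull the first number out of a rating label.
# Assumes a label shaped like "4 out of 5"; returns nil if no number is found.
def parse_rating(label)
  match = label.to_s.match(/\d+(?:\.\d+)?/)
  match && match[0].to_f
end

puts parse_rating('4 out of 5').inspect
puts parse_rating(nil).inspect
```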
Extracting Review Count
The review count text is inside a div with the class abf093bdfe:
review_count = card.at('div.abf093bdfe')&.text
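The value extracted here is display text, not a number. If you need an integer, one approach is to strip everything except digits. This sketch assumes text shaped like "1,234 reviews", which is a guess at the format, and parse_review_count is a hypothetical helper.

```ruby
# Hypothetical helper: convert review text such as "1,234 reviews" to an Integer.
# Returns nil when the text contains no digits at all.
def parse_review_count(text)
  digits = text.to_s.delete('^0-9') # keep only the characters 0-9
  digits.empty? ? nil : digits.to_i
end

puts parse_review_count('1,234 reviews').inspect
```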
Extracting Description
The description is in a div with the class d7449d770c:
description = card.at('div.d7449d770c')&.text
Printing the Data
Finally, we can print out the extracted data:
puts "Name: #{title}"
puts "Location: #{location}"
puts "Rating: #{rating}"
# etc...
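Printing works for a quick check, but to reuse the data it may be more convenient to collect each card's fields and serialize them. A minimal sketch using JSON from the standard library, with a hard-coded sample record standing in for scraped values; the Property struct is hypothetical.

```ruby
require 'json'

# Hypothetical record type holding the fields extracted from one card.
Property = Struct.new(:name, :location, :rating, keyword_init: true)

# Sample data standing in for values scraped inside the loop.
properties = [
  Property.new(name: 'Example Hotel', location: 'Manhattan, New York', rating: '4 out of 5')
]

# Convert each struct to a hash, then serialize the whole list.
json = JSON.pretty_generate(properties.map(&:to_h))
puts json
```

In the real script, you would append one Property per card inside the loop and generate the JSON after it finishes.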
And that covers scraping Booking.com property listings in Ruby!
Full Code
Here is the complete Ruby script:
require 'nokogiri'
require 'open-uri'

url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2"
headers = { 'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...' }

# URI.open is required on Ruby 3.0+, where a plain open no longer fetches URLs
html = Nokogiri::HTML(URI.open(url, headers))
property_cards = html.search('div[data-testid="property-card"]')

property_cards.each do |card|
  # &. returns nil instead of raising when an element is missing
  title = card.at('div[data-testid="title"]')&.text
  location = card.at('span[data-testid="address"]')&.text
  rating = card.at('div.e4755bbd60')&.[]('aria-label')
  review_count = card.at('div.abf093bdfe')&.text
  description = card.at('div.d7449d770c')&.text

  puts "Name: #{title}"
  puts "Location: #{location}"
  puts "Rating: #{rating}"
  puts "Review Count: #{review_count}"
  puts "Description: #{description}"
end
While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.
Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.
This allows scraping at scale without the headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.
With the power of Proxies API combined with Ruby libraries like Nokogiri, you can scrape data at scale without getting blocked.