In this article, we will learn how to scrape property listings from Booking.com using Python. We will use the requests and Beautiful Soup libraries to fetch the HTML content and then extract key information like property name, location, ratings, etc.
Prerequisites
To follow along, you will need:
Installing Dependencies
We will use two Python packages -
To install them:
pip install requests beautifulsoup4
This will download and install the latest versions of these libraries.
Importing Modules
At the top of your script, import the necessary modules:
import requests
from bs4 import BeautifulSoup
Defining the Target URL
We will scrape property listings from this URL on Booking.com:
url = "<https://www.booking.com/searchresults.en-gb.html?ss=Manhattan%2C+New+York%2C+New+York+State%2C+United+States&map=1&efdco=1&label=gen173nr-1BCAEoggI46AdIM1gEaGyIAQGYAQm4AQfIAQzYAQHoAQGIAgGoAgO4AoH47qgGwAIB0gIkNTU1NTYzMjUtMzFhMS00NzZhLTg5YTQtZGFhOTNiZDRlMDI32AIF4AIB&sid=10b122fa3c8cf51d65c8a9e1ceb2fbcf&aid=304142&lang=en-gb&sb=1&src_elem=sb&src=index&dest_id=929&dest_type=district&ac_position=1&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=7ae531418d840184&ac_meta=GhA3YWU1MzE0MThkODQwMTg0IAEoATICZW46Bm5ldyB5b0AASgBQAA%3D%3D&checkin=2023-11-08&checkout=2023-11-11&group_adults=2&no_rooms=1&group_children=0&sb_travel_purpose=leisure#map_closed>"
This URL fetches property listings for Manhattan, New York for given check-in and check-out dates.
You can modify the parameters like location, dates, number of people etc to fetch listings as per your need.
Setting User-Agent Header
Many websites block requests without a valid User-Agent string. So we need to set it to mimic a real browser:
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
}
Replace this with any modern browser's user agent string.
Fetching the HTML Page
With the URL and headers defined, we can use the
response = requests.get(url, headers=headers)
This sends a GET request to the URL and stores the response in the
We should check that the request was successful before proceeding:
if response.status_code == 200:
# Success!
# Parse HTML here
else:
print("Failed to fetch webpage")
A status code of 200 means the request was successful. Any other code means an error occurred.
Parsing the HTML
With the HTML content fetched, we can use Beautiful Soup to parse and extract information from it:
soup = BeautifulSoup(response.text, "html.parser")
This parses the
We now have a
Extracting Property Cards
The property listings on Booking.com are contained in
We can use Beautiful Soup's
property_cards = soup.find_all("div", {"data-testid": "property-card"})
This gives us a list of all property cards on the page.
Looping Through Property Cards
With the list of cards, we can loop through each one to extract information:
for card in property_cards:
# Extract information from card here
Inside this loop, we will locate specific elements and extract their text.
Saving the Raw HTML
It's useful to save the raw HTML of each card before we start extracting text. This serves as a backup if we ever need to re-extract information.
filename = f"property_card.html"
card_html = str(card)
with open(filename, "w", encoding="utf-8") as file:
file.write(card_html)
This converts the BeautifulSoup
The filename is dynamically generated for each card.
Extracting Property Name
To extract the property name, we find the
title_element = card.find("div", {"data-testid": "title"})
if title_element is not None:
property_name = title_element.text.strip()
else:
property_name = "Title not found"
We first try to find the element. If found, we take its
If the title element doesn't exist, we set a default value.
Extracting Location
The location string is stored in a
property_location = card.find("span", {"data-testid": "address"}).text.strip()
Similar to before, we find the element and extract the text inside it.
Extracting Star Rating
The star rating element has a class name we can search for:
star_rating_div = card.find("div", {"class": "e4755bbd60"})
if star_rating_div:
star_rating = star_rating_div["aria-label"]
else:
star_rating = "Star rating not found"
If the
Extracting Review Count
The review count is stored in a
review_count_div = card.find("div", {"class": "abf093bdfe"})
if review_count_div:
review_count = review_count_div.text.strip()
else:
review_count = "Review count not available"
Again, we try to extract the text if the element is found, else set a default value.
Extracting Description
The property description is contained in a
description = card.find("div", {"class": "d7449d770c"}).text.strip()
The rest of the logic is similar to what we've seen before.
Printing the Extracted Data
Finally, we can print the extracted information from each card:
print("Name:", property_name)
print("Location:", property_location)
print("Rating:", star_rating)
print("Review Count:", review_count)
print("Description:", description)
print("-" * 40) # separator
This will nicely print out the key details for each property listing.
You can also store this data in a dictionary or other data structure instead of printing.
Full Code
For your reference, here is the full web scraping script:
import requests
from bs4 import BeautifulSoup
url = "<https://www.booking.com/searchresults.en-gb.html?ss=Manhattan%2C+New+York%2C+New+York+State%2C+United+States&map=1&efdco=1&label=gen173nr-1BCAEoggI46AdIM1gEaGyIAQGYAQm4AQfIAQzYAQHoAQGIAgGoAgO4AoH47qgGwAIB0gIkNTU1NTYzMjUtMzFhMS00NzZhLTg5YTQtZGFhOTNiZDRlMDI32AIF4AIB&sid=10b122fa3c8cf51d65c8a9e1ceb2fbcf&aid=304142&lang=en-gb&sb=1&src_elem=sb&src=index&dest_id=929&dest_type=district&ac_position=1&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=7ae531418d840184&ac_meta=GhA3YWU1MzE0MThkODQwMTg0IAEoATICZW46Bm5ldyB5b0AASgBQAA%3D%3D&checkin=2023-11-08&checkout=2023-11-11&group_adults=2&no_rooms=1&group_children=0&sb_travel_purpose=leisure#map_closed>"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
}
response = requests.get(url, headers=headers)
if response.status_code == 200:
soup = BeautifulSoup(response.text, "html.parser")
property_cards = soup.find_all("div", {"data-testid": "property-card"})
for card in property_cards:
filename = f"property_card.html"
card_html = str(card)
with open(filename, "w", encoding="utf-8") as file:
file.write(card_html)
title_element = card.find("div", {"data-testid": "title"})
if title_element is not None:
property_name = title_element.text.strip()
else:
property_name = "Title not found"
property_location = card.find("span", {"data-testid": "address"}).text.strip()
star_rating_div = card.find("div", {"class": "e4755bbd60"})
if star_rating_div:
star_rating = star_rating_div["aria-label"]
else:
star_rating = "Star rating not found"
review_count_div = card.find("div", {"class": "abf093bdfe"})
if review_count_div:
review_count = review_count_div.text.strip()
else:
review_count = "Review count not available"
description = card.find("div", {"class": "d7449d770c"}).text.strip()
print("Name:", property_name)
print("Location:", property_location)
print("Rating:", star_rating)
print("Review Count:", review_count)
print("Description:", description)
print("-" * 40)
else:
print("Failed to fetch webpage")
And that's it! We've written a complete web scraper to extract hotel listings from Booking.com using Python. The same technique can be applied to scrape any site.
Hope this gives you a solid understanding of how to use requests and BeautifulSoup for web scraping.
While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.
Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.
This allows scraping at scale without headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.
With the power of Proxies API combined with Python libraries like Beautiful Soup, you can scrape data at scale without getting blocked.