In this article, we will learn how to scrape property listings from Booking.com using Go. We will use the net/http and goquery libraries to fetch the HTML content and then extract key information like property name, location, ratings, etc.
Prerequisites
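To follow along, you will need Go installed and the goquery package, which can be added to your module with go get github.com/PuerkitoBio/goquery.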
Importing Packages
At the top of your Go file, import the required packages:
import (
	"fmt"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)
Defining the Target URL
Let's define the URL we want to scrape:
url := "https://www.booking.com/searchresults.html?ss=New+York&..."
We won't paste the full URL here.
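Rather than pasting a long query string by hand, the URL can be assembled from its parts. The following is a minimal sketch, assuming the query parameters that appear in this article's example URL (ss, checkin, checkout, group_adults); buildSearchURL is a hypothetical helper name, and the meaning attached to each parameter in the comments is inferred from the example rather than documented.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildSearchURL is a hypothetical helper: it assembles a Booking.com search
// URL from the query parameters that appear in this article's example URL.
func buildSearchURL(city, checkin, checkout string, adults int) string {
	params := url.Values{}
	params.Set("ss", city)           // assumed to be the destination search string
	params.Set("checkin", checkin)   // dates in YYYY-MM-DD form, as in the example
	params.Set("checkout", checkout) // dates in YYYY-MM-DD form, as in the example
	params.Set("group_adults", strconv.Itoa(adults))

	u := url.URL{
		Scheme:   "https",
		Host:     "www.booking.com",
		Path:     "/searchresults.html",
		RawQuery: params.Encode(), // Encode escapes values and sorts keys
	}
	return u.String()
}

func main() {
	fmt.Println(buildSearchURL("New York", "2023-03-01", "2023-03-05", 2))
}
```

Because url.Values handles the escaping, a destination containing spaces or other special characters does not need to be encoded by hand.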
Setting a User Agent
We need to set a valid user agent header:
client := &http.Client{}
req, _ := http.NewRequest("GET", url, nil)
req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)...")
This will make the request appear to come from a real browser.
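The snippet above discards the error from http.NewRequest. As a sketch of how the same step could be wrapped with error handling, the hypothetical helper newBrowserRequest below takes the user agent as a parameter; "example-agent/1.0" is only a placeholder and should be replaced with a real browser string.

```go
package main

import (
	"fmt"
	"net/http"
)

// newBrowserRequest is a hypothetical helper that builds a GET request with
// a caller-supplied User-Agent and reports any construction error.
func newBrowserRequest(target, userAgent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, fmt.Errorf("building request for %q: %w", target, err)
	}
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}

func main() {
	// "example-agent/1.0" is a placeholder, not a real browser string.
	req, err := newBrowserRequest("https://www.booking.com/searchresults.html", "example-agent/1.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Host, req.Header.Get("User-Agent"))
}
```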
Fetching the HTML Page
We can use the http Client to get the page HTML:
resp, err := client.Do(req)
if err != nil {
	fmt.Println("request failed:", err)
	return
}
defer resp.Body.Close()
if resp.StatusCode == http.StatusOK {
	// Parse HTML
}
We make sure the request succeeded before parsing.
Parsing the HTML
To parse the HTML, we use goquery's NewDocumentFromReader function:
doc, _ := goquery.NewDocumentFromReader(resp.Body)
This loads the HTML into a goquery Document.
Extracting Property Cards
The property cards have a data-testid attribute set to property-card, so we can select them with an attribute selector:
doc.Find("div[data-testid='property-card']").Each(func(i int, card *goquery.Selection) {
// Extract data from card
})
The Find call matches every property card on the page. Inside the Each callback, card is the selection for a single card, so each field is looked up relative to it.
Extracting Title
To get the title, we search for the div with data-testid='title' inside the card and grab its text contents:
title := card.Find("div[data-testid='title']").Text()
Extracting Location
Similarly, the address is under a span with data-testid='address':
location := card.Find("span[data-testid='address']").Text()
The pattern is the same for other fields.
Extracting Rating
The star rating is stored in an attribute rather than in the element text. Here we get the aria-label attribute of the rating div. Attr also returns a boolean reporting whether the attribute exists, which we discard:
rating, _ := card.Find("div.e4755bbd60").Attr("aria-label")
Extracting Review Count
The review count text is inside a div with the class abf093bdfe:
reviewCount := card.Find("div.abf093bdfe").Text()
Extracting Description
The description is in a div with the class d7449d770c:
description := card.Find("div.d7449d770c").Text()
Printing the Data
We can print out the extracted data:
fmt.Println("Name:", title)
fmt.Println("Location:", location)
fmt.Println("Rating:", rating)
// etc...
The full code for scraping each property card is available on GitHub. And that covers scraping Booking.com property listings in Go! Let me know if you have any other questions.
While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.
Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself, which allows scraping at scale without the headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key. With Proxies API combined with Go libraries like goquery, you can scrape data at scale without getting blocked.
Full Code
Here is the complete Go code:
package main

import (
	"fmt"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	url := "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2"

	// Build the request and send a browser-like User-Agent header.
	client := &http.Client{}
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)...")

	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// Only parse the page if the request succeeded.
	if resp.StatusCode != http.StatusOK {
		fmt.Println("unexpected status:", resp.Status)
		return
	}

	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		fmt.Println("parsing HTML failed:", err)
		return
	}

	// Each property card holds one listing.
	doc.Find("div[data-testid='property-card']").Each(func(i int, card *goquery.Selection) {
		title := card.Find("div[data-testid='title']").Text()
		location := card.Find("span[data-testid='address']").Text()
		// Attr returns the value and whether the attribute exists.
		rating, _ := card.Find("div.e4755bbd60").Attr("aria-label")
		reviewCount := card.Find("div.abf093bdfe").Text()
		description := card.Find("div.d7449d770c").Text()

		fmt.Println("Name:", title)
		fmt.Println("Location:", location)
		fmt.Println("Rating:", rating)
		fmt.Println("Review Count:", reviewCount)
		fmt.Println("Description:", description)
	})
}