In this post, we'll walk through code that scrapes real estate listing data from Realtor.com using a Java library called Jsoup.
Why Scrape Realtor.com?
Realtor.com contains rich listing information for properties across the United States. By scraping this data, we can analyze real estate trends programmatically or build applications using large-scale housing data.
Importing Jsoup
We first import the Jsoup Java library that enables web scraping:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
Jsoup handles connecting to web pages, parsing HTML, finding DOM elements, and extracting data, which covers what this scraper needs.
We also import IOException, the checked exception that Jsoup's get() call can throw:
import java.io.IOException;
With the imports set up, let's look at the main logic.
Connecting to the Webpage
We define the Realtor URL we want to scrape:
String url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";
This URL searches for San Francisco listings on Realtor.com.
Next, we use Jsoup to send a GET request to this URL:
Document doc = Jsoup.connect(url)
.userAgent("Mozilla/5.0...")
.get();
The returned Document object holds the parsed HTML of the page, ready to be queried.
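The request above relies on Jsoup's default settings. As a hedged sketch, the same connection can be given an explicit timeout and separate handling for HTTP error responses. The class name FetchSketch, the helper buildRequest, and the 10-second timeout are illustrative assumptions, not part of the original scraper.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class; the names and the timeout value are assumptions.
class FetchSketch {
    static final int TIMEOUT_MS = 10_000;

    // Builds the request without sending it, so its settings can be inspected.
    static Connection buildRequest(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")
                .timeout(TIMEOUT_MS);
    }

    public static void main(String[] args) {
        String url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";
        try {
            Document doc = buildRequest(url).get();
            System.out.println("Fetched page with title: " + doc.title());
        } catch (HttpStatusException e) {
            // Thrown by get() when the server answers with an error status.
            System.err.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
        } catch (IOException e) {
            // Covers timeouts and other network failures.
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}
```

Catching HttpStatusException before the more general IOException makes it possible to tell a blocked or missing page apart from a network problem.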
Extracting Listing Data
Inspecting the element
Inspecting the page in Chrome's developer tools shows that each listing block is wrapped in a div with the class BasePropertyCard_propertyCardWrap__J0xUj.
With the base DOM parsed, we can now query elements and extract information.
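To make the querying step concrete, here is a small offline sketch that parses an inline HTML fragment with Jsoup.parse instead of fetching a live page. The markup and the class names listing-card and price are invented for illustration; only the data-testid attribute value mirrors the selectors used in this post.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Illustrative only: the HTML below is invented, not Realtor.com markup.
class SelectorSketch {
    static final String HTML =
            "<div class=\"listing-card\">"
          + "<div class=\"price\">$1,000,000</div>"
          + "<ul><li data-testid=\"property-meta-beds\">3 bed</li></ul>"
          + "</div>"
          + "<div class=\"listing-card\">"
          + "<div class=\"price\">$750,000</div>"
          + "</div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // select() returns every match; an empty result is not an error.
        Elements cards = doc.select("div.listing-card");
        System.out.println("Cards found: " + cards.size());

        for (Element card : cards) {
            // selectFirst() returns the first match, or null if none exists.
            Element price = card.selectFirst("div.price");
            // Attribute selector: li elements whose data-testid equals the value.
            String beds = card.select("li[data-testid=property-meta-beds]").text();
            System.out.println(price.text() + " | beds: '" + beds + "'");
        }
    }
}
```

The second card has no beds item, so select(...).text() yields an empty string for it rather than failing.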
Realtor.com loads listings dynamically via JavaScript. To locate the raw listing blocks, we use this selector:
Elements listingBlocks = doc.select("div.BasePropertyCard_propertyCardWrap__J0xUj");
This fetches all div elements with that class, one per listing card.
We loop through each listing:
for (Element listingBlock : listingBlocks) {
    // Extract listing data...
}
And inside this loop, we extract various fields using additional selectors:
// Broker information
Element brokerInfo = listingBlock.selectFirst("div.BrokerTitle_brokerTitle__ZkbBW");
// Status
String status = listingBlock.selectFirst("div.message").text().trim();
// Price
String price = listingBlock.selectFirst("div.card-price").text().trim();
// Beds
String beds = listingBlock.select("li[data-testid=property-meta-beds]").text().trim();
// Baths
String baths = listingBlock.select("li[data-testid=property-meta-baths]").text().trim();
// Address
String address = listingBlock.selectFirst("div.card-address").text().trim();
Let's analyze the beds selector:
listingBlock.select("li[data-testid=property-meta-beds]")
It matches li elements inside the listing block whose data-testid attribute equals property-meta-beds. Calling text() on the result returns the combined text of the matched elements.
The other selectors work the same way to extract additional fields.
Finally, we print the output:
System.out.println("Beds: " + beds);
System.out.println("Price: " + price);
// etc...
The full listing data is now programmatically extracted from Realtor.com using Jsoup and some knowledge of CSS selectors. The resulting dataset can feed trend analysis or applications built on housing data.
Full Code
Here is the complete runnable code to scrape Realtor listings with Jsoup in Java:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class RealtorScraper {
    public static void main(String[] args) {
        // Define the URL of the Realtor.com search page
        String url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";
        try {
            // Send a GET request to the URL
            Document doc = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")
                    .get();
            // Find all the listing blocks using the provided class name
            Elements listingBlocks = doc.select("div.BasePropertyCard_propertyCardWrap__J0xUj");
            // Loop through each listing block and extract information
            for (Element listingBlock : listingBlocks) {
                // Extract the broker information
                Element brokerInfo = listingBlock.selectFirst("div.BrokerTitle_brokerTitle__ZkbBW");
                String brokerName = brokerInfo.selectFirst("span.BrokerTitle_titleText__20u1P").text().trim();
                // Extract the status (e.g., For Sale)
                String status = listingBlock.selectFirst("div.message").text().trim();
                // Extract the price
                String price = listingBlock.selectFirst("div.card-price").text().trim();
                // Extract other details like beds, baths, sqft, and lot size
                String beds = listingBlock.select("li[data-testid=property-meta-beds]").text().trim();
                String baths = listingBlock.select("li[data-testid=property-meta-baths]").text().trim();
                String sqft = listingBlock.select("li[data-testid=property-meta-sqft]").text().trim();
                String lotSize = listingBlock.select("li[data-testid=property-meta-lot-size]").text().trim();
                // Extract the address
                String address = listingBlock.selectFirst("div.card-address").text().trim();
                // Print the extracted information
                System.out.println("Broker: " + brokerName);
                System.out.println("Status: " + status);
                System.out.println("Price: " + price);
                System.out.println("Beds: " + beds);
                System.out.println("Baths: " + baths);
                System.out.println("Sqft: " + sqft);
                System.out.println("Lot Size: " + lotSize);
                System.out.println("Address: " + address);
                System.out.println("-".repeat(50)); // Separating listings
            }
        } catch (IOException e) {
            System.err.println("Failed to retrieve the page.");
            e.printStackTrace();
        }
    }
}
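Because selectFirst returns null when nothing matches, a listing card that lacks one of these fields would make the code above throw a NullPointerException. A minimal sketch of a guard follows; the helper name textOrEmpty and the sample markup are hypothetical and not part of the original scraper.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical helper; the name textOrEmpty is an assumption for illustration.
class SafeExtractSketch {
    // Returns the trimmed text of the first match, or "" when nothing matches.
    static String textOrEmpty(Element parent, String cssQuery) {
        if (parent == null) {
            return "";
        }
        Element match = parent.selectFirst(cssQuery);
        return match == null ? "" : match.text().trim();
    }

    public static void main(String[] args) {
        // Invented fragment: it has a price but no address element.
        Element card = Jsoup.parse(
                "<div class=\"card\"><div class=\"card-price\">$899,000</div></div>")
                .selectFirst("div.card");

        System.out.println("Price: " + textOrEmpty(card, "div.card-price"));
        System.out.println("Address: " + textOrEmpty(card, "div.card-address"));
    }
}
```

In the scraper, a call such as textOrEmpty(listingBlock, "div.card-address") could stand in for the chained selectFirst(...).text().trim() expressions, so a missing field becomes an empty string instead of ending the run.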