eBay is one of the largest online marketplaces, with millions of active listings at any given time. In this tutorial, we'll walk through how to scrape and extract key data from eBay listings using Scala, the HTTP4S client library, and Jsoup for HTML parsing.
Setup
We'll need the following dependencies added to our build.sbt:
libraryDependencies ++= Seq(
  "org.http4s" %% "http4s-blaze-client" % http4sVersion,
  "org.jsoup"  %  "jsoup"               % "1.14.3"
)
This will pull in HTTP4S for making requests, and Jsoup for parsing HTML.
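Note that http4sVersion is referenced but never defined, so add a version value alongside the dependencies. The exact release below is an assumption; any recent 0.23.x (cats-effect 3) build of HTTP4S should work:

val http4sVersion = "0.23.12" // assumed version; use the current 0.23.x release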
We'll also define the starting eBay URL and a header for the user agent:
import org.http4s._
import org.http4s.implicits._ // provides the uri"..." literal
import org.typelevel.ci._     // provides the ci"..." literal for header names

val url = uri"https://www.ebay.com/sch/i.html?_nkw=baseball"
val userAgent = Header.Raw(ci"User-Agent", "Mozilla/5.0...")
Replace the user agent string with your own browser's user agent.
Fetch the Listings Page
We'll use the HTTP4S client to fetch the HTML content:
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import org.http4s.blaze.client.BlazeClientBuilder

// Acquire a blaze client as a Resource so the connection pool is shut down afterwards
val html: String = BlazeClientBuilder[IO].resource.use { client =>
  val req = Request[IO](Method.GET, url).putHeaders(userAgent)
  client.expect[String](req)
}.unsafeRunSync()
The client is acquired as a Resource, so the underlying connection pool is released once the request completes. The user agent header is attached to the request, and the response body is decoded as a String.
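Calling unsafeRunSync() is fine for a quick script or the REPL, but in a real application you would normally let cats-effect run the effect for you. Below is a minimal sketch of the same fetch wrapped in an IOApp; the FetchListings object name is illustrative, and it assumes an HTTP4S 0.23.x (cats-effect 3) setup:

import cats.effect.{ExitCode, IO, IOApp}
import org.http4s._
import org.http4s.implicits._
import org.http4s.blaze.client.BlazeClientBuilder
import org.typelevel.ci._

// Hypothetical runner object, not part of the original tutorial
object FetchListings extends IOApp {
  val url = uri"https://www.ebay.com/sch/i.html?_nkw=baseball"
  val userAgent = Header.Raw(ci"User-Agent", "Mozilla/5.0...")

  def run(args: List[String]): IO[ExitCode] =
    BlazeClientBuilder[IO].resource.use { client =>
      val req = Request[IO](Method.GET, url).putHeaders(userAgent)
      // Fetch the page and print the first few hundred characters as a sanity check
      client.expect[String](req).flatMap(html => IO.println(html.take(300)))
    }.as(ExitCode.Success)
}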
Extract Listing Data
Now we can use Jsoup to parse the HTML and extract the data:
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import scala.jdk.CollectionConverters._

val doc: Document = Jsoup.parse(html)

// select() returns a Java collection, so convert it before iterating
val listings = doc.select("div.s-item__info").asScala

for (listing <- listings) {
  val title = listing.select("div.s-item__title").text()
  val link  = listing.select("a.s-item__link").attr("href") // named link to avoid shadowing the search url above
  val price = listing.select("span.s-item__price").text()
  // Extract other fields like seller, shipping, etc.
}
We select elements by CSS class and extract the text or attributes.
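Instead of juggling loose variables, it can help to bundle each listing's fields into a small case class and build a list of results. Here's a sketch of that approach; the Listing case class and parseListings helper are names made up for this example, reusing the same CSS selectors as above:

import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._

// Hypothetical container for the fields extracted from one listing
final case class Listing(title: String, url: String, price: String)

// Hypothetical helper: turn the fetched HTML into a list of Listing values
def parseListings(html: String): List[Listing] =
  Jsoup.parse(html)
    .select("div.s-item__info")
    .asScala
    .toList
    .map { el =>
      Listing(
        title = el.select("div.s-item__title").text(),
        url   = el.select("a.s-item__link").attr("href"),
        price = el.select("span.s-item__price").text()
      )
    }

// Usage: parseListings(html).foreach(println)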
Print Results
We can print the extracted fields from inside the loop:
println(s"Title: $title")
println(s"URL: $link")
println(s"Price: $price")
println("=" * 50) // Separator between listings
This will output each listing's data.
Full Code
Here is the full code to scrape eBay listings:
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import org.http4s._
import org.http4s.implicits._
import org.http4s.blaze.client.BlazeClientBuilder
import org.typelevel.ci._
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import scala.jdk.CollectionConverters._

val url = uri"https://www.ebay.com/sch/i.html?_nkw=baseball"
val userAgent = Header.Raw(ci"User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")

// Acquire the client as a Resource so the connection pool is shut down when we're done
val html: String = BlazeClientBuilder[IO].resource.use { client =>
  val req = Request[IO](Method.GET, url).putHeaders(userAgent)
  client.expect[String](req)
}.unsafeRunSync()

val doc: Document = Jsoup.parse(html)
val listings = doc.select("div.s-item__info").asScala

for (listing <- listings) {
  val title = listing.select("div.s-item__title").text()
  val link = listing.select("a.s-item__link").attr("href")
  val price = listing.select("span.s-item__price").text()
  val details = listing.select("div.s-item__subtitle").text()
  val sellerInfo = listing.select("span.s-item__seller-info-text").text()
  val shippingCost = listing.select("span.s-item__shipping").text()
  val location = listing.select("span.s-item__location").text()
  val sold = listing.select("span.s-item__quantity-sold").text()

  println(s"Title: $title")
  println(s"URL: $link")
  println(s"Price: $price")
  println(s"Details: $details")
  println(s"Seller: $sellerInfo")
  println(s"Shipping: $shippingCost")
  println(s"Location: $location")
  println(s"Sold: $sold")
  println("=" * 50)
}