eBay is one of the largest online marketplaces, with millions of active listings at any given time. In this tutorial, we'll walk through how to scrape key data from eBay search results using Elixir, the HTTPoison library for HTTP requests, and Floki for HTML parsing.
Setup
We'll need to add HTTPoison (for HTTP requests) and Floki (for HTML parsing) to the dependencies in mix.exs, then run mix deps.get:
def deps do
  [
    {:httpoison, "~> 1.8"},
    {:floki, "~> 0.34"}
  ]
end
No import is needed, since we call HTTPoison functions by their full module name (for example, HTTPoison.get!).
We'll also define the eBay URL and a header for the user agent:
url = "https://www.ebay.com/sch/i.html?_nkw=baseball"
user_agent = {"User-Agent", "Mozilla/5.0..."}
Replace the user agent string with your browser's user agent.
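To search for terms other than "baseball", it is safer to build the query string with URI.encode_query/1 than to concatenate it by hand, because keywords containing spaces or symbols must be encoded. The following is a minimal sketch: the EbaySearch module is hypothetical, only the _nkw keyword parameter comes from the URL above, and _pgn (used here as a page number) is an assumption about eBay's query format rather than something this tutorial verifies.

```elixir
defmodule EbaySearch do
  # Hypothetical helper; only the _nkw (keyword) parameter comes from the tutorial.
  @base_url "https://www.ebay.com/sch/i.html"

  @doc "Builds a search URL for keyword, appending any extra query params in order."
  def url(keyword, extra_params \\ []) do
    # encode_query/1 uses www-form encoding, so spaces become "+"
    query = URI.encode_query([{"_nkw", keyword} | extra_params])
    @base_url <> "?" <> query
  end
end

IO.puts(EbaySearch.url("baseball cards"))
# _pgn is an ASSUMED page-number parameter, shown only to illustrate extra params
IO.puts(EbaySearch.url("baseball", [{"_pgn", 2}]))
```

The first call prints https://www.ebay.com/sch/i.html?_nkw=baseball+cards, which can be passed to HTTPoison in place of the hard-coded url.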
Fetch the Listings Page
We can use HTTPoison to make the GET request:
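resp = HTTPoison.get!(url, [user_agent])
html = resp.body
Headers are passed to HTTPoison.get!/3 as a list of {name, value} tuples in the second argument; the optional third argument is reserved for request options such as timeouts.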
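HTTPoison.get!/3 raises on connection errors but returns normally for any HTTP status, so a blocked or redirected request would hand an unexpected page to the parser. If you prefer to handle failures explicitly, HTTPoison.get/3 returns {:ok, response} or {:error, %HTTPoison.Error{}} tuples that you can pattern-match. Here is a sketch with a hypothetical EbayFetch module. It matches on map keys, so it works with HTTPoison's response structs and can be tried here with plain maps:

```elixir
defmodule EbayFetch do
  # Hypothetical helper: normalizes the result of HTTPoison.get/3.
  # Map patterns such as %{status_code: 200} also match %HTTPoison.Response{} structs.
  def handle_response({:ok, %{status_code: 200, body: body}}), do: {:ok, body}

  # Any other status (a redirect, a block page, a server error) is treated as a failure.
  def handle_response({:ok, %{status_code: status}}), do: {:error, {:http_status, status}}

  # Transport errors arrive as {:error, %HTTPoison.Error{reason: reason}}.
  def handle_response({:error, %{reason: reason}}), do: {:error, reason}

  # Anything else (for example an async response) is reported as unexpected.
  def handle_response(other), do: {:error, {:unexpected, other}}
end

# Simulated results; in the scraper you would pass HTTPoison.get(url, [user_agent]).
IO.inspect(EbayFetch.handle_response({:ok, %{status_code: 200, body: "<html></html>"}}))
IO.inspect(EbayFetch.handle_response({:ok, %{status_code: 503, body: ""}}))
IO.inspect(EbayFetch.handle_response({:error, %{reason: :timeout}}))
```

In the scraper, url |> HTTPoison.get([user_agent]) |> EbayFetch.handle_response() would replace the get! call, and a case on {:ok, html} or {:error, reason} decides whether to continue parsing.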
Extract Listing Data
To parse the HTML, we can use Floki:
{:ok, document} = Floki.parse_document(html)
listing_nodes = Floki.find(document, "div.s-item__info")

for node <- listing_nodes do
  title = Floki.find(node, "div.s-item__title") |> Floki.text()
  # Floki.attribute/2 returns a list of values, so take the first href
  url = Floki.find(node, "a.s-item__link") |> Floki.attribute("href") |> List.first()
  price = Floki.find(node, "span.s-item__price") |> Floki.text()
  # Extract other fields (seller, shipping, etc.) the same way
end
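Floki.find/2 selects every listing container with a CSS selector; within each one, we take the text of the title and price elements and the href attribute of the item link.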
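The extracted values are raw: Floki.attribute/2 returns a list of attribute values rather than a single string, and Floki.text/1 keeps whatever whitespace surrounds the text in the HTML. The following sketch shows helpers for tidying these values and for turning the price text into a number. The ListingText module is hypothetical, and parse_price/1 assumes US-style prices such as $1,299.99. For a range like "$10.00 to $20.00", it keeps only the first amount.

```elixir
defmodule ListingText do
  # Hypothetical helpers for cleaning values extracted with Floki.

  @doc "Trims the text and collapses runs of whitespace into single spaces."
  def clean(text) when is_binary(text) do
    text
    |> String.trim()
    |> String.replace(~r/\s+/u, " ")
  end

  @doc "Returns the first value from Floki.attribute/2's list, or nil if it is empty."
  def first_href(hrefs) when is_list(hrefs), do: List.first(hrefs)

  @doc "Extracts the first amount from price text as a float, or nil if none is found."
  def parse_price(text) when is_binary(text) do
    # ASSUMES US formatting: digits with optional thousands commas and a decimal point
    case Regex.run(~r/\d[\d,]*(?:\.\d+)?/, text) do
      [amount] ->
        # The match always starts with a digit, so Float.parse/1 cannot return :error
        {value, _rest} = amount |> String.replace(",", "") |> Float.parse()
        value

      nil ->
        nil
    end
  end
end

IO.inspect(ListingText.clean("  Vintage   Baseball\n Card "))
IO.inspect(ListingText.parse_price("$10.00 to $20.00"))
```

Inside the loop, the helpers chain onto the existing pipelines, for example price = Floki.find(node, "span.s-item__price") |> Floki.text() |> ListingText.parse_price().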
Print Results
Inside the for loop, right after the fields are extracted (the variables are not visible outside it), we can print them:
IO.puts("Title: #{title}")
IO.puts("URL: #{url}")
IO.puts("Price: #{price}")
IO.puts(String.duplicate("=", 50)) # Separator
This outputs each listing's data.
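Because the IO.puts/1 calls have to sit inside the loop, it can be tidier to collect each listing's fields into a map and format them in one place. Here is a sketch with a hypothetical ListingPrinter module. The labels and separator mirror the output above, fields that come back empty are skipped, and the sample listing is made-up illustration data.

```elixir
defmodule ListingPrinter do
  # Hypothetical formatter; add more {key, label} pairs for seller, shipping, etc.
  @fields [title: "Title", url: "URL", price: "Price"]
  @separator String.duplicate("=", 50)

  @doc "Formats a listing map as labelled lines followed by a separator line."
  def format(listing) when is_map(listing) do
    lines =
      @fields
      |> Enum.map(fn {key, label} -> {label, Map.get(listing, key)} end)
      # Skip fields that were missing or came back empty
      |> Enum.reject(fn {_label, value} -> value in [nil, ""] end)
      |> Enum.map(fn {label, value} -> "#{label}: #{value}" end)

    Enum.join(lines ++ [@separator], "\n")
  end
end

# Made-up sample listing, for illustration only
%{title: "Vintage Baseball Card", url: "https://www.ebay.com/itm/123", price: "$12.99"}
|> ListingPrinter.format()
|> IO.puts()
```

In the scraper, the loop body would build a map such as %{title: title, url: url, price: price} and pass it to ListingPrinter.format/1.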
Full Code
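Here is the full code to scrape eBay listings. Save it as a script inside the Mix project from the Setup step and run it with mix run (for example, mix run scraper.exs) so that HTTPoison and Floki are available: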
url = "https://www.ebay.com/sch/i.html?_nkw=baseball"
user_agent = {"User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"}

# Fetch the search results page; headers go in the second argument
resp = HTTPoison.get!(url, [user_agent])
html = resp.body

# Parse the page and select each listing's info container
{:ok, document} = Floki.parse_document(html)
listing_nodes = Floki.find(document, "div.s-item__info")

for node <- listing_nodes do
  title = Floki.find(node, "div.s-item__title") |> Floki.text()
  # Floki.attribute/2 returns a list, so take the first href
  url = Floki.find(node, "a.s-item__link") |> Floki.attribute("href") |> List.first()
  price = Floki.find(node, "span.s-item__price") |> Floki.text()
  details = Floki.find(node, "div.s-item__subtitle") |> Floki.text()
  seller_info = Floki.find(node, "span.s-item__seller-info-text") |> Floki.text()
  shipping_cost = Floki.find(node, "span.s-item__shipping") |> Floki.text()
  location = Floki.find(node, "span.s-item__location") |> Floki.text()
  sold = Floki.find(node, "span.s-item__quantity-sold") |> Floki.text()

  IO.puts("Title: #{title}")
  IO.puts("URL: #{url}")
  IO.puts("Price: #{price}")
  IO.puts("Details: #{details}")
  IO.puts("Seller: #{seller_info}")
  IO.puts("Shipping: #{shipping_cost}")
  IO.puts("Location: #{location}")
  IO.puts("Sold: #{sold}")
  IO.puts(String.duplicate("=", 50))
end