eBay is one of the largest online marketplaces with millions of active listings at any given time. In this tutorial, we'll walk through how to scrape and extract key data from eBay listings using R and the rvest package.
Setup
We'll need to install the rvest package:
install.packages("rvest")
And load it:
library(rvest)
We'll also define the starting eBay URL and a user agent string:
url <- "https://www.ebay.com/sch/i.html?_nkw=baseball"
user_agent <- "Mozilla/5.0 ..."
Replace the user agent with your browser's user agent string.
Fetch the Listings Page
We can use rvest to make the HTTP request:
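page <- session(url, httr::user_agent(user_agent)) %>% read_html()
session() sends the request with our user agent header (httr::user_agent() builds that setting; in rvest versions before 1.0 the function was named html_session()), and read_html() parses the returned HTML into a document we can query.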
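If you want the request to fail loudly on an HTTP error, the same fetch can be sketched with the httr package directly. This is only a sketch under assumptions, not the tutorial's method: fetch_page, ua, and timeout_secs are hypothetical names, and the 10 second default timeout is an arbitrary choice.

```r
library(rvest)

# Hypothetical helper (not part of rvest): fetch a URL and return parsed HTML,
# raising an R error if the server responds with an HTTP error status.
fetch_page <- function(url, ua, timeout_secs = 10) {
  resp <- httr::GET(
    url,
    httr::user_agent(ua),        # send the custom User-Agent header
    httr::timeout(timeout_secs)  # give up if the request takes too long
  )
  httr::stop_for_status(resp)    # turn 4xx/5xx responses into errors
  read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Usage (requires network access):
# page <- fetch_page(url, user_agent)
```

The returned object can be passed to html_nodes() in the same way as the page object above.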
Extract Listing Data
We can use CSS selectors to extract info from the elements:
listings <- page %>% html_nodes("div.s-item__info")
for (listing in listings) {
  title <- html_node(listing, "div.s-item__title") %>% html_text()
  url <- html_node(listing, "a.s-item__link") %>% html_attr("href")
  price <- html_node(listing, "span.s-item__price") %>% html_text()
  # Extract other fields like seller, shipping, etc
}
html_nodes() returns every listing container on the page. Within each container, html_node() picks the first element matching the selector, and html_text() or html_attr() pulls out its text or attribute value.
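The same extraction can also be written without a loop, because html_node() applied to a set of nodes returns one result per node and yields NA where nothing matches. The sketch below collects the fields into a data frame. It runs against a small hypothetical HTML fixture that only mimics the class names used above, so it works offline; real eBay markup may differ.

```r
library(rvest)

# Hypothetical fixture standing in for the search results page.
# The second listing deliberately has no price element.
fixture <- '
<div class="s-item__info">
  <a class="s-item__link" href="https://example.com/itm/1">View</a>
  <div class="s-item__title">Signed baseball</div>
  <span class="s-item__price">$25.00</span>
</div>
<div class="s-item__info">
  <a class="s-item__link" href="https://example.com/itm/2">View</a>
  <div class="s-item__title">Vintage glove</div>
</div>'

page <- read_html(fixture)
listings <- html_nodes(page, "div.s-item__info")

# One row per listing; fields with no matching element become NA
results <- data.frame(
  title = listings %>% html_node("div.s-item__title") %>% html_text(trim = TRUE),
  url   = listings %>% html_node("a.s-item__link") %>% html_attr("href"),
  price = listings %>% html_node("span.s-item__price") %>% html_text(trim = TRUE),
  stringsAsFactors = FALSE
)

print(results)
```

A data frame keeps every listing rather than only the values from the last loop iteration, which makes later filtering or exporting easier.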
Print Results
Inside the loop, we can print the extracted data for each listing:
print(paste("Title:", title))
print(paste("URL:", url))
print(paste("Price:", price))
print(strrep("=", 50)) # Separator between listings
This prints each listing's fields, followed by a separator line. A field whose selector matched nothing is printed as NA.
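When a selector matches nothing, the extracted value is NA, and paste() prints it literally as "NA". One way to tidy the output is a small formatting helper, sketched below. format_listing is a hypothetical name, and the "n/a" placeholder is an arbitrary choice.

```r
# Hypothetical helper: format one listing as a block of text,
# showing "n/a" for fields that were not found (NA).
format_listing <- function(title, url, price) {
  fields <- c(Title = title, URL = url, Price = price)
  fields[is.na(fields)] <- "n/a"
  paste0(
    paste0(names(fields), ": ", fields, collapse = "\n"),
    "\n", strrep("=", 50)  # separator between listings
  )
}

# Example call with a missing price
cat(format_listing("Signed baseball", "https://example.com/itm/1", NA), "\n")
```

Inside the loop, cat(format_listing(title, url, price), "\n") would take the place of the separate print() calls.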
Full Code
Here is the full code to scrape eBay listings:
library(rvest)

# Search results page to scrape
url <- "https://www.ebay.com/sch/i.html?_nkw=baseball"
user_agent <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"

# Request the page with the user agent header and parse the HTML
page <- session(url, httr::user_agent(user_agent)) %>% read_html()

# Each listing's details sit in a div.s-item__info container
listings <- page %>% html_nodes("div.s-item__info")

for (listing in listings) {
  # Extract each field; a selector that matches nothing yields NA
  title <- html_node(listing, "div.s-item__title") %>% html_text()
  url <- html_node(listing, "a.s-item__link") %>% html_attr("href")
  price <- html_node(listing, "span.s-item__price") %>% html_text()
  details <- html_node(listing, "div.s-item__subtitle") %>% html_text()
  seller_info <- html_node(listing, "span.s-item__seller-info-text") %>% html_text()
  shipping_cost <- html_node(listing, "span.s-item__shipping") %>% html_text()
  location <- html_node(listing, "span.s-item__location") %>% html_text()
  sold <- html_node(listing, "span.s-item__quantity-sold") %>% html_text()

  # Print the fields, then a separator between listings
  print(paste("Title:", title))
  print(paste("URL:", url))
  print(paste("Price:", price))
  print(paste("Details:", details))
  print(paste("Seller:", seller_info))
  print(paste("Shipping:", shipping_cost))
  print(paste("Location:", location))
  print(paste("Sold:", sold))
  print(strrep("=", 50))
}