BeautifulSoup is a useful library for extracting data from HTML tables in Python. With a few simple lines of code, you can parse an HTML table and convert it into a pandas DataFrame for further analysis.
Parsing the Table
To parse an HTML table with BeautifulSoup, first load the HTML document and find the <table> tag. You can then loop through each <tr> row and <td> cell, appending each cell's text to a list. This gives you a list of lists, one inner list per table row.
To convert the result to a pandas DataFrame, pass the list of rows along with the column names; the DataFrame then holds the table data in a structured form. Beyond cell text, you can also extract attributes such as the href links inside table cells. If the HTML is already in a Python string rather than coming from a web request, pass that string to BeautifulSoup first and then continue parsing as normal.
Overall, BeautifulSoup makes extracting data from HTML tables straightforward, and pairing it with pandas gives you powerful data analysis capabilities over scraped tabular data. The code for each step follows.
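from bs4 import BeautifulSoup
import requests
url = 'https://example.com/table'
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # stop on HTTP errors instead of parsing an error page
soup = BeautifulSoup(resp.text, 'html.parser')
# Find the first <table> on the page (find returns None if there is none)
table = soup.find('table')
# Loop through each <tr> row and <td> cell, appending the cell text to lists
rows = []
for row in table.find_all('tr'):
    cells = [val.get_text(strip=True) for val in row.find_all('td')]
    if cells:  # header rows hold <th> cells, not <td>, so skip them
        rows.append(cells)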
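Header cells are usually <th> elements, which find_all('td') does not match. If you would rather read the column names from the page than type them by hand, a minimal sketch is to collect the <th> text separately. The sample markup below is hypothetical, and the code assumes the only <th> cells in the table are in the header row.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page
html = """
<table>
  <tr><th>Name</th><th>Age</th><th>Job</th></tr>
  <tr><td>Alice</td><td>30</td><td>Engineer</td></tr>
  <tr><td>Bob</td><td>25</td><td>Designer</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Column names: text of every <th> in the table (assumed to be header-only)
header = [th.get_text(strip=True) for th in table.find_all("th")]

# Data rows: every <tr> that contains at least one <td>
rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

print(header)  # ['Name', 'Age', 'Job']
print(rows)    # [['Alice', '30', 'Engineer'], ['Bob', '25', 'Designer']]
```

The resulting header list can be passed as the columns argument when building the DataFrame.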
Converting to DataFrame
import pandas as pd
# One name per cell; these three names assume the table has three columns
df = pd.DataFrame(rows, columns=['Name', 'Age', 'Job'])
print(df)
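Every value collected with get_text() or .text is a string, so the DataFrame columns start out as text (object dtype). Before doing numeric analysis you typically convert them; a minimal sketch with pd.to_numeric, using hypothetical sample rows in place of scraped ones:

```python
import pandas as pd

# Hypothetical rows as produced by the parsing loop: every cell is a string
rows = [["Alice", "30", "Engineer"], ["Bob", "25", "Designer"]]
df = pd.DataFrame(rows, columns=["Name", "Age", "Job"])

# Convert the text column to numbers; errors="coerce" turns unparsable cells into NaN
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")

print(df["Age"].mean())  # 27.5
```

Using errors="coerce" is a design choice: a stray value such as "n/a" becomes NaN instead of raising, which suits messy scraped data but can hide problems, so check for NaN afterwards if that matters.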
Extracting Attributes
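rows = []
for row in table.find_all('tr'):
    # href of the first <a> in each <td>, or None when the cell has no link
    cells = [cell.a.get('href') if cell.a else None for cell in row.find_all('td')]
    if cells:  # skip header rows, which have no <td> cells
        rows.append(cells)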
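Links in scraped tables are often relative (for example /people/alice), and not every cell contains one. A minimal sketch that keeps each link's visible text and resolves it to an absolute URL with urllib.parse.urljoin; the base URL and sample markup are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page URL, used only to resolve relative links
base_url = "https://example.com/table"

html = """
<table>
  <tr><th>Name</th><th>Profile</th></tr>
  <tr><td>Alice</td><td><a href="/people/alice">profile</a></td></tr>
  <tr><td>Bob</td><td>no link</td></tr>
</table>
"""

table = BeautifulSoup(html, "html.parser").find("table")

links = []
for tr in table.find_all("tr"):
    for td in tr.find_all("td"):
        a = td.find("a", href=True)  # only <a> tags that actually carry an href
        if a is not None:
            # Pair the visible link text with the absolute URL
            links.append((a.get_text(strip=True), urljoin(base_url, a["href"])))

print(links)  # [('profile', 'https://example.com/people/alice')]
```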
Parsing an HTML String
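from bs4 import BeautifulSoup
# The markup is already a Python string, so no HTTP request is needed
html = "<table>...</table>"
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')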
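soup.find('table') returns only the first table in the markup, which on real pages is often a layout or navigation table. When the string contains several tables, you can filter by an attribute instead. A minimal sketch, assuming hypothetical markup in which the wanted table carries id="staff":

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables; only the one with id="staff" is wanted
html = """
<table id="nav"><tr><td>Home</td><td>About</td></tr></table>
<table id="staff">
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Alice</td><td>30</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Plain soup.find("table") would return the navigation table, so match on id
table = soup.find("table", id="staff")
if table is None:
    raise ValueError("no table with id='staff' in the markup")

# Keep only rows that contain <td> cells (skips the <th> header row)
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
    if tr.find("td")
]
print(rows)  # [['Alice', '30']]
```

The same filter works with a class (soup.find("table", class_="...")) or a CSS selector via soup.select_one(...), depending on what the page exposes.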