What is the alternative to BeautifulSoup in Python?

BeautifulSoup is a popular Python library used for parsing HTML and extracting data from websites. However, there are several alternatives if you don't want to use BeautifulSoup.

Why Consider Alternatives?

There are a few reasons why you may want to use something other than BeautifulSoup:

Don't want to install another dependency

Need better performance

Want to parse invalid/malformed HTML

Need more control over the parsing process

Built-in XML Parsers

Python's standard library comes with XML parsing modules like xml.etree.ElementTree and xml.dom.minidom.

These let you parse HTML with the standard library alone instead of an external library, but only when the markup is also well-formed XML (for example, XHTML). The syntax is a bit more verbose than BeautifulSoup's, but they get the job done.

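import xml.etree.ElementTree as ET

# Placeholder path; the file must be well-formed XML (e.g. XHTML),
# otherwise ET.parse() raises ET.ParseError.
html_file = "page.xhtml"

tree = ET.parse(html_file)
root = tree.getroot()

# With an XHTML namespace declared, match '{http://www.w3.org/1999/xhtml}p' instead.
for p in root.iter('p'):
    print(''.join(p.itertext()))  # itertext() includes text in nested tags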

The built-in XML parsers do not tolerate malformed HTML the way BeautifulSoup does, though: a single unclosed tag makes ElementTree raise ParseError instead of recovering.
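To illustrate that difference, here is a minimal sketch using two hypothetical inline strings instead of a file. The helper names paragraphs_etree and paragraphs_minidom are made up for this example. Both ElementTree and minidom parse the well-formed version, and both reject the version with unclosed <p> tags (ElementTree raises ET.ParseError, minidom raises xml.parsers.expat.ExpatError).

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample markup: the first string is well-formed XHTML,
# the second leaves the <p> tags unclosed, as real-world HTML often does.
WELL_FORMED = "<html><body><p>Hello</p><p>World</p></body></html>"
MALFORMED = "<html><body><p>Hello<p>World</body></html>"


def paragraphs_etree(markup):
    """Return the text of every <p> element using ElementTree."""
    root = ET.fromstring(markup)
    return ["".join(p.itertext()) for p in root.iter("p")]


def paragraphs_minidom(markup):
    """Return the text of every <p> element using minidom."""
    doc = minidom.parseString(markup)
    return [
        "".join(n.data for n in p.childNodes if n.nodeType == n.TEXT_NODE)
        for p in doc.getElementsByTagName("p")
    ]


print(paragraphs_etree(WELL_FORMED))    # ['Hello', 'World']
print(paragraphs_minidom(WELL_FORMED))  # ['Hello', 'World']

# Both parsers refuse the malformed version instead of guessing a fix.
try:
    paragraphs_etree(MALFORMED)
except ET.ParseError as exc:
    print("ElementTree:", exc)

try:
    paragraphs_minidom(MALFORMED)
except ExpatError as exc:
    print("minidom:", exc)
```

This strictness is why the XML modules suit generated XHTML or XML feeds better than arbitrary scraped pages.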

HTML Parser

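Python 3's standard library includes the html.parser module, which BeautifulSoup itself can use as its parser backend. Unlike BeautifulSoup, it does not build a parse tree: it is event-driven, calling methods such as handle_starttag(), handle_endtag() and handle_data() as it reads the input. You subclass HTMLParser and override the methods you care about.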

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)

    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)

parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head></html>')

html.parser is lower-level than BeautifulSoup (there is no tree to search), but it is tolerant of sloppy markup and is often enough for basic extraction tasks.
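Because html.parser only reports events, extracting data usually means keeping a little state yourself. The following is a minimal sketch, with a hypothetical LinkAndTitleParser class and sample page, that collects the <title> text and every href value from <a> tags:

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collect the page <title> text and every <a href="..."> value.

    HTMLParser does not build a tree, so we track state ourselves:
    a flag records whether we are currently inside <title>.
    """

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += data


# Hypothetical sample page; the unclosed <p> tags do not bother HTMLParser.
SAMPLE_PAGE = """
<html><head><title>Test page</title></head>
<body>
  <p>See <a href="/docs">the docs</a>
  <p>or <a href="https://example.com">example.com</a>
</body></html>
"""

parser = LinkAndTitleParser()
parser.feed(SAMPLE_PAGE)
parser.close()
print(parser.title)  # Test page
print(parser.links)  # ['/docs', 'https://example.com']
```

The state-tracking flag is the main cost of the event-driven design: anything BeautifulSoup would find with a tree search has to be reconstructed from the sequence of callbacks.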

Regular Expressions

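For small, predictable snippets of HTML, regular expressions from Python's re module may be all you need. Be careful, though: a regex cannot follow nested structure, and patterns break easily on real-world markup (different attribute order, quoting styles, comments, nested tags).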
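A minimal sketch of the regex approach, assuming a hypothetical snippet whose links always use double quotes and put href first. The last line shows how quickly such a pattern misses markup it was not written for.

```python
import re

# Hypothetical snippet with a simple, predictable structure.
snippet = '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>'

# Capture the href value and link text of each <a> tag.
# Assumes double-quoted attributes, href as the first attribute,
# and no nested tags inside the link text.
LINK_RE = re.compile(r'<a\s+href="([^"]*)"[^>]*>([^<]*)</a>', re.IGNORECASE)

links = LINK_RE.findall(snippet)
print(links)  # [('/a', 'A'), ('/b', 'B')]

# The same pattern silently skips a link whose href is not the first attribute.
print(LINK_RE.findall('<a class="nav" href="/c">C</a>'))  # []
```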

In the end, BeautifulSoup is still the most popular and full-featured option. But these standard-library tools can be capable alternatives in a pinch.
