When parsing HTML using BeautifulSoup in Python, the prettify() method is handy for formatting and printing the HTML in a more readable way.
What prettify() Does
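The prettify() method takes the parsed document and returns it as a formatted string, with each tag and each piece of text on its own line, indented according to its nesting depth.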
For example:
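from bs4 import BeautifulSoup
# Sample input: the whole document on a single line
html_doc = "<html><head><title>Page Title</title></head><body><h1>Main Heading</h1><p>Lorem ipsum dolor sit amet.</p></body></html>"
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())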
Instead of printing a long single line of HTML text, it will print something like:
<html>
 <head>
  <title>
   Page Title
  </title>
 </head>
 <body>
  <h1>
   Main Heading
  </h1>
  <p>
   Lorem ipsum dolor sit amet.
  </p>
 </body>
</html>
This makes the HTML much easier to read.
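prettify() can also be called on a single tag rather than the whole document, which helps when only one part of a large page is of interest. The sketch below assumes a small sample document; the markup and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html_doc = "<html><body><h1>Main Heading</h1><p>Lorem ipsum dolor sit amet.</p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

# prettify() is available on individual tags, not just the whole document.
heading = soup.find("h1")
pretty_heading = heading.prettify()
print(pretty_heading)
```

Only the h1 element and its contents are printed; the rest of the document is left out.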
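Specifying an Encoding
By default, prettify() returns a Unicode string. To get bytes in a specific encoding instead, pass the encoding argument (the keyword is encoding, not encoder):
print(soup.prettify(encoding="latin-1"))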
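To make the difference in return types concrete, here is a minimal sketch comparing the two calls. The sample markup is an assumption chosen because it contains a non-ASCII character.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup containing a non-ASCII character.
soup = BeautifulSoup("<p>café</p>", "html.parser")

as_text = soup.prettify()                      # no encoding: returns str
as_bytes = soup.prettify(encoding="latin-1")   # with encoding: returns bytes

print(type(as_text).__name__, type(as_bytes).__name__)

# Decoding the bytes with the same codec gives readable text again.
print(as_bytes.decode("latin-1"))
```

Note that printing a bytes object directly shows its b'...' representation, so decoding first is usually what you want when writing to a terminal.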
Output to a File
To store the formatted HTML in a file, open a file for writing bytes and pass an encoding to prettify() so that it returns bytes:
with open("formatted.html", "wb") as file:
    file.write(soup.prettify(encoding="utf-8"))
This persists the reformatted HTML to disk.
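An alternative, sketched here under the assumption that you would rather work with text than bytes, is to open the file in text mode and let open() handle the encoding. The temporary directory and the file path are hypothetical and exist only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><p>Hello</p></body></html>", "html.parser")

# Hypothetical output location inside a temporary directory.
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "formatted.html"

# Text mode: prettify() returns str and open() handles the encoding.
with open(out_path, "w", encoding="utf-8") as file:
    file.write(soup.prettify())

# Read the file back to confirm what was written.
written = out_path.read_text(encoding="utf-8")
print(written)
```

Both approaches produce a UTF-8 file; the choice mostly depends on whether the surrounding code already deals in bytes or in strings.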
Limitations
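One caveat is that prettify() adds whitespace (newlines and indentation), which can change how a browser renders the document. It is meant for inspecting structure, not for reformatting HTML that you intend to serve.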
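One limitation worth seeing concretely is that prettify() inserts newlines and indentation, so the prettified markup is not whitespace-equivalent to the original. The sketch below assumes a short snippet with inline markup and compares the extracted text before and after a round trip through prettify().

```python
from bs4 import BeautifulSoup

# Hypothetical snippet where inline whitespace matters.
html_doc = "<p>Hello <b>world</b>!</p>"
soup = BeautifulSoup(html_doc, "html.parser")
original_text = soup.get_text()

# Re-parse the prettified markup and extract its text again.
pretty = soup.prettify()
reparsed = BeautifulSoup(pretty, "html.parser")
roundtrip_text = reparsed.get_text()

print(repr(original_text))
print(repr(roundtrip_text))
```

The second string contains line breaks and extra spaces that were not in the source, which is why prettified output is best treated as a debugging aid.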
Overall, prettify() is a convenient way to inspect the structure of parsed HTML in a readable form.