The urllib module in Python provides useful functionality for retrieving data from URLs. This allows you to easily download and read web pages into your Python programs.
Fetching Web Pages
To fetch a web page, you first need to import the urllib.request module:
import urllib.request
Then you can use urllib.request.urlopen(), which works as a context manager so the connection is closed for you:
with urllib.request.urlopen('http://example.com') as response:
    html = response.read()
This reads the raw HTML content into the html variable as a bytes object.
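The two steps above can be wrapped in a small helper. This is a sketch, not part of the original text: the fetch name and the timeout value are illustrative, though urlopen() does accept a timeout keyword.

```python
import urllib.request

def fetch(url, timeout=10):
    # Open the URL, read the whole body, and return it as bytes.
    # The timeout (in seconds) guards against a server that never responds.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

# Example (requires network access):
# body = fetch('http://example.com')
```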
Decoding and Parsing
Since the content is bytes, you'll typically want to decode it to a string. With no argument, decode() assumes UTF-8:
html = html.decode()
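Not every server sends UTF-8, so it is safer to use the charset the server declares in its Content-Type header. The fetch_text name is made up for this sketch; get_content_charset() is part of the email.message API that response.headers exposes.

```python
import urllib.request

def fetch_text(url):
    # Decode the body using the server-declared charset,
    # falling back to UTF-8 when none is declared.
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)
```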
Now you can parse or process the HTML however you want, such as extracting data or searching the content.
The html.parser module in the standard library provides HTMLParser, a simple event-driven parser:
from html.parser import HTMLParser
parser = HTMLParser()
parser.feed(html)
Note that the base class does nothing with what it parses; to act on tags and text, subclass it and override handler methods such as handle_starttag() and handle_data().
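As a concrete example of subclassing, here is a minimal parser that collects every link on a page. The LinkCollector name is invented for this sketch.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

parser = LinkCollector()
parser.feed('<a href="http://example.com">home</a>')
print(parser.links)  # → ['http://example.com']
```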
Handling Errors
urlopen() raises urllib.error.URLError (or its subclass HTTPError) when a request fails, so import urllib.error and catch that rather than a bare Exception:
import urllib.error

try:
    with urllib.request.urlopen('http://badurl') as response:
        html = response.read()
except urllib.error.URLError as e:
    print(f"Failed with error: {e}")
This prints a nice error message instead of crashing your program.
Practical Example: Checking Broken Links
A handy use case is writing a web crawler that checks for broken links by trying to open URLs and catching errors. This can help find dead pages on your site.
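A bare-bones version of such a checker might look like the sketch below. The find_broken name is made up, and a real crawler would also need politeness delays, redirects handling, and deduplication.

```python
import urllib.error
import urllib.request

def find_broken(urls):
    # Return the subset of urls that fail to open.
    broken = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=5):
                pass  # we only care whether the URL opens
        except (urllib.error.URLError, ValueError):
            broken.append(url)
    return broken
```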