The urllib module in Python provides useful tools for retrieving content from URLs and working with the results. Because it is part of the standard library, it requires no installation and is always available in your code.
Fetching Text Content
To fetch text content from a URL, you can use urllib.request.urlopen():
import urllib.request

with urllib.request.urlopen('http://example.com') as response:
    html = response.read()
This opens the URL, downloads the response body, and stores it in the html variable. Note that read() returns bytes, not str.
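Since read() returns bytes, decoding to str is usually the next step; the response headers often tell you which charset to use. A minimal sketch, using a self-contained data: URL as a stand-in for a real http:// address (urlopen handles both the same way):

```python
import urllib.request

# data: URL used as a self-contained stand-in for a real address.
with urllib.request.urlopen('data:text/plain;charset=utf-8,hello%20world') as response:
    raw = response.read()                                      # bytes
    charset = response.headers.get_content_charset() or 'utf-8'
    text = raw.decode(charset)                                 # str

print(text)  # hello world
```

Falling back to 'utf-8' covers servers that omit the charset from the Content-Type header.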
You can also read line by line by treating the response as a file object:
with urllib.request.urlopen('http://example.com') as response:
    for line in response:
        print(line)
Parsing Text
Once you have retrieved the text content, you may want to parse it to extract relevant information.
For example, to parse HTML you can use a parser like Beautiful Soup. To parse JSON, you can use the built-in json module.
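Beautiful Soup is a third-party package; the standard library also ships a lower-level parser, html.parser, that works for simple cases. A sketch that collects link targets from an HTML snippet (the LinkCollector class is illustrative):

```python
from html.parser import HTMLParser

# A small HTMLParser subclass that collects the href of every <a> tag.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

parser = LinkCollector()
parser.feed('<p><a href="/docs">Docs</a> and <a href="/about">About</a></p>')
print(parser.links)  # ['/docs', '/about']
```

In practice you would feed it the decoded text from urlopen() instead of a literal string.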
Here's an example parsing JSON from a URL:
import json
import urllib.request

with urllib.request.urlopen("http://api.example.com") as url:
    data = json.loads(url.read().decode())
    print(data["key"])
This fetches the JSON data, decodes the bytes to text, parses it into a Python dict with json.loads(), and prints the value stored under "key".
Handling Errors
Make sure to wrap calls to urllib.request.urlopen() in a try/except block so that network failures don't crash your program:
import urllib.request
import urllib.error

try:
    with urllib.request.urlopen('http://example.com') as response:
        html = response.read()
except urllib.error.URLError as e:
    print(f"URL Error: {e.reason}")
This way you can catch common issues like DNS failures, refused connections, HTTP error responses, and redirect loops. Note that urllib.error.HTTPError is a subclass of URLError, so this handler catches both.
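If you want to treat server-side HTTP errors (404, 500, ...) differently from failures to connect at all, you can catch the more specific HTTPError first. A sketch, where the fetch helper and the URLs are illustrative:

```python
import urllib.request
import urllib.error

def fetch(url):
    """Return the decoded response body, or None if the request fails."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode()
    except urllib.error.HTTPError as e:
        # The server replied, but with an error status code.
        print(f"HTTP Error: {e.code} {e.reason}")
    except urllib.error.URLError as e:
        # The request never got a valid response (DNS failure,
        # refused connection, malformed URL, ...).
        print(f"URL Error: {e.reason}")
    return None

# The .invalid TLD is reserved and never resolves, so this always
# takes the URLError branch.
fetch('http://nonexistent.invalid/')
```

The order of the except clauses matters: because HTTPError subclasses URLError, listing URLError first would swallow both cases.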
Overall, urllib gives you everything you need for basic fetching and parsing of URL content, all without installing any third-party packages.