The urllib library is one of Python's most useful built-in tools for retrieving data from the web. With just a few lines of code, you can leverage urllib to easily scrape web pages, interact with APIs, and more.
Fetching Web Pages
The primary purpose of urllib.request is retrieving data from URLs. Fetching a page takes only a few lines:
import urllib.request
with urllib.request.urlopen('http://example.com') as response:
    html = response.read()
This opens the URL, reads the response body as bytes, and stores it in a variable for later parsing.
The benefit here is simplicity: there is no need to manually manage connections, HTTP headers, status codes, and so on.
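Real-world fetches fail, so it is worth wrapping the call in error handling. Below is a minimal sketch, assuming a placeholder URL and UTF-8 encoded content; adjust both for your target site.
import urllib.error
import urllib.request
url = 'http://example.com'  # placeholder URL for illustration
try:
    with urllib.request.urlopen(url, timeout=10) as response:
        # read() returns bytes; decode them before text processing
        html = response.read().decode('utf-8')
        print(response.status, len(html))
except urllib.error.HTTPError as err:
    # The server answered, but with an error status (4xx/5xx)
    print('HTTP error:', err.code, err.reason)
except urllib.error.URLError as err:
    # The request never completed (DNS failure, refused connection, ...)
    print('Connection problem:', err.reason)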
URL Manipulation
The urllib.parse module splits a URL into its components:
from urllib.parse import urlparse
url = 'http://user:[email protected]:8080/path/file.html?query=param#fragment'
parsed = urlparse(url)
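Continuing the snippet above, the resulting ParseResult exposes each piece of the URL as an attribute:
print(parsed.scheme)    # 'http'
print(parsed.hostname)  # 'example.com'
print(parsed.port)      # 8080
print(parsed.path)      # '/path/file.html'
print(parsed.query)     # 'query=param'
print(parsed.fragment)  # 'fragment'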
You can also build URLs from scratch using urllib.parse.urlunparse() and encode query strings with urllib.parse.urlencode().
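As a short sketch of the reverse direction, urlencode() turns a dict into a query string and urlunparse() assembles the full URL from its six components; the parameter names and values here are made up for illustration.
from urllib.parse import urlencode, urlunparse
# Illustrative query parameters
query = urlencode({'q': 'python urllib', 'page': 2})
# urlunparse takes (scheme, netloc, path, params, query, fragment)
url = urlunparse(('https', 'example.com', '/search', '', query, ''))
print(url)  # https://example.com/search?q=python+urllib&page=2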
Handling HTTP Requests
While urlopen() is enough for a quick GET, the Request class gives you finer control over the request itself:
import urllib.request
req = urllib.request.Request('http://example.com')
req.add_header('User-Agent', 'My Python App')
with urllib.request.urlopen(req) as response:
    print(response.read())
This allows adding custom headers, overriding the HTTP method, attaching request data, and more.
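For example, a POST with form-encoded data might look like the following sketch; the endpoint and form fields are placeholders, not a real API.
import urllib.parse
import urllib.request
# data must be bytes, so encode the form fields (placeholder values)
payload = urllib.parse.urlencode({'name': 'example', 'value': 42}).encode('utf-8')
req = urllib.request.Request(
    'http://example.com/api',                   # placeholder endpoint
    data=payload,                               # sending data makes this a POST
    headers={'User-Agent': 'My Python App'},
    method='POST',                              # explicit method override
)
with urllib.request.urlopen(req) as response:
    print(response.status, response.read())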
In summary, urllib gives you page fetching, URL parsing and construction, and fine-grained request handling without installing anything beyond the standard library.