Python provides several ways to programmatically download files and web content. Two commonly used tools are the urllib module and the wget command-line utility. While they share some overlapping functionality, each has unique capabilities that make them useful in different scenarios.
urllib - Downloading in Pure Python
The urllib module is part of Python's standard library, making it widely available without any extra installations. urllib provides functions for fetching URLs, handling redirects, parsing response data, encoding/decoding URLs, and more.
A basic example of using urllib to download a file (the URL below is a placeholder):
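import urllib.request

# Fetch the file at the URL and save it locally as file.zip
# (example.com is a placeholder; substitute a real URL)
url = 'http://example.com/file.zip'
urllib.request.urlretrieve(url, 'file.zip')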
This downloads the file from the URL and saves it locally as file.zip.
Some key advantages of urllib:
No extra dependencies - included with Python by default
More control from within Python code
Supports HTTP/HTTPS, FTP, and local file URLs
Handles redirects, proxies, cookies, and compression
Powerful URL encoding/decoding functions (see the sketch after this list)
Extensible with custom URL opener objects
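As an illustration of those encoding/decoding helpers, here is a minimal sketch using urllib.parse (the parameter values and paths are made up for the example):

from urllib.parse import urlencode, quote, urlparse

# Build a query string from a dict of parameters (values are examples)
params = {'q': 'python downloads', 'page': 2}
print(urlencode(params))                   # q=python+downloads&page=2

# Percent-encode a path segment containing unsafe characters
print(quote('reports/2023 summary.csv'))   # reports/2023%20summary.csv

# Split a URL into its components
print(urlparse('http://example.com/file.zip?v=1').netloc)   # example.com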
The main downside is that the API involves dealing with lower-level details rather than offering a single simple download call. Overall, though, urllib excels when you need downloading capabilities directly from Python.
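When you need that finer control, a typical pattern is to build a Request with custom headers and read the response yourself. A minimal sketch (the URL and header value are placeholders):

import urllib.request
import urllib.error

url = 'http://example.com/data.json'
# Attach a custom User-Agent header to the request
req = urllib.request.Request(url, headers={'User-Agent': 'my-downloader/1.0'})

try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = resp.read()                                  # raw response bytes
        print(resp.status, resp.headers.get('Content-Type'))
except urllib.error.HTTPError as e:
    print('Server returned an error:', e.code)
except urllib.error.URLError as e:
    print('Failed to reach the server:', e.reason)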
wget - Feature-rich Command Line Tool
wget is a popular command line program available on Linux, macOS, and Windows (via wget for Windows). wget can download web content and files but also has advanced capabilities like:
Resume interrupted downloads
Recursively download page contents
Download galleries/sections
Spider through links to capture entire sites
Restrict downloads based on rules
Authentication, cookies, and sessions
Bandwidth throttling (download rate limiting)
Output logging and reporting
This power and flexibility have made wget a go-to tool for web scraping and archiving websites.
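For example, a recursive download limited to one level of links might look like this (the flags are standard GNU Wget options; example.com is a placeholder, and the target site must permit crawling):

wget --recursive --level=1 --no-parent --wait=1 http://example.com/docs/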
To simply download a file with wget:
wget http://example.com/file.zip
Some advantages of using wget:
Robust feature set for complex jobs
Handles unstable connections reliably
Scripting capabilities to automate tasks
No Python packages to install; runs as a standalone program
The main downside is that it's an external command line tool, so from Python you have to launch wget as a subprocess and then parse its output or exit status.
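If you do drive wget from Python, the standard library's subprocess module covers most cases. A minimal sketch (the URL and flags are illustrative):

import subprocess

url = 'http://example.com/large_file.zip'
# -c resumes a partial download, -q suppresses wget's progress output
result = subprocess.run(['wget', '-c', '-q', url], capture_output=True, text=True)

if result.returncode == 0:
    print('Download finished')
else:
    print('wget failed:', result.returncode, result.stderr)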
Choosing the Right Tool
So which one should you use? Here are some guidelines:
urllib - You need downloading fully inside Python code and want to keep complexity low.
wget - You require advanced features like recursive crawling and want a battle-tested tool.
Both - Use urllib for simple one-off downloads and wget for the heavy lifting.
The great news is you can choose either tool based on your specific requirements. And it's totally fine to use both in conjunction when building an application with lots of data scraping or processing.
Practical Examples
Let's look at some practical code snippets for common use cases.
Download a file only if the local copy is missing or stale, using urllib:
import urllib.request
import os.path
import time

url = 'http://example.com/data.csv'
file = 'data.csv'

# Re-download only if the local copy is missing or older than one day (86400 seconds)
if not os.path.exists(file) or os.path.getmtime(file) < time.time() - 86400:
    urllib.request.urlretrieve(url, file)
This downloads a fresh copy only when the local file is missing or was last modified more than a day ago.
Resume a failed download with wget:
wget -c http://example.com/large_file.zip
The -c flag continues the download instead of starting from scratch.
So take advantage of both urllib and wget for your Python downloading tasks. Choose the right tool or combine them as needed to create robust solutions.