As an experienced web scraper, you've likely encountered your fair share of redirects. While scraping a site, you request a URL only to be redirected to a different page.
At first, these redirects seemed like a nuisance. Your script would crash or get stuck in an endless loop, unable to handle the unexpected redirection.
But over time, you learned to master redirects with Python's powerful Requests module. You even picked up a few insider tricks along the way.
In this guide, I'll share everything I've learned for foolproof redirect handling. We'll start from the basics then level up to advanced techniques.
Ready to become a redirect ninja? Let's dive in.
Follow that Redirect!
The first step is understanding how to simply follow a redirect.
By default, Requests follows redirects for every method except HEAD. So if you request http://example.com and get redirected to https://www.example.com, the response you get back comes from https://www.example.com, not from the original URL.
To make that behavior explicit, or to switch it back on where it has been disabled, use the allow_redirects parameter:
import requests
response = requests.get('http://example.com', allow_redirects=True)
print(response.url)
# Prints the final URL, e.g. https://www.example.com if the server redirects there
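Setting allow_redirects=True makes Requests follow the whole redirect chain and return the final response. Setting it to False returns the redirect response itself.
This works for POST, PUT, and other request types too. Just add the same allow_redirects argument. HEAD is the one exception: requests.head() does not follow redirects unless you pass allow_redirects=True.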
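One nuance is easier to see in code: when a POST is answered with a 301, 302, or 303, Requests repeats the request as a GET and drops the body, while 307 and 308 keep the original method. The sketch below reports how a chain ended. The helper name post_and_follow is hypothetical, and the URL in the usage comment is a placeholder.

```python
import requests


def post_and_follow(url, data, session=None):
    """POST to `url`, follow any redirects, and report how the chain ended.

    Returns (method of the final request, final URL, status codes of the hops).
    """
    http = session or requests.Session()
    response = http.post(url, data=data, allow_redirects=True)
    hop_codes = [hop.status_code for hop in response.history]
    # response.request is the request that produced the final response,
    # so its method shows whether the POST survived the redirects.
    return response.request.method, response.url, hop_codes


# Usage (placeholder URL, substitute an endpoint you control):
# method, final_url, hops = post_and_follow("http://example.com/submit", {"q": "test"})
```

If the final method comes back as GET and the form data has gone missing, the redirect status code is the first thing to check.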
Smarter Sessions
But what if our script needs to make many requests? Opening and closing connections for every call is inefficient.
This is where Sessions come in handy:
session = requests.Session()
session.get('http://example.com', allow_redirects=True)
# ...make more requests...
Sessions let us persist settings like cookies and header values across requests.
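We can also configure them for smarter redirect handling. A Session exposes a max_redirects attribute that caps how many hops it will follow:
session = requests.Session()
session.max_redirects = 5
response = session.get('http://example.com')
With max_redirects set to 5, the session raises requests.exceptions.TooManyRedirects when a chain runs longer than five hops, instead of waiting for the default limit of 30.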
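To make the idea of session-wide settings concrete, here is a minimal sketch of a configured Session. The helper name build_scraper_session and the User-Agent string are made up for illustration. It uses prepare_request() to show the merged headers without sending anything over the network.

```python
import requests


def build_scraper_session(user_agent="my-scraper/1.0", max_redirects=5):
    """Create a Session whose headers and redirect cap apply to every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    session.max_redirects = max_redirects
    return session


session = build_scraper_session()
# prepare_request() merges session-level settings without touching the network.
prepared = session.prepare_request(requests.Request("GET", "http://example.com/page"))
print(prepared.headers["User-Agent"])  # my-scraper/1.0
```

Every request sent through this session, including the follow-up requests in a redirect chain, starts from the same headers and the same hop limit.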
Custom Redirect Handlers
For ultimate control, we can create custom redirect handlers with the standard library's urllib.request module:
import urllib.request
redirect_handler = urllib.request.HTTPRedirectHandler()
opener = urllib.request.build_opener(redirect_handler)
Now we can subclass HTTPRedirectHandler and override redirect_request to customize the logic:
class MyRedirectHandler(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Custom logic here
        return super().redirect_request(req, fp, code, msg, headers, newurl)
opener = urllib.request.build_opener(MyRedirectHandler())
response = opener.open('http://example.com')
For example, we could change the request method on 301 redirects. The possibilities are endless!
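As a sketch of what such custom logic might look like, the handler below builds on the standard library's urllib.request.HTTPRedirectHandler and refuses any redirect that leaves the original host. This is a different rule from the method change mentioned above, chosen only because it is short. The class name SameHostRedirectHandler is hypothetical.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


class SameHostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects only when they stay on the original host."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        old_host = urlsplit(req.full_url).hostname
        new_host = urlsplit(newurl).hostname
        if old_host != new_host:
            # Raising HTTPError tells urllib not to follow this redirect.
            raise urllib.error.HTTPError(
                req.full_url, code, f"blocked redirect to {new_host}", headers, fp
            )
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# An opener built with the handler applies the rule to every request it makes.
opener = urllib.request.build_opener(SameHostRedirectHandler())
# response = opener.open("http://example.com")  # placeholder URL
```

A scraper that should never wander onto a third-party domain could use an opener like this one and treat the HTTPError as a signal to skip the page.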
Inspecting Redirects
Once you've enabled redirect handling, you'll likely want to inspect what redirections occurred under the hood.
The response.history attribute lists every intermediate response, oldest first:
response = requests.get('http://example.com', allow_redirects=True)
print(response.history)
# [<Response [301]>]
We can also print out each previous URL and status code like so:
for resp in response.history:
    print(f"{resp.status_code} - {resp.url}")
Finally, response.url holds the URL of the final response, after all redirects have been followed.
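Putting the history and the final response together gives a full picture of a chain. The helper below is a small sketch with a hypothetical name, redirect_chain. It relies only on the history, status_code, and url attributes of a Response.

```python
import requests


def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, ending with the final response."""
    hops = [(hop.status_code, hop.url) for hop in response.history]
    hops.append((response.status_code, response.url))
    return hops


# Usage (placeholder URL):
# for status, url in redirect_chain(requests.get("http://example.com")):
#     print(status, url)
```

Logging this list for each scraped page makes it easy to spot pages that quietly moved.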
Beware the Infinite Loop
One infamous redirect gotcha is the infinite redirect loop, where a URL gets caught bouncing between pages.
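Requests already stops after 30 redirects by default. To tighten that limit, set the max_redirects attribute on a Session, since requests.get() does not accept it as a keyword argument:
session = requests.Session()
session.max_redirects = 10
response = session.get('http://example.com')
If the limit is exceeded, Requests will raise a requests.exceptions.TooManyRedirects exception.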
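A minimal sketch of handling that situation, assuming you would rather skip a looping page than crash. The helper name get_with_redirect_cap is hypothetical, and returning None on failure is a design choice for this example, not something Requests does for you.

```python
import requests


def get_with_redirect_cap(url, max_redirects=10, timeout=10):
    """Fetch `url`, returning None instead of raising when the chain is too long."""
    with requests.Session() as session:
        session.max_redirects = max_redirects
        try:
            return session.get(url, timeout=timeout)
        except requests.exceptions.TooManyRedirects:
            # Raised once the chain exceeds session.max_redirects hops.
            return None


# Usage (placeholder URL):
# response = get_with_redirect_cap("http://example.com", max_redirects=5)
# if response is None:
#     print("gave up: redirect loop or overly long chain")
```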
For debugging, enabling debug logging for urllib3, the HTTP library underneath Requests, can help identify problematic redirect chains, because every request in a chain gets its own log line.
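One way to switch that logging on is through the standard logging module, using the "urllib3" logger name. The helper name enable_http_debug_logging is hypothetical, and this is a sketch rather than the only way to wire it up.

```python
import logging


def enable_http_debug_logging():
    """Send urllib3's per-request debug messages to stderr."""
    logger = logging.getLogger("urllib3")
    logger.setLevel(logging.DEBUG)
    # Add a stream handler only once, so repeated calls do not duplicate output.
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        logger.addHandler(logging.StreamHandler())
    return logger


# Call once at startup, before making requests:
# enable_http_debug_logging()
```

With this enabled, each hop of a redirect chain shows up as a separate request line with its status code.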
Redirect Considerations
There are a few other redirect-related factors to keep in mind:
Mastering these nuances takes practice, but pays dividends for reliable scraping.
Alternative: Urllib
Before you get too comfortable with Requests, it's worth noting the built-in urllib modules can also handle redirects.
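The urllib.request module follows redirects automatically when you call urlopen(), and its HTTPRedirectHandler class can be subclassed to change that behavior.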
However, Requests tends to offer a simpler and more Pythonic interface. Unless you need ultra-fine control, Requests is likely the better choice.
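For comparison, here is roughly what the standard-library route looks like. The helper name final_url_with_urllib is hypothetical, and the URL in the usage comment is a placeholder.

```python
import urllib.request


def final_url_with_urllib(url, timeout=10):
    """Open `url` with the standard library and return (final URL, status code)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # geturl() reports where the request ended up after any redirects.
        return response.geturl(), response.status


# Usage (placeholder URL):
# print(final_url_with_urllib("http://example.com"))
```

Unlike Requests, urllib does not keep a history of the intermediate responses, which is one reason the custom handler approach exists.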
Common Redirect Questions
Here are some common redirect-related questions for reference:
Q: How do I stop/prevent redirects in Requests?
A: Set allow_redirects=False on the request.
Q: Why am I getting "Too many redirects" errors?
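A: The chain went past the redirect limit (30 by default), usually because two pages keep redirecting to each other. Inspect the hops with allow_redirects=False, and adjust session.max_redirects if the chain is legitimately long.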
Q: Should I use 307 or 308 code for temporary redirects?
A: Use 307. It is the temporary redirect that preserves the request method and body. 308 is its permanent counterpart, so it is the wrong choice for a temporary move.
Q: How do I redirect POST data or cookies in Flask/Django?
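A: Use a 307 (or 308) redirect, which tells the client to repeat the request with the same method and body. Cookies can be set on the redirect response itself.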
Q: How do I inspect previous URLs from redirects?
A: Check response.history.
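For the first question above, a short sketch of what stopping a redirect looks like in practice: the request is sent with allow_redirects=False, and the Location header shows where the server wanted to send you. The helper name peek_redirect is hypothetical.

```python
from urllib.parse import urljoin

import requests


def peek_redirect(url, session=None, timeout=10):
    """Request `url` without following redirects.

    Returns (status code, absolute redirect target or None).
    """
    http = session or requests.Session()
    response = http.get(url, allow_redirects=False, timeout=timeout)
    if response.is_redirect:
        # Location may be relative, so resolve it against the requested URL.
        target = urljoin(response.url, response.headers["Location"])
        return response.status_code, target
    return response.status_code, None


# Usage (placeholder URL):
# status, target = peek_redirect("http://example.com")
```

Calling this in a loop, one hop at a time, is also a simple way to walk a chain manually when debugging "Too many redirects" errors.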
Key Takeaways
To recap, the key skills for redirect mastery include:
Master these techniques, and no redirect will faze you again!
For next steps, practice redirect scenarios to get hands-on experience. And feel free to reach out with any other redirect questions.
Happy redirect ninja training!
FAQ
Q: How do I permanently redirect in Python?
A: Return a 301 Moved Permanently status code, like redirect(url, code=301) in Flask or HttpResponsePermanentRedirect in Django.
Q: Why am I getting SSL errors after redirect?
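A: Make sure SSL verification is configured correctly, for example by pointing the verify argument at the right CA bundle. Or try verify=False to confirm the certificate is the problem, but only while debugging, since it disables certificate checks entirely.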
Q: How can I redirect from HTTP to HTTPS in Flask?
A: Detect the scheme and redirect if needed:
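from flask import Flask, redirect, request
app = Flask(__name__)
@app.before_request
def redirect_to_https():
    # Runs before every view; plain-HTTP requests are sent to the HTTPS URL
    if not request.is_secure:
        # Replace only the scheme prefix, not any later 'http://' in the URL
        url = request.url.replace('http://', 'https://', 1)
        return redirect(url, code=302)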
Q: How do I redirect back to a URL with query parameters?
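A: Parse the URL with urlparse, then re-encode its query string and append it to the redirect target:
from urllib.parse import parse_qsl, urlencode, urlparse
@app.route('/redirect')
def redirect_back():
    url = urlparse(request.url)
    # url.query is a string, so split it into pairs before re-encoding
    query = urlencode(parse_qsl(url.query))
    # '/target' is a placeholder; redirecting to url.path would loop back here
    return redirect(f"/target?{query}")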
This preserves the original query parameters.
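The query-handling step works the same outside of Flask. The framework-free sketch below shows just that part, using only urllib.parse. The helper name carry_query and the '/target' path are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def carry_query(source_url, target_path):
    """Append the query string of `source_url` to `target_path`."""
    # parse_qsl keeps repeated keys and their order; urlencode re-escapes values.
    pairs = parse_qsl(urlsplit(source_url).query, keep_blank_values=True)
    if not pairs:
        return target_path
    return f"{target_path}?{urlencode(pairs)}"


print(carry_query("http://example.com/redirect?a=1&b=two+words&a=2", "/target"))
# /target?a=1&b=two+words&a=2
```

Going through parse_qsl instead of a dict matters when a key appears more than once, as "a" does here.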
Q: Can I redirect from an API view in Django?
A: Yes, use HttpResponseRedirect:
from django.http import HttpResponseRedirect
def my_view(request):
    url = '/new/url/'
    return HttpResponseRedirect(redirect_to=url)
Just return the response object.