Unblocking Python Requests Blocked by Cloudflare - A Guide for Developers

Apr 2, 2024 · 4 min read

As a developer, you may encounter the frustrating situation where your Python requests get blocked by Cloudflare protections on certain sites. This results in errors like 403 Forbidden or 503 Service Unavailable.

Cloudflare provides DDoS protection and security for sites by filtering out suspicious traffic. Sometimes legitimate requests get caught as false positives. The good news is there are ways to unblock your Python requests.

Why Requests Get Blocked

Cloudflare maintains threat intelligence on IP addresses. They can block requests from IPs with histories of attacks, spam or scraping activity. Some other reasons your requests may get blocked:

  • Rate Limiting - Sending too many frequent requests that appear like a DDoS attack
  • Bot Protection - Cloudflare bot management flags non-browser requests like Python as bots
  • IP Reputation - Shared pools of residential proxy IPs often have bad reputations
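Confirm It's Cloudflare Blocking the Requests

Before troubleshooting, confirm that Cloudflare is actually the one blocking your requests.

Check the response headers. Responses served through Cloudflare carry a Server: cloudflare header and a CF-RAY header, so a 403, 429 or 503 status alongside those headers points to a Cloudflare block. Also check whether the error page mentions Cloudflare, for example "You are being rate limited by Cloudflare", or shows a Cloudflare-branded CAPTCHA.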
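The header and status checks can be wrapped in a small helper. This is a minimal sketch: looks_like_cloudflare_block is a hypothetical name, and treating 403, 429 and 503 as the "block" statuses is an assumption you may need to adjust for the site you are calling. It also treats a CF-RAY header, which Cloudflare adds to the responses it serves, as a sign that Cloudflare handled the request. It takes plain values, so you can pass it response.status_code and response.headers from a requests response.

```python
from typing import Mapping

# Assumption: statuses Cloudflare typically returns when it blocks or rate limits
BLOCK_STATUSES = frozenset({403, 429, 503})


def looks_like_cloudflare_block(status_code: int, headers: Mapping[str, str]) -> bool:
    """Hypothetical helper: True if an error response appears to come from Cloudflare."""
    # HTTP header names are case-insensitive, so compare them in lowercase
    lowered = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = (
        lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    )
    # A Server: cloudflare header alone is normal; only an error status counts as a block
    return served_by_cloudflare and status_code in BLOCK_STATUSES
```

For example, call looks_like_cloudflare_block(response.status_code, response.headers) right after requests.get() and only start troubleshooting when it returns True.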

Solutions to Unblock Python Requests

Here are some methods for getting past Cloudflare blocks on your Python requests:

1. Use a Proxy or VPN

Proxies and VPNs route your requests through a different IP address. Residential proxies with a good reputation can help you get past blocks that are based on IP reputation.

Example:

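    import requests

    # Placeholder proxy address: replace with your own proxy's host and port
    proxies = {
        'http': 'http://192.168.0.1:8080',
        'https': 'http://192.168.0.1:8080',  # HTTPS traffic is tunneled through the same proxy
    }

    response = requests.get('https://example.com', proxies=proxies, timeout=10)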
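To spread traffic across several proxies, you can hand out a different proxies dict for each request. This is a minimal sketch assuming a simple round-robin policy; proxy_rotation is a hypothetical helper, and the 203.0.113.x addresses are documentation placeholders for your own proxies.

```python
from itertools import cycle
from typing import Dict, Iterator, Sequence


def proxy_rotation(proxy_urls: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: yield requests-style proxies dicts, round-robin."""
    if not proxy_urls:
        raise ValueError("proxy_urls must contain at least one proxy URL")
    # Route both plain-HTTP and HTTPS traffic through the same proxy each time
    return ({"http": url, "https": url} for url in cycle(proxy_urls))


# Placeholder addresses (TEST-NET-3 documentation range); use your own proxies
rotation = proxy_rotation([
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
])
```

Each request then takes the next proxy in line, for example requests.get(url, proxies=next(rotation), timeout=10).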

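2. Rotate User Agents

By default, requests identifies itself with a User-Agent of python-requests/<version>, which is easy to flag as a bot. Setting the user agent to a real browser's string helps avoid bot detection. Keep a pool of realistic desktop and mobile user agents and pick a different one for each request.

Example: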

    import random
    import requests

    # Pool of browser User-Agent strings; real browsers update often, so refresh these periodically
    user_agents = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
        'Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148',
    ]

    # Choose a fresh User-Agent for every request
    for url in ['https://example.com', 'https://example.com/page2']:
        headers = {'User-Agent': random.choice(user_agents)}
        response = requests.get(url, headers=headers)
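random.choice can pick the same string several times in a row. If you would rather use every user agent in the pool before repeating one, a shuffled cycle is one option. This is a sketch; user_agent_cycle is a hypothetical helper, not part of requests.

```python
import random
from typing import Iterator, Optional, Sequence


def user_agent_cycle(pool: Sequence[str], rng: Optional[random.Random] = None) -> Iterator[str]:
    """Hypothetical helper: yield User-Agent strings forever, one shuffled pass at a time."""
    # Checked on the first next() call; an empty pool would otherwise loop forever
    if not pool:
        raise ValueError("pool must contain at least one User-Agent string")
    if rng is None:
        rng = random.Random()
    items = list(pool)
    while True:
        # Each pass visits every string exactly once, in a fresh random order
        rng.shuffle(items)
        yield from items
```

With agents = user_agent_cycle(user_agents), each request can use headers = {'User-Agent': next(agents)}.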

3. Add Browser-Like Headers

You can copy the headers a real browser sends so your requests look less bot-like. This involves adding headers like:

  • User-Agent
  • Referer
  • Accept
  • Accept-Language
  • Accept-Encoding
  • Connection

Example:

    import requests

    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        # Advertise only encodings requests can decode; add "br" only if the brotli package is installed
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "en-US,en;q=0.8",
        "Referer": "https://example.com",
        # requests already sends "Connection: keep-alive" by default
    }

    response = requests.get('https://example.com', headers=headers)
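Rather than hard-coding one dictionary, you can build the header set per request so the User-Agent can rotate while the other headers stay consistent. A minimal sketch: build_headers is a hypothetical helper, and defaulting the Referer to the target site's home page is an assumption, not something Cloudflare requires. It advertises only gzip and deflate, which requests can decode without extra packages.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def build_headers(url: str, user_agent: str, referer: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: browser-like headers for a GET request to url."""
    if referer is None:
        # Assumption: pretend the visitor arrived from the site's home page
        parts = urlsplit(url)
        referer = f"{parts.scheme}://{parts.netloc}/"
    return {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.8",
        # Only encodings requests can decode out of the box
        "Accept-Encoding": "gzip, deflate",
        "Referer": referer,
    }
```

A request then becomes requests.get(url, headers=build_headers(url, user_agent)), with user_agent taken from your rotating pool.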

4. Slow Down Requests

If you send a high volume of requests in a short span, Cloudflare may rate limit you. Adding delays between requests reduces the chance of tripping those rate limits.

Example:

    import time
    import requests

    delay = 1  # seconds to wait between requests

    urls = ['https://example.com', 'https://example.com/page2', 'https://example.com/page3']
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # pause before every request except the first
        response = requests.get(url)
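A fixed sleep after each request waits the full delay even when the request itself already took a while, and it always produces the same rhythm. One alternative is to enforce a minimum gap between request starts and add random jitter. This is a sketch under those assumptions; Throttle is a hypothetical class, and the interval and jitter values are placeholders to tune per site.

```python
import random
import time
from typing import Optional


class Throttle:
    """Hypothetical helper: keep at least min_interval (+ random jitter) seconds between calls."""

    def __init__(self, min_interval: float, jitter: float = 0.0) -> None:
        self.min_interval = min_interval
        self.jitter = jitter
        self._last: Optional[float] = None  # time.monotonic() of the previous call

    def wait(self) -> float:
        """Sleep only as long as needed since the previous call; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            gap = self.min_interval + random.uniform(0, self.jitter)
            remaining = gap - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept
```

Create one instance, for example throttle = Throttle(1.0, jitter=0.5), and call throttle.wait() immediately before each requests.get().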

5. Retry Failed Requests

Retries let your program wait and re-attempt requests that failed, often because they were rate limited. Waiting longer between each attempt gives Cloudflare time to reset and potentially unblock your IP.

Example:

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    retry_strategy = Retry(
        total=5,                 # give up after 5 retries
        backoff_factor=1,        # exponential backoff between retries
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["HEAD", "GET", "OPTIONS"],  # urllib3 >= 1.26; replaces the removed method_whitelist
    )

    adapter = HTTPAdapter(max_retries=retry_strategy)
    http = requests.Session()
    http.mount("https://", adapter)
    http.mount("http://", adapter)

    response = http.get("https://example.com")

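The above retries up to 5 times when the server responds with one of the listed status codes (429 Too Many Requests, or a 500, 502, 503 or 504 error), backing off exponentially between attempts. For 429 and 503 responses, urllib3 also honors a Retry-After header if the server sends one. If every retry fails, requests raises requests.exceptions.RetryError.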
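If you are not using a Session, or want to control retries yourself, the same idea can be written by hand. This sketch is not urllib3's implementation: call_with_retries is a hypothetical helper, its delay schedule (backoff_factor times 1, 2, 4, and so on) is its own choice, and max_attempts counts the first request too. The sleep argument exists so the waits can be swapped out in tests.

```python
import time
from typing import Any, Callable, Collection

RETRY_STATUSES = frozenset({429, 500, 502, 503, 504})


def call_with_retries(
    send: Callable[[], Any],
    max_attempts: int = 5,
    backoff_factor: float = 1.0,
    retry_statuses: Collection[int] = RETRY_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Hypothetical helper: call send() until it returns a non-retryable status.

    send() must return an object with a status_code attribute, such as a
    requests.Response. The last response is returned if all attempts fail.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in retry_statuses:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff: wait 1x, 2x, 4x, ... backoff_factor seconds
            sleep(backoff_factor * (2 ** attempt))
    return response
```

With requests, pass a zero-argument callable, for example call_with_retries(lambda: requests.get('https://example.com', timeout=10)).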

Final Tips

  • Profile your traffic patterns to avoid sudden spikes that look like an attack
  • For heavy usage, distribute load across multiple proxies and IPs
  • Use residential proxies and proxy rotation for better IP reputation
  • Mimic and randomize browser headers to avoid bot detection

Getting blocked can be frustrating, but following these guidelines should help unblock and stabilize your Python requests when dealing with Cloudflare protections.
