Have you ever tried to scrape or test a website with Selenium or Requests in Python, only to be greeted by pesky "Access Denied" errors? These forbidden access messages can be frustrating, but with the right approach you can often bypass them.
Common Causes of Access Errors
There are a few main reasons why you might encounter access errors:

- IP-based blocking: the site has blacklisted your IP address, often after repeated automated requests.
- Bot detection: missing or suspicious request headers (such as the User-Agent) mark your traffic as automated.
- Rate limiting: requests arriving faster than a human could browse trigger a block.
- Missing session state: the site expects cookies from a logged-in or previously seen browser session.
Tips for Bypassing Access Errors
Here are some tips for handling "Access Denied" errors:
Use a Proxy or VPN
One easy fix is to route your traffic through a proxy service or VPN. This gives your code a different IP address that may not be blocked:
import requests

proxied_session = requests.Session()
# The proxy address here is a placeholder -- substitute a working proxy
proxied_session.proxies = {
    "http": "http://192.168.1.1:3128",
    "https": "http://192.168.1.1:3128",
}
response = proxied_session.get("https://example.com")
With Selenium, you can configure the browser proxy settings to route traffic through a proxy.
Mimic a Real Browser
For headless Selenium or Requests, mimic a real web browser by adding browser-like request headers:
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)..."}
response = requests.get("https://example.com", headers=headers)
This makes your code appear to the site as a regular browser.
Use a "Real" Browser with Selenium
Consider using a normal Selenium-controlled browser like Chrome or Firefox instead of headless mode. Many sites have better bot protection against headless browsers.
A real GUI browser can more easily bypass protections. Just be careful about scaling this approach up.
Slow Down Requests
Sometimes simple rate limiting does the trick. Sites may block you if they detect unusually fast automated access:
import time

import requests

for page in range(10):
    response = requests.get(f"https://example.com/page{page}")
    time.sleep(5)  # Pause 5 seconds between requests
This crawling pattern appears more human.
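A fixed 5-second pause is still a detectable pattern, so a common refinement is to add a little random jitter to each delay. This helper (a sketch, the name is illustrative) sleeps for a base duration plus or minus some jitter:

```python
import random
import time


def polite_delay(base: float = 5.0, jitter: float = 2.0) -> float:
    """Sleep for base +/- jitter seconds and return the delay actually used."""
    delay = base + random.uniform(-jitter, jitter)
    time.sleep(delay)
    return delay


# Usage inside a crawl loop:
# for page in range(10):
#     response = requests.get(f"https://example.com/page{page}")
#     polite_delay()
```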
Cache and Reuse Cookies
For sites that track your session, reuse cookies from a real browser session instead of allowing headless Selenium or Requests to accept new cookies each time:
# After logging into the site manually...
cookies = {c["name"]: c["value"] for c in selenium_driver.get_cookies()}
response = requests.get("https://example.com", cookies=cookies)
This lets your code reuse an authenticated session.
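If you plan to make many requests, it is tidier to load the cookies into a requests.Session once and reuse it. A sketch (the function name is illustrative), assuming the input is the list of dicts returned by Selenium's get_cookies():

```python
import requests


def session_from_cookies(selenium_cookies):
    """Copy cookies exported by Selenium's get_cookies() into a requests.Session."""
    session = requests.Session()
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain"),
            path=cookie.get("path", "/"),
        )
    return session


# Usage:
# session = session_from_cookies(selenium_driver.get_cookies())
# response = session.get("https://example.com")
```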
When All Else Fails...
Sometimes elaborate bot protection will still block everything. If you absolutely must access the site and the methods above don't work, consider automating an actual browser instead of headless mode.
This uses more resources, but tools like Selenium allow controlling a real Chrome browser to bypass protections websites apply specifically against headless browsers and bots.
Key Takeaways
Here are some key tips to remember:

- Route traffic through a proxy or VPN to get a fresh IP address.
- Send realistic browser headers, especially a User-Agent string.
- Prefer a real (non-headless) browser when bot protection is strict.
- Slow down and space out requests so your crawling looks human.
- Reuse cookies from an authenticated browser session instead of starting fresh each time.
With the right approach, you can often find a way to bypass pesky access errors while web scraping and testing sites. The methods above should give you some options to try next time you get denied.