Proxies are intermediaries that forward your requests to the destination site, which hides your original IP address from it.
Reqwest has first-class proxy support:
// Proxy URL (placeholder credentials and host; substitute your own)
let proxy_url = "http://user:password@proxy.example.com:8080";
// Create an HTTP proxy
let proxy = reqwest::Proxy::http(proxy_url)?;
// Enable it on the client
let client = reqwest::Client::builder()
    .proxy(proxy)
    .build()?;
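We create a proxy pointing to our proxy server, then tell the Reqwest client to use it. Matching requests now funnel through the proxy and pick up its IP address. Note that Proxy::http only applies to plain http:// requests; use Proxy::https for https:// requests, or Proxy::all to cover both.
Tip: Prefer HTTPS wherever you can, since the traffic is then encrypted. With plain HTTP, the proxy server can read everything you send and receive.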
Proxy Authentication
Some proxies require authentication to allow access:
let proxy = reqwest::Proxy::http(proxy_url)?
    .basic_auth("myuser", "password123");
We chain basic_auth onto the proxy to supply the username and password.
Tip: Watch out for leaking credentials in your code! Use environment variables or secure secret storage.
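As one way to follow that tip, here is a minimal sketch that reads the proxy credentials from environment variables instead of hard-coding them. The variable names PROXY_USER and PROXY_PASS are assumptions made up for this example, and the lookup is passed in as a closure so the logic can be tested without touching the real environment.

```rust
use std::env;

/// Looks up proxy credentials via `lookup`, which maps a variable name to its value.
/// PROXY_USER and PROXY_PASS are hypothetical names chosen for this sketch.
fn proxy_credentials<F>(lookup: F) -> Option<(String, String)>
where
    F: Fn(&str) -> Option<String>,
{
    let user = lookup("PROXY_USER")?;
    let pass = lookup("PROXY_PASS")?;
    // Treat empty values as missing so a blank variable fails early.
    if user.is_empty() || pass.is_empty() {
        return None;
    }
    Some((user, pass))
}

/// Convenience wrapper that reads from the real process environment.
fn proxy_credentials_from_env() -> Option<(String, String)> {
    proxy_credentials(|name| env::var(name).ok())
}
```

The returned pair can then be passed to basic_auth(&user, &pass) in place of the string literals.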
Custom Proxy Rules
We may want fine-grained control over which requests use the proxy versus going directly.
Reqwest allows custom proxy selection logic:
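// `move` lets the closure own its copy of proxy_url, as Proxy::custom requires
let proxy = reqwest::Proxy::custom(move |url| {
    if url.host_str() == Some("api.example.com") {
        // Route this host through the proxy
        Some(proxy_url)
    } else {
        // Everything else connects directly
        None
    }
});
Here we funnel only requests to api.example.com through the proxy; all other requests connect directly.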
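When the rule grows beyond a single host, it can help to pull the decision into a plain function that can be unit-tested on its own. The sketch below assumes a hypothetical list of hosts (PROXIED_HOSTS) and a hypothetical helper name (proxy_for_host); neither comes from Reqwest.

```rust
/// Hosts that should go through the proxy (hypothetical values for this sketch).
const PROXIED_HOSTS: [&str; 2] = ["api.example.com", "data.example.com"];

/// Returns the proxy URL to use for `host`, or `None` to connect directly.
fn proxy_for_host(host: Option<&str>, proxy_url: &str) -> Option<String> {
    // URLs without a host (for example `data:` URLs) are never proxied.
    let host = host?;
    // Compare case-insensitively, since host names are not case-sensitive.
    if PROXIED_HOSTS.iter().any(|h| h.eq_ignore_ascii_case(host)) {
        Some(proxy_url.to_string())
    } else {
        None
    }
}
```

Inside the custom closure, the body would then reduce to a call along the lines of proxy_for_host(url.host_str(), proxy_url).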
Excluding Sites from the Proxy
Sometimes you explicitly want to bypass proxies for certain domains even when a general proxy is configured:
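let proxy = reqwest::Proxy::http(proxy_url)?
    .no_proxy(
        // from_string returns an Option, which no_proxy accepts directly
        reqwest::NoProxy::from_string("google.com, github.io")
    );
We pass a comma-separated list of domains that should bypass the proxy. Each entry covers the domain itself and its subdomains.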
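To make the bypass idea concrete, here is a minimal sketch of suffix-based matching against a comma-separated list. It is a simplified illustration under the assumption that an entry matches the domain and all of its subdomains; it is not Reqwest's actual matching code, and the function name bypasses_proxy is made up for this example.

```rust
/// Returns true if `host` matches any entry in a comma-separated bypass list.
/// An entry matches the domain itself and any of its subdomains.
/// Simplified illustration only, not Reqwest's implementation.
fn bypasses_proxy(host: &str, no_proxy_list: &str) -> bool {
    let host = host.to_ascii_lowercase();
    no_proxy_list
        .split(',')
        // Normalize each entry: strip whitespace, a leading dot, and case.
        .map(|entry| entry.trim().trim_start_matches('.').to_ascii_lowercase())
        .filter(|entry| !entry.is_empty())
        .any(|entry| {
            // Exact match, or a subdomain match on a label boundary.
            host == entry || host.ends_with(&format!(".{}", entry))
        })
}
```

Matching on the label boundary (the leading dot) is what keeps a host such as notgoogle.com from being treated as a subdomain of google.com.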
Tip: Mind the order of proxy definitions as they're checked sequentially. Put specific rules before general ones.
Advanced Proxy Usage
Beyond basic proxying, there are a few other useful techniques for evading blocks.
Capturing Traffic
Debugging scraping traffic can be challenging. We can proxy through tools like Burp Suite to inspect requests and responses.
Burp provides a proxy server and TLS certificate for decrypting HTTPS traffic. We feed the certificate to Reqwest to allow proxying encrypted connections:
// Load the DER-encoded certificate exported from Burp
let buf = std::fs::read("burp.crt")?;
let cert = reqwest::Certificate::from_der(&buf)?;
// Enable proxy and cert
let client = reqwest::Client::builder()
    .proxy(proxy)
    .add_root_certificate(cert)
    .build()?;
Now traffic routes through Burp for monitoring and modification.
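Certificate files come in two common encodings, binary DER and text-based PEM, and Reqwest has a separate constructor for each (Certificate::from_der and Certificate::from_pem). If you are not sure which one your exported file uses, a small check like the following sketch can decide. The enum and function names are made up for this example, and the check only looks for the standard PEM header line.

```rust
/// Encoding of a certificate file, as far as this sketch can tell.
#[derive(Debug, PartialEq)]
enum CertEncoding {
    Pem,
    Der,
}

/// Guesses the encoding by looking for the PEM header line.
fn detect_cert_encoding(bytes: &[u8]) -> CertEncoding {
    const PEM_HEADER: &[u8] = b"-----BEGIN CERTIFICATE-----";
    // PEM is text, so the header may follow leading whitespace.
    let start = bytes
        .iter()
        .position(|b| !b.is_ascii_whitespace())
        .unwrap_or(bytes.len());
    if bytes[start..].starts_with(PEM_HEADER) {
        CertEncoding::Pem
    } else {
        // Anything else is assumed to be binary DER.
        CertEncoding::Der
    }
}
```

A match on the result can then pick between the two Reqwest constructors before the certificate is added to the client builder.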
Pro Tip: Burp has great built-in tools for manipulating requests and replaying responses. It also has formatting tools for visualizing session data.
Asynchronous Proxies
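Reqwest's default Client is asynchronous, so it needs an async runtime such as Tokio to drive it (the synchronous API lives separately in reqwest::blocking). By default, DNS lookups are handed off to a thread pool, while the connections themselves, including the one to the proxy, use async I/O:
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // proxy_url is the proxy address defined earlier;
    // `?` propagates an invalid proxy URL as an error
    let proxy = reqwest::Proxy::all(proxy_url)?;
    let client = reqwest::Client::builder()
        .proxy(proxy)
        .build()?;
    // Requests sent with `client` are awaited instead of blocking the thread
    Ok(())
}
Requests sent through this client, including the hop through the proxy, now use asynchronous I/O and do not block the calling thread.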
Tip: The
Recap and Key Takeaways
We've covered a lot of ground around proxying web scraping traffic with Reqwest:
With Reqwest proxies you get flexibility in routing and transforming requests. Combined with Rust's speed and safety, it's a compelling stack for robust web scraping.
Yet it's easy to underestimate the endless tricks sites use to fingerprint and block scrapers. The challenges of managing proxies to stay undetected are also non-trivial.
This is where using a dedicated proxy service can help.
Scraping-as-a-Service with Proxies API
Managing scrapers, proxies, and dealing with ever-evolving bot mitigation is no easy task. The real world is messy.
Proxies API provides proxy functionality as a fully managed API service instead.
The key capabilities:
Instead of handling proxies yourself, Proxies API gives a simple API for proxying requests:
curl "http://api.proxiesapi.com/?render=true&url=https://target.com"
This renders JavaScript, rotates IPs automatically, and returns parsed HTML.
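If you build this request URL from Rust rather than curl, the target URL should generally be percent-encoded so that its own query string does not get mixed up with the API's parameters. The sketch below uses only the two parameters shown in the curl example (render and url); whether the service expects the encoded form or needs further parameters is an assumption to check against its documentation. The helper names are made up for this example.

```rust
/// Percent-encodes everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Every other byte becomes %XX in uppercase hex.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Builds the request URL using the `render` and `url` parameters from the curl example.
fn proxies_api_url(target: &str, render: bool) -> String {
    format!(
        "http://api.proxiesapi.com/?render={}&url={}",
        render,
        percent_encode(target)
    )
}
```

The resulting string can be passed to any HTTP client, including a Reqwest client's get method.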
We offer a free 1,000 request trial to test it out. So if you're looking to scrape at scale without infrastructure headaches, consider Proxies API as the all-in-one anti-blocking solution.