Using Proxies in reqwest with Rust in 2024

Jan 9, 2024 · 4 min read

Proxies are intermediaries that forward your requests on your behalf, hiding your original IP address from the destination site.

Reqwest has first-class proxy support for routing requests through proxies:

// Proxy URL (placeholder credentials and host)
let proxy_url = "http://user:pass@proxy.example.com:8080";

// Create a proxy for plain-HTTP destinations
// (use Proxy::all to cover HTTPS destinations too)
let proxy = reqwest::Proxy::http(proxy_url)?;

// Enable it on the client
let client = reqwest::Client::builder()
    .proxy(proxy)
    .build()?;

We create a proxy pointing to our proxy server, then tell the Reqwest client to use it. Now requests funnel through the proxy and pick up its IP address.

Tip: Prefer HTTPS proxies. With an HTTP proxy, plain-HTTP traffic passes through unencrypted, so the proxy server can read your data. (HTTPS destinations are still tunneled end-to-end via CONNECT.)

Proxy Authentication

Some proxies require authentication to allow access:

let proxy = reqwest::Proxy::http(proxy_url)?
                .basic_auth("myuser", "password123");

We chain .basic_auth() to add credentials. There's also .custom_http_auth() for non-Basic authentication.

Tip: Watch out for leaking credentials in your code! Use environment variables or secure secret storage.

Custom Proxy Rules

We may want fine-grained control over which requests use the proxy versus going directly.

Reqwest allows custom proxy selection logic:

let proxy_url = reqwest::Url::parse("http://proxy.example.com:8080")?;

let proxy = reqwest::Proxy::custom(move |url| {
    if url.host_str() == Some("api.example.com") {
        Some(proxy_url.clone())
    } else {
        None
    }
});

Here we funnel only api.example.com traffic through the proxy. All other requests go directly.

Excluding Sites from the Proxy

Sometimes you explicitly want to bypass proxies for certain domains even when a general proxy is configured:

let proxy = reqwest::Proxy::http(proxy_url)?
                .no_proxy(
                    reqwest::NoProxy::from_string(".google.com,.github.io")
                );

We passed a comma-separated list of domains to avoid proxying. An entry with a leading dot matches the domain's subdomains too.

Tip: Mind the order of proxy definitions as they're checked sequentially. Put specific rules before general ones.

Advanced Proxy Usage

Beyond basic proxying, there are a few other useful techniques for evading blocks.

Capturing Traffic

Debugging scraping traffic can be challenging. We can proxy through tools like Burp Suite to inspect requests and responses.

Burp provides a proxy server and a CA certificate it uses to intercept HTTPS traffic. We add that certificate to Reqwest's trust store so intercepted connections verify cleanly:

// Point at Burp's listener (default 127.0.0.1:8080)
let proxy = reqwest::Proxy::all("http://127.0.0.1:8080")?;

// Load Burp's CA certificate from disk (export it from Burp first)
let buf = std::fs::read("burp.crt")?;
let cert = reqwest::Certificate::from_der(&buf)?;

// Route traffic through Burp and trust its certificate
let client = reqwest::Client::builder()
    .proxy(proxy)
    .add_root_certificate(cert)
    .build()?;

Now traffic routes through Burp for monitoring and modification.

Pro Tip: Burp has great built-in tools for manipulating requests and replaying responses. Plus formatting tools to visualize session data.

Asynchronous Proxies

Reqwest's Client is already asynchronous and runs on Tokio; only DNS lookups fall back to a blocking threadpool by default. Running the client inside a Tokio runtime keeps proxied requests non-blocking:

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let proxy_url = "http://proxy.example.com:8080";
    let proxy = reqwest::Proxy::all(proxy_url)?;

    let client = reqwest::Client::builder()
        .proxy(proxy)
        .build()?;

    // client.get(...).send().await now uses non-blocking I/O
    Ok(())
}

Now proxied requests use asynchronous I/O and never block the runtime.

Tip: The ? operator surfaces an invalid proxy URL immediately when the client is built, giving fail-fast behavior instead of a silent misconfiguration.

Recap and Key Takeaways

We've covered a lot of ground around proxying web scraping traffic with Reqwest:

  • Obfuscation - Proxies hide your scraper IP address
  • Customization - Fine-tune proxy rules and excluded sites
  • Inspection - Capture and monitor traffic with tools like Burp
  • Non-blocking - Enable fully asynchronous proxies

With Reqwest proxies you get flexibility in routing and transforming requests. Combined with Rust's speed and safety, it's a compelling stack for robust web scraping.

Yet it's easy to underestimate the endless tricks sites use to fingerprint and block scrapers. The challenges of managing proxies to stay undetected are also non-trivial.

This is where using a dedicated proxy service can help.

Scraping-as-a-Service with Proxies API

Managing scrapers, proxies, and dealing with ever-evolving bot mitigation is no easy task. The real world is messy.

Proxies API provides proxy functionality as a fully managed API service instead.

The key capabilities:

  • Millions of fast, reliable residential proxies
  • Automatic rotating IP addresses
  • Multi-threaded for concurrent requests
  • Unlimited bandwidth
  • Global locations to access any site

Instead of handling proxies yourself, Proxies API gives you a simple API for proxying requests:

curl "http://api.proxiesapi.com/?render=true&url=https://target.com"

This renders JavaScript, rotates IPs automatically, and returns parsed HTML.

We offer a free 1,000 request trial to test it out. So if you're looking to scrape at scale without infrastructure headaches, consider Proxies API as the all-in-one anti-blocking solution.
