Using Proxies in Axios in Node.js for Web Scraping in 2024

Jan 9, 2024 · 4 min read

Let's look at how to configure proxies for your Node.js web scraping projects. We'll use the popular Axios library for examples, but the concepts apply to any HTTP client.

First, install Axios:

npm install axios

Then we can make a basic request without any proxy:

const axios = require('axios');

axios.get('https://api.ipify.org')
  .then(res => {
    console.log(res.data);
  })
  .catch(err => console.error(err.message));

This will print your public IP address.

To add a proxy, we simply pass proxy options to Axios:

const proxy = {
  host: '123.45.6.78',
  port: 8080
};

axios.get('https://api.ipify.org', { proxy });

Now the request is routed through the proxy at the given host and port before reaching api.ipify.org, so the printed address should be the proxy's rather than your own.
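
Axios's proxy object also accepts a protocol field if you need to be explicit about how the proxy server itself is reached (the address is a placeholder, as above):

const proxy = {
  protocol: 'http', // protocol used to talk to the proxy server itself
  host: '123.45.6.78',
  port: 8080
};

axios.get('https://api.ipify.org', { proxy });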

We can also handle proxy authentication when required:

const proxy = {
  host: '123.45.6.78',
  port: 8080,
  auth: {
    username: 'proxyUser',
    password: 'proxyPassword123'
  }
};

axios.get('https://api.ipify.org', { proxy });

In addition to plain HTTP requests, many proxies also support tunneling HTTPS and WebSocket traffic. Check your proxy service's docs to confirm support and any setup required.
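
For HTTPS tunneling in Node.js, a common pattern is to hand Axios a tunneling agent instead of its built-in proxy option. A minimal sketch using the https-proxy-agent package (the proxy URL is a placeholder):

// npm install https-proxy-agent
const axios = require('axios');
const { HttpsProxyAgent } = require('https-proxy-agent');

// Tunnel HTTPS requests through the proxy via HTTP CONNECT.
const agent = new HttpsProxyAgent('http://proxyUser:proxyPassword123@123.45.6.78:8080');

axios.get('https://api.ipify.org', {
  httpsAgent: agent,
  proxy: false // disable Axios's own proxy handling so the agent does the tunneling
})
  .then(res => console.log(res.data))
  .catch(err => console.error(err.message));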

Here are some more handy proxy configurations and techniques in Node.js:

Rotate Proxies - Rotate through a pool of proxies randomly or in sequence to mimic organic users:

const proxies = [
  { /* proxy 1 */ },
  { /* proxy 2 */ },
  // ...
];

// Pick random proxy
const proxy = proxies[Math.floor(Math.random() * proxies.length)];

axios.get('https://target-website.com', { proxy });
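
For sequential rotation instead, a simple round-robin sketch over the same pool:

// Cycle through the pool in order, wrapping back to the start.
let cursor = 0;
function nextProxy() {
  return proxies[cursor++ % proxies.length];
}

axios.get('https://target-website.com', { proxy: nextProxy() });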

Set Via Environment - Define proxies via the standard HTTP_PROXY and HTTPS_PROXY environment variables, which Axios's Node adapter reads, so they apply globally without configuring each request:

HTTP_PROXY=http://user:pass@123.45.6.78:8080 \
HTTPS_PROXY=http://user:pass@123.45.6.78:8080 node app.js
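
With those variables exported, plain requests pick up the proxy automatically:

// No proxy option needed; Axios reads HTTP_PROXY / HTTPS_PROXY from the environment.
const axios = require('axios');

axios.get('https://api.ipify.org')
  .then(res => console.log(res.data)); // should print the proxy's IP, not yours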

Custom Logic - Write proxy middleware to manipulate requests and responses, e.g. with http-proxy-middleware in an Express app:

const express = require('express');
const { createProxyMiddleware } = require('http-proxy-middleware');

const app = express();

app.use('/target-site', createProxyMiddleware({
  target: 'https://target-site.com',
  changeOrigin: true,
  onProxyReq: (proxyReq) => {
    // Add custom headers; generateRandomUA() is your own helper
    // returning a random User-Agent string.
    proxyReq.setHeader('User-Agent', generateRandomUA());
  }
}));

Custom Proxy Logic & Events

In addition to static proxy configuration, we can build a proxy server with the http-proxy library and attach event listeners for finer control.

For example, this listener modifies the proxied request before it is sent:

const httpProxy = require('http-proxy');

const proxy = httpProxy.createProxyServer();

proxy.on('proxyReq', (proxyReq, req) => {
  // generateAuthToken() is your own helper returning an auth token.
  proxyReq.setHeader('Authorization', generateAuthToken());
});

We can also listen for events like errors from the proxied requests:

proxy.on('error', (err, req, res) => {
  res.writeHead(500, { 'Content-Type': 'text/plain' });
  res.end('Proxy error occurred');
});
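
These handlers only matter once the proxy is actually serving traffic. A minimal sketch wiring the instance above into a plain HTTP server (the target URL is a placeholder):

const http = require('http');

// Forward every incoming request through the proxy instance above.
http.createServer((req, res) => {
  proxy.web(req, res, { target: 'https://target-site.com' });
}).listen(8000);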

There are many more events and options for handling advanced scenarios - see the http-proxy and http-proxy-middleware docs for details.

Proxy Services & APIs

Running your own proxy servers works for small projects, but sourcing reliable proxies around the world and handling their configuration and rotation quickly becomes a time sink.

This is where proxy services like Proxies API shine. We provide a cloud-based API for making proxy requests without needing to set up any servers yourself.

The service handles acquiring millions of proxies, validating them, keeping them highly available, rotating them automatically, and masking request fingerprints so traffic appears human. You can make requests through Proxies API like:

curl "http://api.proxiesapi.com/?key=API_KEY&render=true&url=https://targetwebsite.com"

And get back the proxied content instantly. The service even handles JS rendering and CAPTCHAs automatically before returning HTML.
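
The same call from Node.js with Axios (API_KEY stands in for your own key):

const axios = require('axios');

axios.get('http://api.proxiesapi.com/', {
  params: {
    key: 'API_KEY',                  // your Proxies API key
    render: 'true',                  // have the service render JavaScript first
    url: 'https://targetwebsite.com' // the page you actually want
  }
})
  .then(res => console.log(res.data))
  .catch(err => console.error(err.message));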

For small projects, Proxies API offers a free tier so the proxy headaches disappear without any commitment.

Common Proxy Use Cases

Some common use cases where proxies help simplify web scraping projects:

Bypass IP Blocks - Proxies let you spread requests across many IP addresses so no single IP gets rate-limited or banned.

Overcome CAPTCHAs - Proxy services can solve CAPTCHAs automatically before the response reaches your code.

Anonymity - Hide identifying info like IP region, company name, etc.

Reduce Server Load - Caching and filtering at the proxy layer reduces the load that reaches the services behind it.

Gather Geo-Specific Data - Route through country-specific proxies to scrape region-targeted content (see the sketch after this list).

Debug Client Issues - Test scenarios your dev machines can't reproduce directly due to corporate network policies.
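
For the geo case, routing often just means picking from country-tagged pools. A hypothetical sketch (the pool structure and addresses are illustrative, not any specific provider's API):

// Hypothetical pools keyed by country code; fill in your provider's proxies.
const proxiesByCountry = {
  de: [{ host: '123.45.6.78', port: 8080 }],
  jp: [{ host: '98.76.54.32', port: 8080 }]
};

function proxyFor(country) {
  const pool = proxiesByCountry[country];
  return pool[Math.floor(Math.random() * pool.length)];
}

// Scrape the German version of the site through a German proxy.
axios.get('https://target-website.com', { proxy: proxyFor('de') });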

Conclusion

Dealing with IP blocks and anti-scraping defenses is a headache every web scraping developer faces. Configuring and managing proxies provides a programmatic means to overcome these roadblocks.

In this guide, we covered the basics of proxies in Node.js - how to set up request routing through forwarding servers, tunnel HTTPS/WebSocket traffic, write custom proxy middleware, and leverage services like Proxies API to simplify proxy handling.

If you found this useful, check out the free tier of Proxies API to supercharge your web scraping projects! Our proxy service handles all the heavy lifting so you can focus on data extraction.

Now armed with the power of proxies, happy (and uninterrupted) scraping!
