In the beginning stages of a web crawling project, or when you only need to scale to a few hundred requests, you might want a simple proxy rotator that populates itself now and then from the free proxy pools available on the internet.
We can use a website like https://sslproxies.org/ to fetch public proxies every few minutes and use them in our Perl projects.
If you check the HTML using the browser's inspect tool, you will see that the full content is encapsulated in a table with the id proxylisttable. The IP and port are the first and second cells of each row. We can use the following code to select the table rows, iterate over them, and pull out the first and second cells of each one.
Mojo::UserAgent makes it easy to fetch and parse web pages in Perl.
First, install Mojo::UserAgent:
cpan Mojo::UserAgent
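If you prefer cpanminus, the equivalent command is:
cpanm Mojo::UserAgent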
Then we can fetch the proxy list page:
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
my $url = 'https://sslproxies.org/';
my $tx = $ua->get($url);
my $dom = $tx->res->dom;
This will fetch the HTML from sslproxies.org and parse it into a Mojo::DOM object we can query.
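As a quick illustration of what the DOM object lets you do, you could print the page's title (a hypothetical sanity check, assuming the request succeeded and the page has a title tag):
# Hypothetical sanity check: print the page's <title> text
print $dom->at('title')->text, "\n";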
The proxy IP and port are in the first and second columns of each row in the table with id "proxylisttable".
We can use CSS selectors to extract them:
my @proxies;
for my $row ($dom->find('#proxylisttable tbody tr')->each) {
    # The first two cells of each row hold the IP and port
    my ($ip, $port) = $row->find('td')->map('text')->each;
    next unless defined $ip && defined $port;
    push @proxies, { ip => $ip, port => $port };
}
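As a quick sanity check, you might print what was scraped:
# Show how many proxies were collected, then list them
printf "Loaded %d proxies\n", scalar @proxies;
print "$_->{ip}:$_->{port}\n" for @proxies;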
Now let's wrap this in a function we can call periodically to refresh the proxies:
sub load_proxies {
    my $ua  = Mojo::UserAgent->new;
    my $tx  = $ua->get('https://sslproxies.org/');
    my $dom = $tx->res->dom;

    my @proxies;
    for my $row ($dom->find('#proxylisttable tbody tr')->each) {
        my ($ip, $port) = $row->find('td')->map('text')->each;
        next unless defined $ip && defined $port;
        push @proxies, { ip => $ip, port => $port };
    }
    return @proxies;
}
To fetch a random proxy:
my @proxies = load_proxies();
my $random_idx = int rand @proxies;
my $random_proxy = $proxies[$random_idx];
print "$random_proxy->{ip}:$random_proxy->{port}\\n";
You can call load_proxies() every few minutes to keep the list fresh, since free proxies go stale quickly.
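One way to do that is to cache the list with a timestamp and reload it once it gets too old. A minimal sketch (the fresh_proxies helper and the five-minute interval are illustrative choices):
# Reload the proxy list when it is older than five minutes
my @cached_proxies;
my $loaded_at = 0;

sub fresh_proxies {
    if (time - $loaded_at > 300) {
        @cached_proxies = load_proxies();
        $loaded_at      = time;
    }
    return @cached_proxies;
}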
To send requests through one of these proxies, a module like LWP::UserAgent works well:
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my @proxies = load_proxies();
my $random_proxy = $proxies[int rand @proxies];
$ua->proxy('http', "http://$random_proxy->{ip}:$random_proxy->{port}");
my $response = $ua->get('http://example.com');
This will make the request through a random proxy from the list.
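Free proxies fail often, so in practice you will probably want to retry a failed request through a different proxy. Here is a minimal sketch (the fetch_with_retries helper, the retry count, and the timeout are all illustrative choices):
use LWP::UserAgent;

# Try up to 5 random proxies before giving up
sub fetch_with_retries {
    my ($url, @proxies) = @_;
    for (1 .. 5) {
        my $proxy = $proxies[int rand @proxies];
        my $ua    = LWP::UserAgent->new(timeout => 10);
        $ua->proxy('http', "http://$proxy->{ip}:$proxy->{port}");
        my $res = $ua->get($url);
        return $res if $res->is_success;
        warn "Proxy $proxy->{ip}:$proxy->{port} failed: " . $res->status_line . "\n";
    }
    return undef;    # every attempt failed
}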
In summary, this gives you a simple but effective Perl proxy rotator for web scraping and crawling projects.
If you want to use this in production and scale to thousands of links, you will find that most free proxies won't hold up under the speed and reliability requirements. In that scenario, using a rotating proxy service to rotate IPs is almost a must; otherwise, you tend to get IP blocked frequently by automatic location, usage, and bot detection algorithms.
Our rotating proxy server Proxies API provides a simple API that can solve all IP Blocking problems instantly.
Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.
The whole thing can be accessed through a simple API call, from any programming language:
curl "<http://api.proxiesapi.com/?key=API_KEY&url=https://example.com>"
We have a running offer of 1000 API calls completely free. Register and get your free API Key here.