In the early stages of a web crawling project, or when you only need to scale to a few hundred requests, a simple proxy rotator that populates itself from the free proxy pools available on the internet every now and then may be all you need.
We can use a website like https://sslproxies.org/ to fetch public proxies every few minutes and use them in our Ruby projects.
If you check the HTML using your browser's inspect tool, you will see that the full proxy list is contained in a table with the id proxylisttable.
The IP and port are the first and second elements in each row.
We can use the following code to select the table, iterate over its rows, and pull out the first and second cells of each row.
Fetching the Proxies
First, we'll need to install the Nokogiri gem:
gem install nokogiri
Then we can require Nokogiri and open-uri to fetch and parse the page:
require 'nokogiri'
require 'open-uri'
url = 'https://sslproxies.org/'
html = URI.open(url).read
doc = Nokogiri::HTML(html)
This fetches the HTML from sslproxies.org and parses it with Nokogiri.
Extracting the Proxies
Now we can use Nokogiri to extract the proxies from the HTML. The proxies are contained in a table with the id proxylisttable.
We can select this table, then loop through each row, grabbing the first and second table cells, which contain the IP and port:
proxies = []
doc.css('#proxylisttable tr').each do |row|
  cells = row.css('td')
  next if cells.size < 2 # skip the header row, which uses th cells
  proxies << { ip: cells[0].text, port: cells[1].text }
end
This selects the table, loops through its rows (skipping the header row, which has no td cells), and extracts the IP and port into a hash that is pushed into the proxies array.
Fetching a Random Proxy
To fetch a random proxy from the array, we can use Ruby's Array#sample method:
proxies.sample
This will return a random proxy hash from the array.
Putting It Together
Let's put this all together into a method that fetches proxies and returns a random one:
require 'nokogiri'
require 'open-uri'

def get_random_proxy
  url = 'https://sslproxies.org/'
  html = URI.open(url).read
  doc = Nokogiri::HTML(html)
  proxies = []
  doc.css('#proxylisttable tr').each do |row|
    cells = row.css('td')
    next if cells.size < 2 # skip the header row, which uses th cells
    proxies << { ip: cells[0].text, port: cells[1].text }
  end
  proxies.sample
end
To fetch a random proxy:
proxy = get_random_proxy
puts proxy[:ip]
puts proxy[:port]
This will print out a random IP and port from the fetched proxies.
You can call this method every few minutes to get fresh proxies.
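If you want that refresh to happen automatically, one option is to wrap the fetching step in a small time-based cache that only re-scrapes once the list has gone stale. This is just a sketch: the ProxyPool class and its injectable fetcher lambda are illustrative names of my own, not part of the code above; in real use the fetcher would run the scraping logic from get_random_proxy and return the full proxies array.

```ruby
# A minimal time-based cache around the proxy-fetching step.
class ProxyPool
  def initialize(ttl: 300, fetcher:)
    @ttl = ttl         # seconds before the cached list is considered stale
    @fetcher = fetcher # callable returning an array of proxy hashes
    @proxies = []
    @fetched_at = nil
  end

  # Returns a random proxy, re-fetching the list only if it has gone stale.
  def sample
    refresh if @fetched_at.nil? || Time.now - @fetched_at > @ttl
    @proxies.sample
  end

  private

  def refresh
    @proxies = @fetcher.call
    @fetched_at = Time.now
  end
end

# Example with a stubbed fetcher (no network involved):
pool = ProxyPool.new(ttl: 300, fetcher: -> { [{ ip: '1.2.3.4', port: '80' }] })
puts pool.sample
```

The injectable fetcher also makes the caching logic easy to test without hitting the network.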
Using the Proxy in Code
To use the random proxy in other code, you can set environment variables:
proxy = get_random_proxy
ENV['HTTP_PROXY'] = "http://#{proxy[:ip]}:#{proxy[:port]}"
ENV['HTTPS_PROXY'] = "http://#{proxy[:ip]}:#{proxy[:port]}"
Then any HTTP request made by a library that honors these environment variables (open-uri and Net::HTTP do by default) will be proxied through the random IP.
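If you would rather not mutate the whole process environment, Net::HTTP also accepts a proxy host and port directly in its constructor, which scopes the proxy to a single client. A sketch, with placeholder proxy values standing in for whatever get_random_proxy returns:

```ruby
require 'net/http'
require 'uri'

# Placeholder values standing in for a hash returned by get_random_proxy.
proxy = { ip: '127.0.0.1', port: '8080' }

uri = URI('https://example.com/')

# The third and fourth arguments are the proxy address and port;
# this affects only this client, not ENV for the whole process.
http = Net::HTTP.new(uri.host, uri.port, proxy[:ip], proxy[:port].to_i)
http.use_ssl = true

# http.get(uri.path) would now be routed through the proxy.
puts http.proxy_address
puts http.proxy_port
```

Passing the proxy explicitly also makes it straightforward to retry a failed request through a different proxy from the pool.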
Full Code
Here is the full code for easy copy/pasting:
require 'nokogiri'
require 'open-uri'

def get_random_proxy
  url = 'https://sslproxies.org/'
  html = URI.open(url).read
  doc = Nokogiri::HTML(html)
  proxies = []
  doc.css('#proxylisttable tr').each do |row|
    cells = row.css('td')
    next if cells.size < 2 # skip the header row, which uses th cells
    proxies << { ip: cells[0].text, port: cells[1].text }
  end
  proxies.sample
end

proxy = get_random_proxy
puts proxy[:ip]
puts proxy[:port]

ENV['HTTP_PROXY'] = "http://#{proxy[:ip]}:#{proxy[:port]}"
ENV['HTTPS_PROXY'] = "http://#{proxy[:ip]}:#{proxy[:port]}"
This provides a simple proxy rotator in Ruby using Nokogiri and free proxy lists.
If you want to use this in production and want to scale to thousands of links, then you will find that many free proxies won't hold up under the speed and reliability requirements. In this scenario, using a rotating proxy service to rotate IPs is almost a must.
Otherwise, you tend to get IP blocked a lot by automatic location, usage, and bot detection algorithms.
Our rotating proxy server Proxies API provides a simple API that can solve all IP Blocking problems instantly.
Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.
You can access the whole thing with a simple API call, like below, from any programming language:
curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"
We have a running offer of 1000 API calls completely free. Register and get your free API Key here.