Captchas are a necessary evil on many websites. They help prevent bots from abusing services, but they also create headaches for legitimate automation. Fortunately, captchas can be solved programmatically. This article covers solving Google reCAPTCHA using Puppeteer, headless Chrome, and the 2Captcha solving service.
Overview
Puppeteer provides a Node API for controlling headless Chrome. It allows you to navigate pages, interact with elements, run JavaScript, and more.
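To get past a captcha, we'll use Puppeteer together with a captcha solving service to:
- Navigate to the page and extract the captcha details (for reCAPTCHA, the site key)
- Send those details to the solving service
- Poll the service until the solution is ready
- Submit the solution back to the page

This lets us automate pages that are normally protected by captchas.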
Extracting Captcha Info
The first step is navigating to the page and extracting the info needed to solve the captcha. For example, with Google's reCAPTCHA we need the site key.
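Once the page has loaded, we can read the site key from the DOM with page.evaluate:
const siteKey = await page.evaluate(() => {
  // Standard reCAPTCHA v2 widgets expose the key as a data-sitekey attribute
  const widget = document.querySelector('[data-sitekey]');
  if (widget) return widget.getAttribute('data-sitekey');
  // Fallback: search inline scripts for "sitekey: '...'"
  const match = document.documentElement.innerHTML.match(/sitekey:\s*'([^']+)'/);
  return match ? match[1] : null;
});
This first looks for an element carrying a data-sitekey attribute, which is how the standard reCAPTCHA v2 widget is usually embedded, and falls back to searching the page's inline scripts for the key with a regex. It returns null if neither is found, so check the result before continuing.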
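If you already have the page's HTML as a string (for example, from a plain HTTP fetch rather than a browser), the same lookup can be sketched as a pure function. This is a minimal sketch, assuming the key appears either as a data-sitekey="..." attribute or as a sitekey: '...' entry in an inline script; extractSiteKey is a hypothetical helper name, not part of Puppeteer or any library.

```javascript
// Hypothetical helper (not a library API): pull a reCAPTCHA site key out of raw HTML.
// Assumes the key is in a data-sitekey attribute or a "sitekey: '...'" script entry.
function extractSiteKey(html) {
  const patterns = [
    /data-sitekey\s*=\s*["']([^"']+)["']/i, // <div class="g-recaptcha" data-sitekey="...">
    /sitekey\s*:\s*["']([^"']+)["']/i       // grecaptcha.render(el, { sitekey: '...' })
  ];
  for (const re of patterns) {
    const match = html.match(re);
    if (match) return match[1]; // first capture group holds the key
  }
  return null; // no key found; the page may load reCAPTCHA some other way
}
```

Returning null instead of throwing mirrors the page.evaluate version, so callers can decide whether a missing key is fatal.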
Sending Captcha Request
Next we need to send the site key to a captcha solving service to process the challenge. For this example we'll use 2Captcha.
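We send a request to 2Captcha's in.php endpoint containing the site key, our 2Captcha API key, and the URL of the page the captcha appears on:
// request-promise-native wraps the request library with promise support
const request = require('request-promise-native');
const formData = {
  method: 'userrecaptcha', // task type for reCAPTCHA
  googlekey: siteKey,
  key: apiKey,             // your 2Captcha API key
  pageurl: pageUrl,        // URL of the page showing the captcha
  json: 1                  // ask for a JSON response
};
const response = await request.post('http://2captcha.com/in.php', { form: formData });
const requestId = JSON.parse(response).request;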
This initializes the captcha solving request and returns a request ID we can use to poll for the solution.
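The snippet above reads the request ID without checking whether 2Captcha accepted the task. A minimal sketch of a stricter version follows, assuming the documented JSON reply shape ({"status":1,"request":"<id>"} on success, status 0 with an error code otherwise). The field names mirror formData above; buildSubmitBody and parseSubmitResponse are hypothetical helper names.

```javascript
// Hypothetical helper: URL-encode the same fields as formData for in.php.
// URLSearchParams takes care of escaping characters in the page URL.
function buildSubmitBody({ apiKey, siteKey, pageUrl }) {
  return new URLSearchParams({
    method: 'userrecaptcha',
    googlekey: siteKey,
    key: apiKey,
    pageurl: pageUrl,
    json: '1'
  }).toString();
}

// Hypothetical helper: return the request ID, or fail fast on an error reply.
function parseSubmitResponse(body) {
  const json = JSON.parse(body);
  // With json=1, status 1 means accepted and `request` holds the task ID;
  // status 0 means `request` holds an error code instead.
  if (json.status !== 1) {
    throw new Error(`2Captcha rejected the task: ${json.request}`);
  }
  return json.request;
}
```

Checking the status here means a wrong API key surfaces immediately instead of showing up later as a string of failed polls.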
Polling for Solution
Now we need to continually poll the service to check if the captcha is solved:
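const response = await poll({
  taskFn: requestCaptchaResults(apiKey, requestId),
  interval: 1500, // wait 1.5 s between attempts
  retries: 30     // retry up to 30 times before giving up
});
function requestCaptchaResults(apiKey, requestId) {
  const url = `http://2captcha.com/res.php?key=${apiKey}&action=get&id=${requestId}&json=1`;
  // promise-poller calls the returned function on every attempt
  return async function() {
    const json = JSON.parse(await request.get(url));
    // status 0 means not solved yet (or an error); rejecting makes promise-poller retry
    if (json.status === 0) throw new Error('captcha not ready');
    return json.request; // the solved token
  };
}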
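We use the promise-poller package here: it calls the task function every 1.5 seconds and resolves with the first successful result, or rejects once the retries are used up.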
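To show what the poller is doing, here is a dependency-free sketch of the same retry loop. It is not promise-poller's implementation; pollUntilResolved is a hypothetical name, and it assumes the task signals "not ready yet" by rejecting, as requestCaptchaResults does.

```javascript
// Pause helper: resolves after `ms` milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical stand-in for promise-poller (not a library API): call taskFn
// until it resolves, waiting `interval` ms between attempts, for at most
// `retries` attempts in total.
async function pollUntilResolved(taskFn, { interval = 1500, retries = 30 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await taskFn(); // resolved: e.g. the captcha token is ready
    } catch (err) {
      lastError = err; // e.g. "captcha not ready"; try again after a pause
      if (attempt < retries) await sleep(interval);
    }
  }
  throw new Error(`Gave up after ${retries} attempts: ${lastError && lastError.message}`);
}
```

With the values used above (30 attempts, 1.5 s apart), the loop waits at most about 45 seconds after the first attempt before giving up.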
Submitting the Solution
Finally, we need to submit the captcha solution to the page. For reCAPTCHA, this means writing the token into the hidden g-recaptcha-response textarea:
await page.evaluate(solution => {
  // The form posts this textarea's value to the server for verification
  document.getElementById('g-recaptcha-response').value = solution;
}, response);
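With the token in place, we can submit the form and continue the automation. Some sites instead read the token through a JavaScript callback registered with the widget (its data-callback attribute); on those pages, that callback also needs to be invoked with the token.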
Here is a full code example for solving reCAPTCHAs with Puppeteer using the 2Captcha service:
const puppeteer = require('puppeteer');
const request = require('request-promise-native');
const poll = require('promise-poller').default;
const apiKey = 'YOUR_API_KEY';
const siteDetails = {
  sitekey: 'SITE_KEY',
  pageurl: 'https://www.example.com'
};
(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(siteDetails.pageurl);
    // Ask 2Captcha to solve the captcha, then wait for the token
    const requestId = await initiateCaptchaRequest(apiKey, siteDetails);
    const response = await pollForRequestResults(apiKey, requestId);
    // Put the token where the form expects it
    await page.evaluate(resp => {
      document.getElementById('g-recaptcha-response').value = resp;
    }, response);
    // '#submit' is a placeholder for the page's real submit button.
    // Wait for the resulting navigation so the browser isn't closed mid-submit.
    await Promise.all([page.waitForNavigation(), page.click('#submit')]);
  } finally {
    await browser.close();
  }
})().catch(err => {
  console.error(err);
  process.exitCode = 1;
});
async function initiateCaptchaRequest(apiKey, siteDetails) {
  const formData = {
    method: 'userrecaptcha',
    googlekey: siteDetails.sitekey,
    key: apiKey,
    pageurl: siteDetails.pageurl,
    json: 1
  };
  const response = await request.post('http://2captcha.com/in.php', { form: formData });
  return JSON.parse(response).request;
}
// Give the solver `delay` seconds before the first check, then poll every 1.5 s
async function pollForRequestResults(apiKey, requestId, retries = 30, delay = 15) {
  await timeout(delay * 1000);
  return poll({
    taskFn: requestCaptchaResults(apiKey, requestId),
    interval: 1500,
    retries
  });
}
// Returns a function that checks the result once; throwing makes promise-poller retry
function requestCaptchaResults(apiKey, requestId) {
  const url = `http://2captcha.com/res.php?key=${apiKey}&action=get&id=${requestId}&json=1`;
  return async function() {
    const resp = await request.get(url);
    const json = JSON.parse(resp);
    if (json.status === 0) throw new Error('captcha not ready');
    return json.request;
  };
}
function timeout(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
This implements the overall flow: load the page, send the site key to 2Captcha, poll until the token is ready, inject the token into the page, and submit the form. Here the site key is hardcoded in siteDetails; you could extract it with page.evaluate as shown earlier instead.
Before running it, replace the API key, site key, and page URL with your own values, and change the '#submit' selector to match the page's actual submit button.
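One weakness of requestCaptchaResults is that it treats every status 0 reply as "not ready", so a permanent failure keeps being retried until the retries run out. A minimal sketch of a stricter check follows, assuming 2Captcha's documented replies: {"status":0,"request":"CAPCHA_NOT_READY"} (the API really spells it that way) while the task is in progress, and other status-0 values as error codes. buildResultUrl and classifyResult are hypothetical helper names.

```javascript
// Hypothetical helper: build the res.php polling URL with proper encoding.
function buildResultUrl(apiKey, requestId) {
  const url = new URL('http://2captcha.com/res.php');
  url.search = new URLSearchParams({ key: apiKey, action: 'get', id: requestId, json: '1' }).toString();
  return url.toString();
}

// Hypothetical helper: classify one res.php reply.
// { done: true, token } when solved, { done: false } while still in progress,
// and a thrown Error for any other status-0 code (assumed not worth retrying).
function classifyResult(body) {
  const json = JSON.parse(body);
  if (json.status === 1) return { done: true, token: json.request };
  if (json.request === 'CAPCHA_NOT_READY') return { done: false };
  // e.g. ERROR_CAPTCHA_UNSOLVABLE or ERROR_WRONG_CAPTCHA_ID
  throw new Error(`2Captcha error: ${json.request}`);
}
```

A caller would keep polling while done is false and stop on a thrown error, rather than spending all 30 retries on a captcha the service has already reported as unsolvable.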
Conclusion
Bypassing captchas with Puppeteer provides a programmatic way to automate past these protections. It does require an external solving service, which adds cost and some latency while each captcha is solved. But overall it's a simple and effective technique for automating pages that use captcha defenses.
Rather than building and managing your own captcha solving infrastructure, services like Proxies API handle all of this complexity for you.
With Proxies API, you make a single API request with the target URL. It handles proxy rotation, browser rendering, and captcha solving, and returns the rendered HTML, so there's no need to orchestrate the numerous steps required for reliable captcha solving.
For example:
curl "http://api.proxiesapi.com/?key=API_KEY&render=true&url=https://targetpage.com"
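The same request can be built from Node. This is a minimal sketch: the endpoint and the key, render and url parameters are copied from the curl line above, and buildProxiesApiUrl is a hypothetical helper name. Letting URLSearchParams percent-encode the target URL matters when that URL has its own query string, since an unencoded & would otherwise be read as a separate API parameter.

```javascript
// Hypothetical helper: build the Proxies API request URL shown in the curl example.
// The target URL is percent-encoded so its own ?, & and = survive intact.
function buildProxiesApiUrl(apiKey, targetUrl, render = true) {
  const url = new URL('http://api.proxiesapi.com/');
  url.search = new URLSearchParams({ key: apiKey, render: String(render), url: targetUrl }).toString();
  return url.toString();
}

console.log(buildProxiesApiUrl('API_KEY', 'https://targetpage.com/search?q=shoes&page=2'));
```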
This takes care of all the headaches of automation. No proxies, browsers, or captcha solving services to manage.
Proxies API offers 1000 free API calls to get started. Check it out if you need to integrate robust captcha solving and proxy rotation in your projects.