What is Puppeteer?
In a nutshell, Puppeteer is a Node.js library that provides a high-level API for controlling Chrome (or Chromium), headless by default.
I know that sounds kinda geeky, but here's what it really means:
- It lets you automate almost everything you'd normally do manually in a browser: click buttons, fill forms, scroll pages, press keys and so on.
- By default it runs inside a headless Chrome instance, i.e. there's no visible browser UI. It all happens silently in the background.
This makes it super fast and ideal for web scraping/automation tasks!
And as I discovered in my early trials and tribulations, Puppeteer also makes capturing web screenshots simple, flexible and reliable.
Let me walk you through it step-by-step...
Getting Set Up with Puppeteer
I'll assume you already have Node.js installed on your machine. If not, grab the latest LTS release for your OS from the official Node.js website (nodejs.org) first.
The good news is installing Puppeteer takes just one command! 💪
npm install puppeteer
This downloads the Puppeteer package along with a bundled browser binary that it uses under the hood.
You can also install Puppeteer globally with the -g flag, but local installs are considered best practice. If you do go global and run into permission issues on macOS/Linux, you may need elevated privileges:
sudo npm install -g puppeteer
Once the install finishes, you're all set for screenshot magic!
Let's start with a basic example...
Taking Your First Puppeteer Screenshot
Create a new file (say, script.js) and import Puppeteer:
const puppeteer = require('puppeteer');
Then add the following async script:
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.example.com');
  await page.screenshot({path: 'example.png'});
  // Await the close so the script only exits once Chrome has shut down
  await browser.close();
})();
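Breaking this down:
- puppeteer.launch() starts a headless browser instance.
- browser.newPage() opens a new tab.
- page.goto() navigates to the URL and waits for the page to load.
- page.screenshot() captures the page and saves it to the given path.
- browser.close() shuts the browser down so the script can exit.
Finally, run the script: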
node script.js
And voila! You should now have a basic screenshot saved to your filesystem. 🥳
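One weakness of the script above: if page.goto() or page.screenshot() throws (a typo in the URL, a timeout), the browser.close() call at the end never runs and the Node process can hang with Chrome still open. Here's a minimal sketch of a hypothetical withBrowser() helper (my own name, not a Puppeteer API) that uses try/finally so the browser always gets closed. It takes a launch function, so you'd pass () => puppeteer.launch().

```javascript
/**
 * Hypothetical helper (not part of Puppeteer): launches a browser, runs
 * `task` with it, and always closes the browser afterwards, even if
 * `task` throws. `launch` should behave like () => puppeteer.launch().
 */
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    // Whatever the task resolves with is passed back to the caller
    return await task(browser);
  } finally {
    // Runs on success *and* on error, so Chrome never gets left running
    await browser.close();
  }
}

module.exports = { withBrowser };

// Example usage (assumes the puppeteer package is installed):
// const puppeteer = require('puppeteer');
// withBrowser(() => puppeteer.launch(), async (browser) => {
//   const page = await browser.newPage();
//   await page.goto('https://www.example.com');
//   await page.screenshot({path: 'example.png'});
// }).catch(console.error);
```

Wrapping puppeteer.launch() in an arrow function keeps it called as a method on the puppeteer object rather than as a detached function.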
Now while this works, the output is less than stellar. The screenshot only captures the visible viewport, which Puppeteer sets to 800×600 pixels by default.
Let's fix that next...
Capturing Full Page Screenshots with Puppeteer
To take complete screenshots that cover the full page height, we need to add the fullPage option:
await page.screenshot({path: 'example.png', fullPage: true});
This captures the entire scrollable length of the page as one long, continuous image.
Depending on page height and content, full page screenshots take slightly longer than regular ones as they have more work to do behind the scenes!
We can also set an explicit viewport size upfront using page.setViewport():
await page.setViewport({width: 1280, height: 1600});
This controls the size the page is laid out at before capturing, so you can check how it renders on bigger screens.
Pro Tip: Use a wider viewport for readability. The height matters less here, since fullPage: true captures the full page height regardless.
Here's our updated script:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Set the viewport before navigating so the page lays out at this size
  await page.setViewport({width: 1280, height: 1600});
  await page.goto('https://www.example.com');
  await page.screenshot({path: 'example.png', fullPage: true});
  await browser.close();
})();
Give this a whirl on a long web page and you'll now get a nice, clean full page vertical screenshot!
Let's keep exploring what else we can do...
Taking Screenshots of Specific HTML Elements
Sometimes you may not need the entire page screenshot. Instead, you may want to capture only a particular section, div or element on the page.
Puppeteer makes this easy with element selectors.
For this example, let's say we want to screenshot just the navigation bar. The selector copied from the inspect element tool is:
#main-nav
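We can use page.$() to grab the first element matching that selector, then call screenshot() on the returned element handle: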
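const navbar = await page.$('#main-nav');
// page.$() resolves to null if nothing matches, so check before using it
if (!navbar) throw new Error('#main-nav not found');
await navbar.screenshot({path: 'navbar.png'});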
page.$() takes the same CSS selectors you'd pass to document.querySelector() in the browser or to libraries like jQuery.
You can use classes, ids, attribute selectors, or even advanced CSS combinators to isolate specific components on the page!
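Pro Tip: Mix and match: take a full page shot for context plus element shots for the details you care about. For an arbitrary rectangular "slice" that doesn't line up with a single element, page.screenshot() also accepts a clip region ({x, y, width, height}).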
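Here's a minimal sketch of one way to cut a long page into fixed-height slices using page.screenshot()'s clip option. sliceRects() and captureSlices() are hypothetical helpers, the default width and slice height are arbitrary, and the sketch assumes clip coordinates are measured from the top-left of the document; double-check that against the docs for your Puppeteer version.

```javascript
/**
 * Splits a page of `totalHeight` pixels into horizontal slices of at most
 * `sliceHeight` pixels. Each rect has the {x, y, width, height} shape that
 * page.screenshot()'s `clip` option expects.
 */
function sliceRects(width, totalHeight, sliceHeight) {
  if (!(sliceHeight > 0)) throw new RangeError('sliceHeight must be positive');
  const rects = [];
  for (let y = 0; y < totalHeight; y += sliceHeight) {
    // The last slice may be shorter than sliceHeight
    rects.push({ x: 0, y, width, height: Math.min(sliceHeight, totalHeight - y) });
  }
  return rects;
}

/**
 * Hypothetical helper: saves the page as numbered slices, e.g.
 * slice-1.png, slice-2.png, ... `width` should match your viewport width.
 */
async function captureSlices(page, { width = 1280, sliceHeight = 1000, prefix = 'slice' } = {}) {
  // Runs in the browser: total scrollable height of the document
  const totalHeight = await page.evaluate(() => document.documentElement.scrollHeight);
  const rects = sliceRects(width, totalHeight, sliceHeight);
  for (let i = 0; i < rects.length; i++) {
    await page.screenshot({ path: `${prefix}-${i + 1}.png`, clip: rects[i] });
  }
  return rects.length;
}

module.exports = { sliceRects, captureSlices };
```

Keeping the rectangle math in a pure function makes it easy to check without launching a browser.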
Let's see another useful screenshot scenario...
Capturing Multiple Screenshots in a Loop
A common real-world requirement is needing to screenshot numerous links or pages in one go.
Say, for example, you want to capture screenshots of the first 10 search results for a Google query.
Here's how to loop through the links and save screenshots with dynamic file names:
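const links = await page.$$('.search-results a');
for (let i = 0; i < Math.min(10, links.length); i++) {
  // Re-query on every pass: handles from before a navigation are no longer valid
  const freshLinks = await page.$$('.search-results a');
  // Start waiting for the navigation before clicking, then screenshot the result
  await Promise.all([page.waitForNavigation(), freshLinks[i].click()]);
  await page.screenshot({path: `result-${i + 1}.png`});
  // Await the back navigation so the next pass sees the results page again
  await page.goBack();
}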
We use page.$$() (the multi-element version of page.$()) to collect the result links. We then iterate through the first 10, clicking each link, taking a screenshot, and navigating back to return to the results.
This saves the screenshots with numbered file names like result-1.png, result-2.png and so on.
The same approach allows batch screenshotting categories, filtered image grids, product listings, galleries, menus and almost anything!
Pro Tip: Wrap this in an async function to make the script reusable. Pass the base URL, selector and filenames as arguments.
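Following that tip, here's a minimal sketch of a reusable version. screenshotLinks() and its option names are hypothetical, and it deliberately swaps in a different technique from the loop above: instead of clicking and navigating back (where element handles collected before a navigation can go stale), it reads all the link URLs first with page.$$eval() and visits each one with page.goto().

```javascript
/**
 * Hypothetical reusable helper: opens `url`, collects up to `limit` link
 * URLs matching `selector`, then visits each one and saves
 * `${prefix}-1.png`, `${prefix}-2.png`, ...
 */
async function screenshotLinks(page, { url, selector, limit = 10, prefix = 'result' }) {
  await page.goto(url);
  // Runs in the browser: read each matching anchor's absolute href
  const hrefs = await page.$$eval(selector, (anchors) => anchors.map((a) => a.href));
  const targets = hrefs.slice(0, limit);
  const saved = [];
  for (let i = 0; i < targets.length; i++) {
    await page.goto(targets[i]);
    const path = `${prefix}-${i + 1}.png`;
    await page.screenshot({ path });
    saved.push(path);
  }
  return saved;
}

module.exports = { screenshotLinks };

// Example usage (inside an async function with an open `page`):
// await screenshotLinks(page, { url: 'https://www.example.com', selector: '.search-results a' });
```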
This covers the main screenshot use cases, but we've still barely scratched the surface of Puppeteer's true powers.
Stick with me... things are about to get even more exciting!
Unleashing Puppeteer's Advanced Superpowers
While taking basic screenshots is incredibly useful, Puppeteer's real magic lies in more advanced browser automation scenarios.
Let's explore some of these advanced tricks that'll make you a certified Puppeteer power user!
Automating Login and Auth Workflows
Sites these days have gotten really restrictive with login requirements to access certain pages and content.
Puppeteer helps you automate filling forms, entering data and getting through login screens with your own credentials.
For example, logging into a site can be broken down into easy steps like:
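// Credentials come from environment variables (SITE_USERNAME/SITE_PASSWORD are placeholder names)
const username = process.env.SITE_USERNAME;
const password = process.env.SITE_PASSWORD;
// Navigate to the login URL (page.goto() needs an absolute URL; this one is a placeholder)
await page.goto('https://www.example.com/login');
// Enter username and password
await page.type('#username', username);
await page.type('#password', password);
// Click submit, waiting for the resulting navigation so page.url() is up to date
await Promise.all([
  page.waitForNavigation(),
  page.click('.submit'),
]);
// Confirm we got past the auth wall
if (page.url().includes('/dashboard')) {
  // We're in!
} else {
  // Invalid login, handle error
}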
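Pro Tip: Save logins in a .env file and load them with the dotenv package (require('dotenv').config()), so credentials never end up hard-coded in your scripts.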
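Whether the values come from a .env file loaded by dotenv or from your CI system's secret store, they end up in process.env. Here's a minimal sketch of a hypothetical loginWithEnv() helper that fails fast when a variable is missing (instead of typing "undefined" into the form) and reports whether the login landed on the dashboard. The variable names SITE_USERNAME/SITE_PASSWORD are placeholders, and the selectors and /dashboard check are the placeholder ones from the login example.

```javascript
/**
 * Reads credentials from an env-like object and throws early if either is
 * missing. SITE_USERNAME / SITE_PASSWORD are placeholder variable names.
 */
function readCredentials(env = process.env) {
  const username = env.SITE_USERNAME;
  const password = env.SITE_PASSWORD;
  if (!username || !password) {
    throw new Error('Set SITE_USERNAME and SITE_PASSWORD (e.g. in your .env file)');
  }
  return { username, password };
}

/**
 * Hypothetical helper: logs in and resolves to true if we ended up on a URL
 * containing `successPath`. Selectors are placeholders for your site's form.
 */
async function loginWithEnv(page, loginUrl, { successPath = '/dashboard', env = process.env } = {}) {
  const { username, password } = readCredentials(env);
  await page.goto(loginUrl);
  await page.type('#username', username);
  await page.type('#password', password);
  // Start waiting for the navigation before clicking so we don't miss it
  await Promise.all([page.waitForNavigation(), page.click('.submit')]);
  return page.url().includes(successPath);
}

module.exports = { readCredentials, loginWithEnv };
```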
Emulate Mobile Devices and Screen Sizes
Modern responsive sites adapt UI and layouts for mobile vs desktop experiences.
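To test these, we can use page.emulate() with one of Puppeteer's built-in device descriptors, or page.setViewport() for custom sizes: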
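// Built-in device descriptors (older Puppeteer versions expose these as puppeteer.devices)
const { KnownDevices } = require('puppeteer');
// Emulate iPhone 12 (sets both the viewport and the user agent)
await page.emulate(KnownDevices['iPhone 12']);
// Custom sizes
await page.setViewport({width: 400, height: 800});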
We can then take screenshots to confirm UI behavior on varying screen real estate.
This helps catch missing elements, overflow issues, tiny tap targets etc.
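To check several breakpoints in one run, you can loop over a list of sizes. Below is a minimal sketch: the screenshotAtSizes() name, the three breakpoints and the file naming are illustrative assumptions. It uses page.setViewport(), which changes only the viewport, so sites that sniff the user agent may still need page.emulate() with a device descriptor.

```javascript
// Illustrative breakpoints; adjust to match your site's CSS
const SIZES = [
  { name: 'mobile', width: 375, height: 812 },
  { name: 'tablet', width: 768, height: 1024 },
  { name: 'desktop', width: 1280, height: 800 },
];

/**
 * Hypothetical helper: screenshots `url` once per size, saving files like
 * home-mobile.png. Sets the viewport before navigating so the page lays
 * out at that size from the start.
 */
async function screenshotAtSizes(page, url, prefix, sizes = SIZES) {
  const saved = [];
  for (const { name, width, height } of sizes) {
    await page.setViewport({ width, height });
    await page.goto(url);
    const path = `${prefix}-${name}.png`;
    await page.screenshot({ path, fullPage: true });
    saved.push(path);
  }
  return saved;
}

module.exports = { SIZES, screenshotAtSizes };
```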
Simulate Network Speeds and Conditions
To build truly resilient apps, we need to test bad network scenarios.
Puppeteer allows throttling CPU, network and even going offline:
// Simulate a slow 3G-like connection (throughput in bytes per second)
await page.emulateNetworkConditions({
  download: 750 * 1024 / 8,
  upload: 250 * 1024 / 8,
  latency: 400 // ms
});
// Throttle the CPU (4 = four times slower)
await page.emulateCPUThrottling(4);
// Test offline behavior
await page.setOfflineMode(true);
We can take screenshots after introducing these constraints to judge impacts on page load speeds, image optimisations, fallbacks etc.
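The throughput numbers passed to page.emulateNetworkConditions() are bytes per second, which is why the example divides by 8: 750 * 1024 / 8 turns 750 kilobits per second into 96,000 bytes per second. Here's a small sketch that keeps named profiles in kilobits and converts them. The profile names and the "fast" numbers are illustrative assumptions (not Puppeteer's built-in presets), and screenshotUnderProfiles() is a hypothetical helper.

```javascript
// Converts kilobits per second (1 Kb = 1024 bits here) to bytes per second
function kbpsToBytesPerSecond(kbps) {
  return (kbps * 1024) / 8;
}

// Illustrative profiles; 'slow' mirrors the 750/250 Kbps, 400 ms example above
const PROFILES = {
  slow: { downKbps: 750, upKbps: 250, latencyMs: 400 },
  fast: { downKbps: 10000, upKbps: 5000, latencyMs: 40 },
};

// Builds the {download, upload, latency} object emulateNetworkConditions() expects
function toNetworkConditions({ downKbps, upKbps, latencyMs }) {
  return {
    download: kbpsToBytesPerSecond(downKbps),
    upload: kbpsToBytesPerSecond(upKbps),
    latency: latencyMs,
  };
}

/**
 * Hypothetical helper: screenshots `url` under each profile, e.g. home-slow.png.
 */
async function screenshotUnderProfiles(page, url, prefix, profiles = PROFILES) {
  for (const [name, profile] of Object.entries(profiles)) {
    await page.emulateNetworkConditions(toNetworkConditions(profile));
    await page.goto(url);
    await page.screenshot({ path: `${prefix}-${name}.png` });
  }
  // Passing null removes the network throttling again
  await page.emulateNetworkConditions(null);
}

module.exports = { kbpsToBytesPerSecond, PROFILES, toNetworkConditions, screenshotUnderProfiles };
```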
Battle-hardening our apps for the real world! 💪
...and this merely scratches the surface of next-level techniques I've picked up from thousands of hours scraping with Puppeteer!
But before I got here, I first had to overcome a series of roadblocks. Let's talk troubleshooting...
Debugging Common Puppeteer Issues
As a grizzled veteran now, I can proudly say I've made every silly Puppeteer mistake imaginable!
Let me quickly share solutions to some frequent gotchas I see users run into:
Problem: Blank/Empty Screenshots
Ah, the dreaded blank screenshot. This is most often caused by...
Solutions:
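// Wait until there have been no network connections for at least 500 ms
await page.goto(url, {waitUntil: 'networkidle0'});
// Wait for the document body (page.waitFor() was removed in newer Puppeteer versions)
await page.waitForSelector('body');
// Close the browser only after the screenshot promise resolves
const data = await page.screenshot();
await browser.close();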
Problem: Timeouts and Hangs
Asynchronous Puppeteer scripts involve lots of waiting. So if one step lags, the whole script may time out or hang.
Solution:
// Bump the default timeout from 30 s if needed (0 disables it, which can hang forever)
page.setDefaultTimeout(60000);
// Catch errors to handle timeouts gracefully
try {
  await page.click('.submit');
} catch (err) {
  // React to timeouts, e.g. log err, retry, or skip this page
}
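Beyond a single try/catch, a step that times out once often succeeds on a second attempt. Here's a minimal sketch of a generic retry wrapper; withRetry() and its defaults are my own hypothetical choices, and note that it retries on any error, not just timeouts.

```javascript
/**
 * Hypothetical helper: calls `fn` up to `attempts` times, waiting
 * `delayMs` between tries, and rethrows the last error if all fail.
 * It retries on *any* error, not only timeouts.
 */
async function withRetry(fn, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

module.exports = { withRetry };

// Example usage with an open Puppeteer `page`:
// await withRetry(() => page.goto('https://www.example.com', { timeout: 15000 }));
```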
Problem: Page Loads Partially/Incompletely
If shots miss elements or pages look half loaded, we likely have...
Solutions:
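// Wait for the load event of a navigation; start waiting *before* the action
// that triggers it, or the event can be missed ('a.next' is a placeholder selector)
await Promise.all([page.waitForNavigation({waitUntil: 'load'}), page.click('a.next')]);
// Network idle + DOMContentLoaded
await page.goto(url, {waitUntil: ['domcontentloaded', 'networkidle0']});
// Fixed delay after the page loads (page.waitFor() no longer exists in newer versions)
await new Promise((resolve) => setTimeout(resolve, 500)); // 0.5 sec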
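A fixed delay like the 0.5 sec wait above is ultimately a guess. If the missing pieces are images, a more targeted option is to wait until every <img> currently on the page has finished loading (or failed). A minimal sketch; waitForImages() is a hypothetical helper, and it won't track images inserted after it starts waiting.

```javascript
/**
 * Hypothetical helper: resolves once every <img> currently in the document
 * has either loaded or failed. Images added later aren't tracked.
 */
async function waitForImages(page) {
  // The callback runs inside the browser; Puppeteer awaits the promise it returns
  await page.evaluate(async () => {
    const pending = Array.from(document.images)
      .filter((img) => !img.complete)
      .map((img) => new Promise((resolve) => {
        // Resolve on error too, so one broken image can't block the screenshot
        img.addEventListener('load', resolve, { once: true });
        img.addEventListener('error', resolve, { once: true });
      }));
    await Promise.all(pending);
  });
}

module.exports = { waitForImages };
```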
These are just a sample of the issues that once plagued me endlessly!
I learned the root causes and fixes the hard way so you don't have to.
Let's round things off with some final expert level tricks...
Expert Puppeteer Tips and Tricks 💡
Over the years, I've compiled a handy bag of Puppeteer pro tips that I frequently use in my web scraping projects.
Let me share some of my secret weapons to take your Puppeteer skills to the next level:
Speed Up Slow Page Loads
Sites overloaded with ads and trackers can slow browsers to a crawl. We can intercept requests and abort the ones we don't need. This example blocks every image; the same pattern can filter by domain by checking request.url():
await page.setRequestInterception(true);
page.on('request', request => {
  // Every intercepted request must be either aborted or continued
  if (request.resourceType() === 'image') {
    request.abort();
  } else {
    request.continue();
  }
});
This keeps page loads lightning fast by stripping heavyweight resources before they even start downloading. Just remember that anything you block won't appear in your screenshots either!
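To block by domain rather than by resource type, check request.url() against a blocklist. A minimal sketch: the domain list is a made-up placeholder, isBlocked() and blockDomains() are hypothetical helpers, and the matching rule (exact host or any subdomain of it) is my assumption about what you'd want.

```javascript
// Placeholder blocklist; replace with the ad/tracker domains you actually see
const BLOCKED_DOMAINS = ['ads.example', 'tracker.example'];

// True if the request's host is a blocked domain or a subdomain of one
function isBlocked(requestUrl, blocked = BLOCKED_DOMAINS) {
  let host;
  try {
    host = new URL(requestUrl).hostname;
  } catch {
    return false; // Not a parseable URL; let it through
  }
  return blocked.some((domain) => host === domain || host.endsWith(`.${domain}`));
}

// Turns on interception and aborts requests to blocked domains
async function blockDomains(page, blocked = BLOCKED_DOMAINS) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (isBlocked(request.url(), blocked)) {
      request.abort();
    } else {
      request.continue();
    }
  });
}

module.exports = { BLOCKED_DOMAINS, isBlocked, blockDomains };
```

Matching on a leading dot means notads.example isn't caught by a rule for ads.example.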
Automatically Scroll Pages
Scrolling long pages with dynamic content by hand can be clunky. We can auto-scroll instead by running window.scrollBy() inside the page with page.evaluate():
await page.evaluate(() => {
window.scrollBy(0, window.innerHeight);
});
Just wrap this in a loop to scroll several page lengths in a row!
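Here's one way to write that loop: a minimal sketch of an autoScroll() helper (a hypothetical name, though it matches the autoScroll() call in the utils example below) that scrolls one viewport at a time until it reaches the bottom or hits a step limit, pausing between steps so lazy-loaded content has a chance to appear. The step limit and delay are arbitrary defaults.

```javascript
/**
 * Hypothetical helper: scrolls one viewport height at a time until the
 * bottom of the page is reached or `maxSteps` is hit. Resolves to the
 * number of scroll steps taken.
 */
async function autoScroll(page, { maxSteps = 50, delayMs = 250 } = {}) {
  for (let step = 1; step <= maxSteps; step++) {
    // This callback runs inside the browser, not in Node
    const atBottom = await page.evaluate(() => {
      window.scrollBy(0, window.innerHeight);
      return window.scrollY + window.innerHeight >= document.documentElement.scrollHeight;
    });
    if (atBottom) return step;
    // Give lazy-loaded images or infinite-scroll content time to load
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return maxSteps;
}

module.exports = { autoScroll };
```

The maxSteps cap matters on infinite-scroll pages, where the bottom keeps moving away.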
Craft Flexible Reusable Utils
Repeating the same steps across scripts? Modularize common logic into separate utils files that can be reused or even published as packages:
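// utils.js
// autoScroll() is assumed to be a scrolling helper defined in this same file
async function getScreenshot(page, path) {
  await autoScroll(page);
  return page.screenshot({path});
}
module.exports = { getScreenshot };
// index.js (inside an async function where `page` already exists)
const { getScreenshot } = require('./utils');
await getScreenshot(page, 'file.png');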
Build your own Puppeteer toolkit over time! 🧰
CI Integrations and Automation
Puppeteer shines when integrated into CI/CD pipelines for automation. Capture screenshots on a schedule with cron, or on demand with services like AWS Lambda. A crontab entry looks like this:
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday;
│ │ │ │ │                                   7 is also Sunday on some systems)
│ │ │ │ │
│ │ │ │ │
0 * * * * /opt/screenshot
This runs the script at the top of every hour, fetching fresh screenshots! ⏰ (Five asterisks, * * * * *, would run it every minute instead.)
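One catch with a scheduled job: if the script always writes to the same path, each run overwrites the previous screenshot. Here's a minimal sketch of a hypothetical helper that builds a timestamped file name instead. The naming pattern and the screenshots/ directory are assumptions; it uses UTC so names sort chronologically regardless of the server's time zone.

```javascript
/**
 * Hypothetical helper: builds a file name like
 * screenshots/home-2024-05-01T13-00-05Z.png from a Date (in UTC).
 * Colons are replaced because Windows doesn't allow them in file names.
 */
function timestampedPath(prefix, date = new Date(), dir = 'screenshots') {
  // toISOString() gives e.g. 2024-05-01T13:00:05.123Z; drop milliseconds and colons
  const stamp = date.toISOString().replace(/\.\d{3}Z$/, 'Z').replace(/:/g, '-');
  return `${dir}/${prefix}-${stamp}.png`;
}

module.exports = { timestampedPath };

// Example: await page.screenshot({ path: timestampedPath('home') });
// Create the directory first, e.g. fs.mkdirSync('screenshots', { recursive: true })
```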