When optimizing your Puppeteer setup, execution speed is usually the first thing worth improving.
One of the key ways to make Puppeteer faster is the --user-data-dir argument. It lets Chromium keep its profile, including its cache, on disk and reuse files downloaded in previous sessions, which significantly speeds up repetitive tasks.
To further improve Puppeteer performance, launch the browser only once. Keeping a single instance open, rather than launching a new one for every request, drastically reduces the time spent starting new sessions. You do need to monitor memory usage, though, because the browser tends to leak memory if it isn’t restarted periodically. Finally, injecting HTML directly with page.setContent can noticeably speed up processing by avoiding the latency of fetching remote content, a vital tip if you want Puppeteer to run as fast as possible.
Here are the three most important tips we’ve learned for speeding up Chromium with the Puppeteer library while building our Puppeteer-based API infrastructure:
Use Local Cache
Single Browser Launch
Inject HTML
Let Chromium reuse the files it already downloaded in previous sessions by setting the --user-data-dir argument when launching the browser. This way, images and fonts are served from the local cache much faster:
const browser = await puppeteer.launch({ args: ['--user-data-dir=./tmpusr'] });
Of course, this tip helps only as much as you reuse the same resources across calls. If your use case always fetches different websites, it helps very little.
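As a minimal sketch, the flag can be wrapped in a small helper that builds the launch options. The helper name `launchOptionsWithCache` is hypothetical, and resolving the directory to an absolute path is our own assumption, so that the same cache is reused no matter which working directory the process starts from.

```javascript
const path = require('path');

// Hypothetical helper: build Puppeteer launch options that point Chromium at a
// persistent profile directory, so files cached in earlier sessions are reused.
// path.resolve() turns './tmpusr' into an absolute path, making the cache
// location independent of the process's current working directory.
function launchOptionsWithCache(dir = './tmpusr') {
  return {
    args: [`--user-data-dir=${path.resolve(dir)}`],
  };
}
```

You would then call `puppeteer.launch(launchOptionsWithCache())`. Chromium generally locks a profile directory while it is in use, so browsers running at the same time should each get their own directory.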
Reuse the same Browser instance instead of launching it for every session; launching a new browser on every request is significantly slower:
// keep this reference for next requests
const browser = await puppeteer.launch({ /* ... */ });
// so, don't close it with browser.close() after each request
Reusing the same browser instance is a smart move, but be aware that you have to close it and launch a fresh browser after a while, since it tends to leak memory after a number of uses (depending on your machine, this can happen after 20-50 sessions).
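The reuse-and-recycle pattern could look roughly like the sketch below. `ReusableBrowser` is a hypothetical class; its `launch` argument stands for any async function that returns a browser (for example `() => puppeteer.launch({...})`), and the default limit of 30 sessions is an assumed value picked from the 20-50 range above, to be tuned for your machine. The sketch also assumes sessions are handled one at a time: concurrent callers would need a lock so the browser is not closed while another request is still using it.

```javascript
// Minimal sketch: share one browser across sessions and replace it with a
// fresh one after `maxUses` sessions to contain memory growth.
// Assumes sessions run one at a time (no locking for concurrent callers).
class ReusableBrowser {
  constructor(launch, maxUses = 30) {
    this.launch = launch;   // async () => browser, e.g. () => puppeteer.launch({ /* ... */ })
    this.maxUses = maxUses; // hypothetical default; tune it for your machine
    this.browser = null;
    this.uses = 0;
  }

  // Return the shared browser, relaunching it once it has served maxUses sessions.
  async get() {
    if (this.browser && this.uses >= this.maxUses) {
      await this.browser.close(); // free the memory held by the old browser process
      this.browser = null;
    }
    if (!this.browser) {
      this.browser = await this.launch();
      this.uses = 0;
    }
    this.uses += 1;
    return this.browser;
  }

  // Close the current browser, e.g. when the server shuts down.
  async close() {
    if (this.browser) {
      await this.browser.close();
      this.browser = null;
    }
  }
}
```

In a request handler you would call `await shared.get()`, open a page with `browser.newPage()`, and close that page (not the browser) when the session ends; `shared.close()` is only for shutdown. Injecting `launch` instead of calling Puppeteer directly keeps the recycling logic testable without starting Chromium.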
Instead of loading the page you want to process from a remote URL, inject it with page.setContent, which is faster:
await page.setContent(html)
The reason is simple: fetching remote content is significantly slower than using local data. Of course, you can exploit this only if your use case allows in-memory preloaded assets or assets on local disk.
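A sketch of this approach, under a few assumptions: `renderHtml` and `toDataUri` are hypothetical helpers, and embedding local files as base64 `data:` URIs is just one way to keep the injected HTML free of remote references. `page.setContent(html, { waitUntil: 'load' })` waits for the page's load event, which should not involve any network requests when every asset is inlined.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: read a local file and return it as a base64 data: URI,
// so the injected HTML does not need to fetch it from anywhere.
function toDataUri(filePath, mimeType) {
  const base64 = fs.readFileSync(path.resolve(filePath)).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// Load `html` into a fresh page with setContent instead of page.goto(url),
// return the resulting markup, and always close the page so a shared
// browser does not accumulate open tabs.
async function renderHtml(browser, html) {
  const page = await browser.newPage();
  try {
    await page.setContent(html, { waitUntil: 'load' });
    return await page.content();
  } finally {
    await page.close();
  }
}
```

For example, `` `<img src="${toDataUri('./logo.png', 'image/png')}">` `` can be passed to `renderHtml(browser, html)`. Keep in mind that base64 makes each asset about a third larger, so very large files may be better kept on disk.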
These approaches, from managing memory with Puppeteer to understanding why Puppeteer can sometimes be slow, are key to optimizing your browser automation tasks effectively.