Waiting for a page to finish loading is a fundamental skill for every Puppeteer developer and web automation engineer. The main tool for this synchronization is the waitUntil option of the page.goto() method. page.goto() navigates to a new URL, and its waitUntil option defines which condition Puppeteer waits for before it considers the navigation complete and moves on to further actions.
There are four values you can use for the waitUntil parameter. In this article, we will compare these four values and explain how and when to use each based on your use case.
In Puppeteer, you can wait for a web page to load using the page.goto() method. Here's a basic example.
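The snippet below is a minimal sketch of that flow, assuming Puppeteer is installed from npm. The `loadPage` helper and its `puppeteer` parameter are illustrative names rather than part of Puppeteer's API; passing the module in as an argument simply makes the function easy to exercise with a stub.

```javascript
// Minimal sketch: `loadPage` is a hypothetical helper, not a Puppeteer API.
// It receives the puppeteer module so it can also be driven by a stub.
async function loadPage(puppeteer) {
  // Launch a browser instance controlled by Puppeteer.
  const browser = await puppeteer.launch();
  try {
    // Open a new tab.
    const page = await browser.newPage();
    // Navigate; with no options, goto() resolves once the load event fires.
    await page.goto('https://google.com');
  } finally {
    // Close the browser even if navigation throws.
    await browser.close();
  }
}
```

With Puppeteer installed, calling `loadPage(require('puppeteer'))` runs the sketch against a real browser.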
In the code snippet above, we launch Puppeteer and open a new tab in the browser instance. We then call page.goto() to send the Puppeteer-controlled browser to 'https://google.com'. Because no options are passed, Puppeteer waits until the page's load event has fired before moving on to the next line of code.
Note that you can replace 'https://google.com' with the URL of any other website you'd like to load.
Finally, we close the Puppeteer browser instance when we’re done with it.
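Let’s look at the page.goto() method more closely. Its basic syntax is page.goto(url, [options]). It takes two parameters: url, the address to navigate to, and options, an optional object holding navigation settings such as waitUntil and timeout.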
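To illustrate the two parameters in isolation, here is a small sketch. The `navigate` helper is a hypothetical name, and `page` is assumed to be a Puppeteer page obtained from `browser.newPage()`; the option values shown are simply the defaults written out explicitly.

```javascript
// Hypothetical helper illustrating the two parameters of page.goto().
// `page` is assumed to be a Puppeteer Page, e.g. from browser.newPage().
async function navigate(page, url) {
  const options = {
    waitUntil: 'load', // when to treat navigation as finished ('load' is the default)
    timeout: 30000,    // maximum navigation time in ms (30000 is the default)
  };
  // goto() resolves with the main resource's response, which can be null.
  const response = await page.goto(url, options);
  return response ? response.status() : null;
}
```

The helper returns the HTTP status of the main document when a response is available, which is a convenient sanity check before interacting with the page.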
In the code example provided earlier, we didn't specify the waitUntil option. In that case, Puppeteer defaults to 'load' and waits for the page's load event to fire. The load event fires once the document and the resources it references, including scripts, stylesheets, and images, have been loaded.
There are four values you can assign to waitUntil: load, domcontentloaded, networkidle0, and networkidle2. Understanding them will help you choose the right wait condition for each page, so the rest of this article covers each one in turn.
The first option we'll learn involves using the load value with the waitUntil option. Here's how our previous example would look when specifically using the load value.
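A minimal sketch of the explicit form follows. As before, the `loadWithLoadEvent` name and the injected `puppeteer` parameter are illustrative choices, not something Puppeteer requires.

```javascript
// Sketch only: `loadWithLoadEvent` and its `puppeteer` parameter are illustrative.
async function loadWithLoadEvent(puppeteer) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Same behavior as the default, but the wait condition is now explicit.
    await page.goto('https://google.com', { waitUntil: 'load' });
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}
```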
In this revised code, by setting waitUntil to 'load', we instruct Puppeteer to wait for the load event of the page to be fired before proceeding. This means Puppeteer will ensure that all resources, including scripts, stylesheets, and images, have been fully loaded before moving on to the next line of code. Once the page has fully loaded, you might want to interact with certain elements, such as clicking on buttons. Learn more about how to do this with the Puppeteer Click guide.
Another value you can assign to waitUntil is domcontentloaded. This option instructs Puppeteer to wait for the page's DOMContentLoaded event, which typically fires before the full load event.
Here's how our example would look when using the domcontentloaded value.
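The sketch below shows one way this could look. The `loadDomReady` helper and its `puppeteer` parameter are hypothetical names, and reading the page title afterwards is only an illustration of a DOM-only task.

```javascript
// Sketch only: `loadDomReady` and its `puppeteer` parameter are illustrative.
async function loadDomReady(puppeteer) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Resolve as soon as the HTML document has been loaded and parsed.
    await page.goto('https://google.com', { waitUntil: 'domcontentloaded' });
    // The DOM is available now, so DOM-only reads such as the title are safe.
    return await page.title();
  } finally {
    await browser.close();
  }
}
```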
With the domcontentloaded value, Puppeteer waits until the initial HTML document has been completely loaded and parsed. However, it won’t wait for stylesheets, images, and subframes to finish loading, so this is often faster than waiting for the full load event. This option suits scenarios where you only need to access or interact with DOM elements as soon as they become available. If you need to wait for a specific DOM element to appear or become available for interaction, you can also use Puppeteer's page.waitForSelector() method.
Transitioning from DOM-focused events, let’s turn our attention to network-related triggers. One of the more interesting options for the waitUntil parameter is networkidle0. This value directs Puppeteer to wait until there are no network connections for at least 500 ms. The purpose of using this option is to ensure that most, if not all, network activity has settled.
Here's how our example would be structured when employing the networkidle0 value.
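A minimal sketch, under the same assumptions as the earlier examples: `loadUntilNetworkIdle` and the injected `puppeteer` parameter are hypothetical, and returning the page HTML is just one example of what a scraping task might do next.

```javascript
// Sketch only: `loadUntilNetworkIdle` and its `puppeteer` parameter are illustrative.
async function loadUntilNetworkIdle(puppeteer) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Resolve once there have been no network connections for at least 500 ms.
    await page.goto('https://google.com', { waitUntil: 'networkidle0' });
    // Serialized HTML of the page as it stands after late requests have settled.
    return await page.content();
  } finally {
    await browser.close();
  }
}
```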
When you use the networkidle0 option, Puppeteer waits until there is no more network activity. Think of it as waiting for the page to stop making data requests, especially those that happen after the main content loads. This is useful for pages that pull in extra data in the background, as in some web scraping tasks, where you want to capture everything, including content that loads a bit later. The trade-off is that a page that never stops making requests, for example one that uses long polling, may never become fully idle, and the navigation then fails once the timeout (30 seconds by default) expires.
The next option to discuss is the networkidle2 value for the waitUntil parameter. Let's revisit our example, this time specifying the networkidle2 value.
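The following sketch uses the same shape as the earlier examples; `loadUntilNetworkQuiet` and its `puppeteer` parameter are illustrative names.

```javascript
// Sketch only: `loadUntilNetworkQuiet` and its `puppeteer` parameter are illustrative.
async function loadUntilNetworkQuiet(puppeteer) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Resolve once there have been at most 2 network connections for 500 ms.
    await page.goto('https://google.com', { waitUntil: 'networkidle2' });
  } finally {
    await browser.close();
  }
}
```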
In the modified code, by assigning waitUntil to 'networkidle2', we guide Puppeteer to wait until there are no more than 2 network connections for at least 500 ms. This is particularly beneficial for pages that have ongoing minor network activities, which might not cease entirely but do diminish to a minimum level. The networkidle2 option allows us to capture the majority of page content without getting held up by these minimal, persistent network connections.
Sometimes, one waitUntil condition isn't enough. There might be scenarios where you want to ensure multiple conditions are met before proceeding. Fortunately, Puppeteer allows us to combine multiple waitUntil events for more fine-grained control. Let's see how our example would be adjusted to use both the domcontentloaded and networkidle2 events.
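A minimal sketch of the combined form is shown here. Passing an array to waitUntil is part of Puppeteer's API; the `loadWithCombinedWait` helper and the injected `puppeteer` parameter are hypothetical.

```javascript
// Sketch only: `loadWithCombinedWait` and its `puppeteer` parameter are illustrative.
async function loadWithCombinedWait(puppeteer) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // An array means: resolve only after every listed event has fired.
    await page.goto('https://google.com', {
      waitUntil: ['domcontentloaded', 'networkidle2'],
    });
  } finally {
    await browser.close();
  }
}
```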
In the updated code, by setting waitUntil to an array containing both 'domcontentloaded' and 'networkidle2', we're telling Puppeteer to wait for both conditions to be satisfied. The browser will ensure that the DOM content has fully loaded (because of domcontentloaded) and that there are no more than 2 network connections for at least 500 ms (because of networkidle2). Navigation is considered complete only once every listed event has fired, which gives you finer control than any single value.
Understanding how to use waitUntil to wait for a web page to load is an important skill for anyone involved in website automation testing or web scraping. In this article, we explored the four waitUntil values (load, domcontentloaded, networkidle0, and networkidle2) that determine when Puppeteer treats a page as ready for interaction.
To determine which method best suits your needs, it's essential first to understand your project's requirements and the characteristics of the web page you're working with. For complex pages that load dynamic content and resources, networkidle2 might be the ideal choice. However, always weigh the trade-offs between waiting time and ensuring the page is fully ready for interaction.