Puppeteer is a powerful tool for controlling headless Chrome in web scraping and automated browser testing. One of its key features is the ability to execute JavaScript directly within the context of a web page, which lets developers interact with elements, manipulate data and extract information from the page.
Let’s discuss the step-by-step process of setting up Puppeteer, navigating to a web page and integrating JavaScript execution to enhance your automation tasks.
Before using Puppeteer, you need to set it up in your development environment. Puppeteer is installed via npm (Node Package Manager), which makes it available to developers on all major platforms. Run the following command in your terminal to install it: npm install puppeteer
You need to navigate to the desired URL in order to execute JavaScript on a webpage with Puppeteer. This can be achieved using Puppeteer’s page.goto() method, which loads a given URL in the browser’s tab. Here’s a basic example of how to navigate to a web page using Puppeteer:
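A minimal sketch of this navigation step, assuming Puppeteer is installed locally. The helper name openPage, its launcher parameter and the URL https://example.com are illustrative placeholders rather than part of Puppeteer itself:

```javascript
// Sketch: launch headless Chrome, open a tab and load a URL.
// `launcher` defaults to the installed puppeteer package. It is a parameter
// (an illustrative choice) so the module is only loaded when the function runs.
async function openPage(url, launcher = require('puppeteer')) {
  const browser = await launcher.launch(); // headless by default
  const page = await browser.newPage();
  // By default, goto() resolves once the page has fired its load event.
  await page.goto(url);
  return { browser, page };
}

// Usage (starts a real browser, so it is shown as a comment):
// openPage('https://example.com')
//   .then(async ({ browser, page }) => {
//     console.log('Loaded:', page.url());
//     await browser.close();
//   })
//   .catch(console.error);
```

Returning both the browser and the page lets the caller keep working with the tab and close the browser when it is done.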
You can execute JavaScript with Puppeteer’s page.evaluate() method, which runs the provided function within the context of the page and returns its result.
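As a small sketch of page.evaluate(), assuming page is a Puppeteer Page that has already navigated to a URL (the names readTitle and getPageTitle are hypothetical helpers chosen for this example):

```javascript
// Runs inside the page, where `document` refers to the loaded page's DOM.
const readTitle = () => document.title;

// `page` is assumed to be a Puppeteer Page that has already loaded a URL.
async function getPageTitle(page) {
  // evaluate() serializes the function, runs it in the browser and
  // resolves with its (serializable) return value.
  const title = await page.evaluate(readTitle);
  console.log('Page title:', title);
  return title;
}
```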
In this snippet, Puppeteer evaluates the provided function on the page and returns the title of the web page.
In this section, we’ll demonstrate how to leverage Puppeteer’s capabilities to execute JavaScript for scraping data on a book-selling website.
First, we initiate Puppeteer, launch a new browser instance, create a new page and navigate to the website https://books.toscrape.com/.
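This first step might look like the sketch below. The helper name openBooksPage and the choice to wait only for DOMContentLoaded are assumptions made for illustration:

```javascript
const BOOKS_URL = 'https://books.toscrape.com/';

// Sketch: start a browser, open a tab and load the book catalogue.
// `launcher` defaults to the installed puppeteer package (illustrative parameter).
async function openBooksPage(launcher = require('puppeteer')) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    // Assumption: the catalogue is plain HTML, so DOMContentLoaded is enough.
    await page.goto(BOOKS_URL, { waitUntil: 'domcontentloaded' });
    return { browser, page };
  } catch (err) {
    // Do not leave a stray Chrome process behind if navigation fails.
    await browser.close();
    throw err;
  }
}
```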
Now, we execute custom JavaScript code within the context of the webpage to extract book titles, prices and availability.
Using document.querySelectorAll(), we target specific elements on the page: book titles in the <h3> tags inside each element with the class .product_pod, prices marked with the class .price_color, and availability information marked with the class .availability.
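The extraction described above could be sketched as follows, assuming page already points at the catalogue. The exact selectors, the helper names and the use of the link's title attribute for the full book title are assumptions about the site's markup, not taken from the original snippet:

```javascript
// Runs in the browser. Helpers must live inside this function because
// page.evaluate() serializes only the function it is given.
function collectBookFields() {
  const text = (el) => el.textContent.trim();
  const titles = Array.from(
    document.querySelectorAll('.product_pod h3 a'),
    // Assumption: the link's title attribute holds the full, untruncated title.
    (a) => a.getAttribute('title') || text(a)
  );
  const prices = Array.from(
    document.querySelectorAll('.product_pod .price_color'),
    text
  );
  const availability = Array.from(
    document.querySelectorAll('.product_pod .availability'),
    text
  );
  // Three parallel arrays: index i in each one refers to the same book.
  return { titles, prices, availability };
}

// `page` is assumed to be a Puppeteer Page showing the book catalogue.
async function extractBookFields(page) {
  return page.evaluate(collectBookFields);
}
```

Only plain strings are returned, because values passed back from page.evaluate() must be serializable. DOM elements themselves cannot be returned this way.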
In this step, we merge the extracted book titles, prices and availability into an array of objects. Using the map() method, each book’s information is paired together into a single object within the booksData array.
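A minimal sketch of this merge, assuming the three arrays are the same length and in the same order. The function name mergeBookData and the sample values are placeholders, not scraped data:

```javascript
// Pair up the i-th title, price and availability into one object per book.
function mergeBookData(titles, prices, availability) {
  return titles.map((title, i) => ({
    title,
    price: prices[i],
    availability: availability[i],
  }));
}

// Placeholder input to show the shape of the result (not real site data).
const booksData = mergeBookData(
  ['Book A', 'Book B'],
  ['£10.00', '£12.50'],
  ['In stock', 'In stock']
);
console.log(booksData);
```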
Lastly, we write the extracted data to a JSON file and close the Puppeteer browser instance.
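One way to sketch this last step, assuming browser is the running Puppeteer browser and booksData is the merged array. The helper name saveAndClose and the default file name books.json are assumptions:

```javascript
const fs = require('fs').promises;

// Write the scraped records as pretty-printed JSON, then shut the browser down.
async function saveAndClose(browser, booksData, outputPath = 'books.json') {
  try {
    await fs.writeFile(outputPath, JSON.stringify(booksData, null, 2), 'utf8');
  } finally {
    // Close the browser even if writing the file fails.
    await browser.close();
  }
}
```

Closing the browser in a finally block means the Chrome process does not keep running if the file cannot be written.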
The resulting JSON file contains an array of objects, one per book, each holding that book’s title, price and availability.
In this article, we explored the capabilities of Puppeteer in executing JavaScript on web pages, focusing on the example of extracting data from https://books.toscrape.com/. We demonstrated how Puppeteer enables navigation through web pages, execution of JavaScript to extract targeted information such as book titles, prices and availability, and finally, storing the scraped data in JSON format.