Updated on
February 9, 2024

Proxy in Puppeteer: 3 Effective Setup Methods Explained

In this article, we explain three common methods for using proxies in Puppeteer and walk you through setting up and configuring them to enhance your web automation capabilities. But first, let’s address a fundamental question: what does “proxy” mean in the context of Puppeteer? A proxy is an intermediary server that sits between your script and the target server. When you make requests through a proxy, the proxy server forwards those requests on your behalf, effectively masking your real IP address and location.

Prerequisites for using proxies in Puppeteer

Before we jump into the world of using proxies in Puppeteer, it’s essential to ensure that you have a few prerequisites in place to streamline the setup process.

Node.js and npm

Puppeteer package

The next step is to install Puppeteer. You can do this by running npm install puppeteer in your terminal.

The command will download and install the Puppeteer package, making it available for your Node.js projects.

Proxy server information

Since we are focusing on using proxies in Puppeteer, you’ll also need access to one or more proxy servers. Make sure you have the necessary details for each one: the IP address or hostname, the port number, and any authentication credentials if required. You will need this information when configuring Puppeteer to work with proxies. If you do not have a proxy server yet, you can start with Webshare’s proxy servers. All users get 10 free proxies, which is enough to check that the setup works for you before continuing.

3 Methods of using Proxy in Puppeteer

In this section, we will guide you through configuring Puppeteer to work with proxies using three common methods: static proxies, proxy lists, and rotating proxies. Each one has its advantages and disadvantages.

The static proxy method uses a small number of proxies with fixed IP addresses. It is the easiest setup for beginners and for users who need a consistent proxy configuration for web automation, multiple account management, or other networking needs.

The proxy list method uses proxies drawn from a list of static IP addresses. A wider variety of IP addresses suits web automation tasks that need to access local content with less risk of an IP ban, which makes it a good fit for app testing and small-scale web scraping.

The rotating proxy method is best for users who need a large number of proxies, usually for web automation tasks such as web scraping. Rotating proxies also help when scraping websites that have strict anti-scraping measures in place. It differs from the proxy list method because it uses a single rotating proxy endpoint for connections, so you do not have to manually enter and change IP addresses in your Puppeteer proxy setup. It’s the easiest way to scale your proxy needs in Puppeteer.

Next, we will teach you each of these methods with code examples.

Method 1: Using Static Proxies

Static proxies are fixed proxy servers with pre-defined IP addresses and port numbers. They are relatively easy to set up and a great choice when you need a consistent proxy configuration for your automation tasks.

Assuming you have installed Puppeteer, follow the below steps to get started with static proxies.

Import Puppeteer: In your Node.js script, import the Puppeteer library:

Initialize Puppeteer with Proxy: Create a Puppeteer browser instance while specifying the proxy server details. You can use either SOCKS5 or HTTP proxies depending on your requirements. If you are unsure, SOCKS5 proxies are suitable for the majority of Puppeteer tasks.

Replace your-proxy-ip and your-proxy-port with the actual IP address and port number of your static proxy server. Webshare users can find these values in their account.

The --proxy-server argument passed to Puppeteer’s launch method specifies the proxy server to use. In this case, we’re using the socks5:// scheme to indicate a SOCKS5 proxy. You can replace it with http:// if you are using an HTTP proxy.
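The launch step described above can be sketched as follows. This is a minimal illustration rather than a definitive setup: the helper names proxyServerArg and launchWithStaticProxy are our own, and your-proxy-ip and your-proxy-port are placeholders you must replace.

```javascript
// Build the Chromium flag that points the browser at one fixed proxy.
// `scheme` is 'socks5' or 'http'; `host` and `port` come from your provider.
function proxyServerArg(scheme, host, port) {
  return `--proxy-server=${scheme}://${host}:${port}`;
}

// Launch a browser whose traffic goes through the static proxy.
async function launchWithStaticProxy() {
  // Required here (not at the top) so the helper above has no dependencies.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    // Placeholder values: replace with your proxy's IP address and port.
    args: [proxyServerArg('socks5', 'your-proxy-ip', 'your-proxy-port')],
  });
  return browser;
}
```

Calling launchWithStaticProxy() returns a browser instance; every page opened from it uses the same proxy until the browser is closed.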

To verify that Puppeteer is using the static proxy correctly, you can create a simple script to open a web page and check your IP address. Here’s an example:

The above code will open the “What is My IP” website, and you can extract and print the displayed IP address to ensure it matches the IP of your static proxy.
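A minimal sketch of such a check is shown here. It assumes an IP echo service that returns your address as plain text; https://api.ipify.org is used as a stand-in and is our assumption, not necessarily the site the article refers to. The function names are illustrative.

```javascript
// Pull the first IPv4 address out of a chunk of page text (null if none).
function extractIPv4(text) {
  const match = text.match(/\b(?:\d{1,3}\.){3}\d{1,3}\b/);
  return match ? match[0] : null;
}

// Open an IP echo page through the proxy and return the address it reports.
// `checkUrl` is an assumed stand-in for any "what is my IP" style page.
async function checkProxyIp(proxyUrl, checkUrl = 'https://api.ipify.org') {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyUrl}`],
  });
  try {
    const page = await browser.newPage();
    await page.goto(checkUrl, { waitUntil: 'domcontentloaded' });
    const bodyText = await page.evaluate(() => document.body.innerText);
    return extractIPv4(bodyText);
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}
```

For example, checkProxyIp('socks5://your-proxy-ip:your-proxy-port').then(console.log) should print the proxy's IP address rather than your own.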

Method 2: Using a Proxy List

A proxy list contains multiple proxy servers that you can rotate or cycle through during your web automation tasks. This method offers more flexibility than static proxies and is particularly useful when you need to work with a variety of IP addresses.

To begin using a proxy list in Puppeteer, you need to follow the below steps:

Prepare a List of Proxies: First, you need a list of proxy servers in the form of IP addresses and port numbers. You can obtain such lists from proxy service providers or create your own. Webshare users can find their proxy list in their account.

Import Puppeteer: In your Node.js script, import the Puppeteer library:

Create a Proxy List: Define an array called proxyList that contains proxy server strings in the format IP:Port. Each element in the array represents a different proxy server you want to use. These proxies can come from different sources or providers.

Replace proxy1-ip:port, proxy2-ip:port, and proxy3-ip:port with the actual IP addresses and port numbers of the proxy servers you want to use. You can have as many proxy entries as needed in this array.

Select a Random Proxy: To avoid using the same proxy every time, you can randomly select a proxy from the proxyList array. The expression Math.floor(Math.random() * proxyList.length) generates a random index within the bounds of your list, and the element at that index becomes randomProxy.
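The list and the random selection can be sketched together as below. The entries are the article's placeholders, and pickRandomProxy is a helper name we introduce for illustration; its second parameter exists only so the choice can be made deterministic when testing.

```javascript
// Placeholder entries: replace each with a real IP:Port pair.
const proxyList = [
  'proxy1-ip:port',
  'proxy2-ip:port',
  'proxy3-ip:port',
];

// Return one entry chosen uniformly at random.
// `random` defaults to Math.random and can be swapped out in tests.
function pickRandomProxy(list, random = Math.random) {
  if (list.length === 0) {
    throw new Error('proxy list is empty');
  }
  const index = Math.floor(random() * list.length);
  return list[index];
}

const randomProxy = pickRandomProxy(proxyList);
console.log(`Selected proxy: ${randomProxy}`);
```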

Launch Puppeteer with the Chosen Proxy: Finally, when launching a Puppeteer browser instance, you can pass the --proxy-server argument with the value of randomProxy. This tells Puppeteer to use the selected proxy for all browser requests made during the session.

In this setup, we randomly select a proxy from the proxyList array using JavaScript’s Math.random() function. Each new browser instance therefore draws its proxy independently, so consecutive instances will usually, but not always, use different proxies.
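Putting the selection and the launch together might look like the sketch below. The function names are ours, and the list passed in is assumed to hold IP:Port strings as described above.

```javascript
// Turn an 'IP:Port' entry into Puppeteer launch options.
function launchOptionsFor(proxy) {
  return { args: [`--proxy-server=${proxy}`] };
}

// Pick a proxy at random and launch a browser that uses it.
async function launchWithRandomProxy(proxyList) {
  const puppeteer = require('puppeteer');
  const randomProxy = proxyList[Math.floor(Math.random() * proxyList.length)];
  const browser = await puppeteer.launch(launchOptionsFor(randomProxy));
  // Return the chosen proxy too, so the caller can log or verify it.
  return { browser, proxy: randomProxy };
}
```

Returning the chosen proxy alongside the browser makes it easy to compare the proxy you selected with the IP address a test page reports.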

To verify that Puppeteer is using the proxy list correctly, you can run a simple test script. Here’s an example:

This script will open the “What Is My IP” website, and you can extract and print the displayed IP address to confirm that it matches the IP of the proxy server used in that particular browser instance.

Method 3: Rotating proxies

Rotating proxies provide dynamic IP addresses for each request, making them an excellent choice for scenarios where you need to frequently change your IP address to avoid IP-based restrictions or blocks.

You need to follow the below steps to set up the rotating proxies.

Import Puppeteer: In your Node.js script, import the Puppeteer library:

Initialize Puppeteer with a Rotating Proxy Endpoint: To use rotating proxies, you’ll typically need to subscribe to a rotating proxy service. Webshare is one such service and offers rotating proxy endpoints. If you have an account, you can find the rotating proxy endpoint there. Here’s an example of how to set up Puppeteer with Webshare’s rotating proxies:

Make sure to do the following:

  1. Replace username with your Proxy Username.
  2. Replace password with your Proxy Password.
  3. Replace p.webshare.io with the hostname of your proxy server (Domain Name field for Webshare users).
  4. Replace port with the port number provided by your proxy provider (Proxy Port field for Webshare users).
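The four replacements above map onto a configuration like the sketch below. One assumption to flag: Chromium does not, to our knowledge, read credentials embedded in the --proxy-server value, so this sketch passes the username and password through Puppeteer's page.authenticate method instead. The object and function names are illustrative.

```javascript
// Placeholders: swap in the values from your provider's dashboard.
const proxy = {
  host: 'p.webshare.io',
  port: 'port',
  username: 'username',
  password: 'password',
};

// Build the launch flag for the rotating endpoint (no credentials here).
function rotatingProxyArg({ host, port }) {
  return `--proxy-server=http://${host}:${port}`;
}

// Open a page through the rotating endpoint and navigate to `url`.
async function openPageThroughRotatingProxy(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ args: [rotatingProxyArg(proxy)] });
  const page = await browser.newPage();
  // Answer the proxy's authentication challenge for this page.
  await page.authenticate({
    username: proxy.username,
    password: proxy.password,
  });
  await page.goto(url);
  return { browser, page };
}
```

Remember to call browser.close() on the returned browser once you are done with the page.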

Subscribe to a Plan: Ensure that you have an active plan with Webshare (or a proxy provider of your choice) that includes access to rotating proxies. Webshare offers the rotating proxy endpoint to all users at no additional cost.

Launch Puppeteer: With the Puppeteer browser instance configured to use rotating proxies, you can launch Puppeteer and start performing web automation tasks.

Puppeteer, when launched with the specified --proxy-server argument, routes all web requests made by the browser through the Webshare rotating proxy server. Webshare, in turn, manages the rotation of IP addresses at the server level, providing you with different IP addresses for each request or within specific time intervals. This method ensures that you maintain a level of anonymity and avoid being blocked by websites that may restrict access based on IP addresses. It’s particularly useful for web scraping, data collection, or any task that requires frequent IP rotation.

To verify that Puppeteer is using rotating proxies correctly, you can create a simple script (as shown in the previous sections) that navigates to a website and extracts the displayed IP address. This IP address should change with each new Puppeteer session, reflecting the dynamic nature of rotating proxies.
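One way to sketch this check is to open several short sessions and count how many distinct addresses come back. The IP echo URL (https://api.ipify.org) is an assumed stand-in that returns the address as plain text, and the function names are our own.

```javascript
// Count how many different values appear in a list.
function countDistinct(values) {
  return new Set(values).size;
}

// Launch `sessions` browsers in turn and record the IP each one reports.
// `credentials` is optional: { username, password } for authenticated proxies.
async function collectSessionIps(
  proxyUrl,
  credentials,
  sessions = 3,
  checkUrl = 'https://api.ipify.org'
) {
  const puppeteer = require('puppeteer');
  const ips = [];
  for (let i = 0; i < sessions; i += 1) {
    const browser = await puppeteer.launch({
      args: [`--proxy-server=${proxyUrl}`],
    });
    try {
      const page = await browser.newPage();
      if (credentials) {
        await page.authenticate(credentials);
      }
      await page.goto(checkUrl);
      const text = await page.evaluate(() => document.body.innerText);
      ips.push(text.trim());
    } finally {
      await browser.close();
    }
  }
  return ips;
}
```

If countDistinct applied to the collected addresses is greater than 1, the endpoint is rotating; a result of 1 across many sessions suggests it is not.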

Advanced proxy configuration

While the methods discussed earlier are suitable for most use cases, there are scenarios where advanced proxy configurations can provide greater control and flexibility. In this section, we’ll explore two advanced proxy configuration options that can address specific needs in your Puppeteer automation tasks.

Custom IP per page

In certain situations, you may require a unique proxy IP address for each page or navigation within your Puppeteer script. This need arises when you want to simulate multiple users or sessions with different IP addresses, or when you want to avoid IP rate limits on certain websites. The puppeteer-page-proxy library addresses precisely this scenario. It allows you to assign a different proxy to each page within your Puppeteer script, giving you granular control over the IP address used for each web interaction.

To get started, install the puppeteer-page-proxy library by running npm install puppeteer-page-proxy in your terminal.

In your Puppeteer script, import the puppeteer-page-proxy library as shown below:

Initialize your Puppeteer browser and create one or more page instances as needed.

Utilize the puppeteerPageProxy function to set a custom proxy for each page individually. For example:

Replace http://proxy1-ip:port and http://proxy2-ip:port with the actual proxy server details you want to use for each page.
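The steps above can be sketched as follows, assuming the library's default export is a function that takes a page and a proxy URL. The proxy URLs are the article's placeholders, and proxyForIndex and openPagesWithOwnProxies are helper names we introduce.

```javascript
// Placeholder proxies: replace with real proxy URLs.
const pageProxies = ['http://proxy1-ip:port', 'http://proxy2-ip:port'];

// Cycle through the list so page i gets proxy i (wrapping around).
function proxyForIndex(list, i) {
  return list[i % list.length];
}

// Open one page per URL, each routed through its own proxy.
async function openPagesWithOwnProxies(urls) {
  const puppeteer = require('puppeteer');
  const puppeteerPageProxy = require('puppeteer-page-proxy');
  const browser = await puppeteer.launch();
  const pages = [];
  for (let i = 0; i < urls.length; i += 1) {
    const page = await browser.newPage();
    // Assign the proxy before navigating so the first request uses it.
    await puppeteerPageProxy(page, proxyForIndex(pageProxies, i));
    await page.goto(urls[i]);
    pages.push(page);
  }
  return { browser, pages };
}
```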

Setting up proxy-chain

For highly complex proxy setups, such as chaining multiple proxies together, the proxy-chain library provides the necessary functionalities such as:

  • Multi-layer Proxy Chains: When you need to route your requests through multiple proxy layers, this library can be useful for obscuring your origin and enhancing anonymity.
  • Proxy Tunneling: The library provides additional layers of security to tunnel requests through intermediate proxies before reaching the final target.

Install the proxy-chain library by running npm install proxy-chain in your terminal.

You can implement a custom proxy chaining solution using proxy-chain in your Puppeteer script, configuring it according to your specific use case. Detailed setup can vary significantly based on the complexity of your proxy chain. Here’s an example of a simple proxy chaining configuration to help you get started.

Example: Simple Proxy Chain

The below example demonstrates how to set up a basic proxy chain using proxy-chain and Puppeteer. We create a local proxy server and route Puppeteer’s requests through it.

In the above code, we import the proxy-chain library and, within an asynchronous function, create a local proxy server on port 8000. We define proxyUrl as the URL of that local proxy server. Puppeteer is launched with the --proxy-server argument set to proxyUrl, directing all its requests through the local proxy. After the Puppeteer operations conclude, the browser instance is closed and the local proxy server is stopped using await server.close().
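A sketch consistent with that description is given below. The upstream proxy URL is a hypothetical placeholder we add so the local server has somewhere to forward traffic; proxy-chain's prepareRequestFunction option is what supplies it. The function name is our own.

```javascript
const LOCAL_PORT = 8000;
// URL of the local proxy server that Puppeteer will connect to.
const proxyUrl = `http://127.0.0.1:${LOCAL_PORT}`;
// Hypothetical upstream proxy: replace with your provider's details.
const UPSTREAM_PROXY_URL = 'http://username:password@your-proxy-ip:port';

// Start the local proxy, browse through it, then shut everything down.
async function runThroughLocalProxy(targetUrl) {
  const ProxyChain = require('proxy-chain');
  const puppeteer = require('puppeteer');

  const server = new ProxyChain.Server({
    port: LOCAL_PORT,
    // Forward every request to the upstream proxy.
    prepareRequestFunction: () => ({ upstreamProxyUrl: UPSTREAM_PROXY_URL }),
  });
  await server.listen();

  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyUrl}`],
  });
  try {
    const page = await browser.newPage();
    await page.goto(targetUrl);
    return await page.title();
  } finally {
    await browser.close();
    // `true` also drops any connections still open on the local proxy.
    await server.close(true);
  }
}
```

Because the local server carries the upstream credentials, the browser itself connects to an unauthenticated local address.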

Fixing common issues

While using Puppeteer for web automation, you may encounter common issues related to the setUserAgent command. These issues can affect how websites respond to your requests and may require specific solutions. Below are some of the typical problems that need to be addressed.

Website not rendering correctly

Issue: After setting a custom user agent with setUserAgent, you notice that certain websites do not render correctly or display expected content.

Solution: Some websites rely heavily on user agent strings to serve content correctly. If your custom user agent does not match the expected format, the website may not respond as intended. To resolve this issue, do the following:

  • Ensure that your custom user agent closely mimics a common web browser’s user agent string, such as one from Firefox, Chrome, or Safari. We have written an in-depth guide on configuring user agents in Puppeteer for exactly this purpose.
  • Check the website’s requirements or documentation for user agent recommendations and adjust your custom user agent accordingly.
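As a rough illustration of the first point, the sketch below sets a desktop Chrome style user agent before navigating. The user agent string is only an example value, and the sanity check is a simple heuristic of our own, not a rule any website publishes.

```javascript
// Example desktop Chrome user agent string (illustrative value only).
const DESKTOP_CHROME_UA =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

// Heuristic: mainstream browsers start with "Mozilla/5.0 (<platform>)".
function looksLikeBrowserUserAgent(ua) {
  return /^Mozilla\/5\.0 \(.+\)/.test(ua);
}

// Open `url` with a custom user agent applied before the first request.
async function openWithUserAgent(url, userAgent = DESKTOP_CHROME_UA) {
  if (!looksLikeBrowserUserAgent(userAgent)) {
    throw new Error(`Unusual user agent: ${userAgent}`);
  }
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setUserAgent(userAgent);
  await page.goto(url);
  return { browser, page };
}
```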

Website Blocking Puppeteer

Issue: Websites actively block or restrict access from Puppeteer, even after setting a custom user agent.

Solution: Some websites employ sophisticated mechanisms to detect and block headless browsers like Puppeteer. To bypass such blocks:

  • Use rotating IP addresses, as discussed earlier in the article, to make your requests appear to come from different sources.
  • Try other Puppeteer launch options such as headless: false, which runs the browser in headed (non-headless) mode and may help circumvent detection.
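Both suggestions can be combined in one launch call, sketched below. The function names are illustrative, and whether headed mode avoids detection depends on the target site.

```javascript
// Build launch options for a visible (headed) browser, optionally proxied.
function headedLaunchOptions(proxyUrl) {
  return {
    headless: false, // open a real browser window
    args: proxyUrl ? [`--proxy-server=${proxyUrl}`] : [],
  };
}

// Launch the headed browser with the options above.
async function launchHeaded(proxyUrl) {
  const puppeteer = require('puppeteer');
  return puppeteer.launch(headedLaunchOptions(proxyUrl));
}
```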

Website Behavior Not as Expected

Issue: Websites do not behave as expected, even with a custom user agent.

Solution: Sometimes, issues can be more complex than just the user agent. Consider the following additional steps:

  • Inspect the website’s source code and network requests using Puppeteer’s APIs, such as page.evaluate and page.on('response'). This can help identify any specific issues or requirements.
  • Check for JavaScript errors or console logs in the Puppeteer browser instance that might provide clues about issues on the website.
  • Experiment with different combinations of user agents, proxy settings, and other Puppeteer options to find the optimal configuration for the target website.
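The first two checks can be wired up with page event listeners, as in this sketch. The helper names are our own, and treating any status of 400 or above as a failure is a simplifying assumption.

```javascript
// Treat client and server error codes as failures worth logging.
function isFailedStatus(status) {
  return status >= 400;
}

// Attach listeners that surface network and JavaScript problems.
// Call this right after creating the page and before page.goto().
function attachDiagnostics(page) {
  page.on('response', (response) => {
    if (isFailedStatus(response.status())) {
      console.log(`HTTP ${response.status()} ${response.url()}`);
    }
  });
  page.on('requestfailed', (request) => {
    const failure = request.failure();
    const reason = failure ? failure.errorText : 'unknown error';
    console.log(`Request failed: ${request.url()} (${reason})`);
  });
  page.on('console', (msg) => {
    console.log(`[page ${msg.type()}] ${msg.text()}`);
  });
  page.on('pageerror', (error) => {
    console.log(`Page error: ${error.message}`);
  });
}
```

Proxy problems often show up in the requestfailed output, which makes it a useful first place to look when a page loads without a proxy but fails with one.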

Conclusion

In this article, we’ve explored how to leverage Puppeteer’s capabilities alongside proxies for efficient web automation. Puppeteer, a powerful tool developed by Google, enables seamless interactions with web pages, making it invaluable for web scraping, testing, and more.

A natural next step is to define the actions your Puppeteer script performs on the page, starting with clicking page elements. Learn more about Clicking in Puppeteer.

We’ve covered three primary methods for using proxies with Puppeteer, from static proxies for consistency to proxy lists and rotating proxies for flexibility. Additionally, we discussed advanced proxy configurations and how to troubleshoot common issues. Whether you are a beginner or an experienced developer, the insights shared will help you succeed in using Proxies in Puppeteer.
