Updated on February 6, 2025

How to Bypass CAPTCHA with Playwright Automatically

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a common security mechanism that websites use to tell real users from automated bots, typically by presenting challenges such as image recognition or text entry. When you automate a browser with Playwright, a popular end-to-end testing library, these challenges can stall a testing or scraping run. Handling them automatically saves time and effort. In this article, we'll walk through the steps to bypass CAPTCHA automatically with Playwright by pairing it with a third-party solving service.


Prerequisites

To bypass CAPTCHA with Playwright effectively, ensure you have the following setup:

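  • Install Python: Download and install a currently supported Python 3 release from the Python website. Recent Playwright releases have dropped support for Python 3.7, so check the minimum version listed on the package's PyPI page. Ensure the pip package manager is also installed.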
  • Set Up a Virtual Environment (Optional): It’s good practice to create a virtual environment to manage dependencies. Run the following commands:
python -m venv playwright-captcha-env  
source playwright-captcha-env/bin/activate  # Use `.\playwright-captcha-env\Scripts\activate` on Windows  
  • Install Playwright for Python: Install the Playwright package along with its necessary dependencies:
pip install playwright  
python -m playwright install
  • CAPTCHA-Solving Service: Playwright alone cannot solve CAPTCHAs. You’ll need a third-party service like 2Captcha or Anti-Captcha. Obtain an API key by registering on one of these platforms.
  • Proxy Service (Optional): To minimize CAPTCHA triggers caused by IP bans or flagged behavior, use a proxy service like Webshare. You can get started for free using our Proxy in Playwright guide.
  • Install Additional Python Libraries: To integrate a CAPTCHA-solving service, you may need its client library, such as 2captcha-python. Install it with the following command:
pip install 2captcha-python 

How to automatically bypass CAPTCHA with Playwright?

Follow the steps below to bypass CAPTCHA automatically using Playwright with the 2captcha service:

Step 1: Import Libraries and Initialize Playwright

Import the required libraries, including Playwright for browser automation and 2Captcha to solve CAPTCHA challenges:

from playwright.sync_api import sync_playwright
from twocaptcha import TwoCaptcha

Step 2: Set Up the Browser and Solver

Initialize Playwright to launch a browser and create a new page. Also, instantiate the 2Captcha solver with your API key:

url = "https://patrickhlauke.github.io/recaptcha/"  # Target URL with reCAPTCHA

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # Launch browser in headless mode
    page = browser.new_page()  # Create a new page
    solver = TwoCaptcha("<YOUR_API_KEY>")  # Initialize 2Captcha solver with your API key

Step 3: Navigate to the CAPTCHA Page

Navigate to the URL containing the CAPTCHA and locate the iFrame element that holds the CAPTCHA box. Switch to the iFrame and extract the CAPTCHA site key:

    page.goto(url)  # Open the target URL

    # Obtain the iFrame containing the CAPTCHA box
    captcha_frame = page.wait_for_selector("iframe[src*='recaptcha']")

    # Switch to the content of the CAPTCHA iframe
    captcha_frame_content = captcha_frame.content_frame()

    # Extract the site key from the CAPTCHA iframe
    site_key = captcha_frame.get_attribute("src").split("k=")[-1].split("&")[0]

    # Get the CAPTCHA checkbox element
    captcha_checkbox = captcha_frame_content.wait_for_selector("#recaptcha-anchor")

    # Click the CAPTCHA checkbox to start the challenge
    captcha_checkbox.click()
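
The string-splitting expression used above to pull the site key out of the iframe `src` can misfire when a later query parameter has a name ending in `k` (for example `track=`). A slightly more defensive sketch parses the query string instead. The helper name `extract_site_key` and the example URL are illustrative assumptions, not part of Playwright or 2Captcha:

```python
from urllib.parse import parse_qs, urlparse


def extract_site_key(iframe_src: str) -> str:
    """Return the value of the `k` query parameter from a reCAPTCHA iframe URL."""
    query = parse_qs(urlparse(iframe_src).query)
    values = query.get("k")
    if not values:
        raise ValueError(f"No site key ('k' parameter) found in: {iframe_src}")
    return values[0]


# Hypothetical iframe URL, shaped like the ones a reCAPTCHA widget generates.
example_src = (
    "https://www.google.com/recaptcha/api2/anchor"
    "?ar=1&k=EXAMPLE_SITE_KEY&co=abc&hl=en"
)
print(extract_site_key(example_src))  # prints EXAMPLE_SITE_KEY
```

Inside the script, this would replace the splitting expression with `site_key = extract_site_key(captcha_frame.get_attribute("src"))`.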

Step 4: Solve the CAPTCHA

Use the 2Captcha service to solve the CAPTCHA by passing the extracted site key. Retrieve the solution and input it into the hidden CAPTCHA response field:

    # Solve the CAPTCHA using 2Captcha (this call blocks until the service responds)
    captcha_response = solver.recaptcha(sitekey=site_key, url=url)

    if captcha_response:
        # Extract the CAPTCHA response token from the result
        captcha_token = captcha_response["code"]

        # Write the token into the hidden response field; passing it as an
        # argument avoids building JavaScript through string interpolation
        field_value = page.evaluate(
            """(token) => {
                const field = document.querySelector("#g-recaptcha-response");
                field.value = token;
                return field.value;
            }""",
            captcha_token,
        )

        # Print the field value to confirm the token was set
        print(field_value)

        # Take a screenshot of the page for verification
        page.screenshot(path="screengrab.png")

Step 5: Submit the Form and Close the Browser

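After filling in the CAPTCHA token, proceed with any further actions, such as submitting the form, and then close the browser. The snippet below only waits a few seconds before closing, so add your own submission step ahead of the wait: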

    page.wait_for_timeout(5000)  # Wait to observe the result

    # Close the browser session
    browser.close()

Apart from this method, the Playwright Stealth plugin can reduce how often CAPTCHAs are triggered by making automated sessions look more like a regular browser. It does not solve challenges itself. This open-source plugin adds various evasion techniques to Playwright that help avoid detection by CAPTCHA systems. For instance, it can modify the User Agent to mimic a real browser, spoof runtime environments, disable WebRTC to prevent IP address identification, and mask the navigator.webdriver flag that automated browsers normally expose.

Fixing common issues

When automating CAPTCHA solving with Playwright and third-party services, you may encounter some common issues. Here’s how to resolve them:

CAPTCHA not detected or incorrect selector

Issue: Playwright fails to locate the CAPTCHA element due to an incorrect selector.

Solution: Double-check the CSS selector for the CAPTCHA. Prefer flexible selectors, such as one that matches part of the src attribute of the CAPTCHA image:

captcha_element = page.locator("img[src*='captcha']")
captcha_element.screenshot(path="captcha.png")

You can also use Playwright's built-in debugging tools to inspect the page and find the correct selector. Use page.pause() to pause the script and inspect the elements in the browser.

CAPTCHA solving service response delay

Issue: Delays in receiving the CAPTCHA solution from the service.

Solution: Implement a retry mechanism with time.sleep() to wait before retrying the request for a solution:

import time
time.sleep(5)  # Wait for 5 seconds before retrying
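
A single `time.sleep()` call only pauses the script, so a retry mechanism also needs a loop around the solving call. The following is a minimal sketch of that idea. The helper name `solve_with_retries` and its parameters are assumptions made for illustration, and `solve` stands for any zero-argument callable that returns a result or raises an exception:

```python
import time


def solve_with_retries(solve, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call `solve()` until it succeeds or `attempts` calls have failed.

    `sleep` is injectable so the waiting behavior can be replaced in tests.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return solve()
        except Exception as error:  # Narrow this to the service's own exceptions in real code.
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # Wait before the next attempt
    raise RuntimeError(f"CAPTCHA not solved after {attempts} attempts") from last_error
```

With the 2Captcha solver from the steps above, a call could look like `captcha_response = solve_with_retries(lambda: solver.recaptcha(sitekey=site_key, url=url))`.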

Invalid API key

Issue: Using an incorrect or expired API key will cause request failures.

Solution: Ensure the API key is valid and correctly included in your requests. Check the service's dashboard for the correct key:

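import requests

api_key = "your_valid_api_key"
with open("captcha.png", "rb") as captcha_image:  # CAPTCHA image saved to disk
    response = requests.post(
        "https://2captcha.com/in.php",
        data={"key": api_key, "method": "post"},
        files={"file": captcha_image},
    )
# An invalid key is reported in the response body (for example, ERROR_WRONG_USER_KEY)
print(response.text)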
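
One way to rule out a mistyped or stale key is to load it from the environment rather than pasting it into each script. This is a sketch under assumptions: the variable name `TWOCAPTCHA_API_KEY` and the helper `load_api_key` are made up for illustration and are not defined by 2Captcha:

```python
import os


def load_api_key(env_var="TWOCAPTCHA_API_KEY"):
    """Read the API key from an environment variable and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()  # Strip stray whitespace from copy-paste
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The solver from Step 2 could then be created with `solver = TwoCaptcha(load_api_key())`, which stops the script before any request is sent if the key is absent.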

Wrapping up: bypass CAPTCHA with Playwright

Bypassing CAPTCHA with Playwright combines automation with third-party CAPTCHA-solving services for efficient handling of challenges. To enhance reliability and handle IP-based restrictions, you can integrate a proxy service like Webshare. This allows you to rotate IPs seamlessly and avoid detection during automation.
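
As a rough sketch of the proxy rotation mentioned above, Playwright's `launch()` accepts a `proxy` dictionary with `server`, `username`, and `password` keys. The endpoints and credentials below are placeholders, and the helper names `proxy_settings` and `launch_with_next_proxy` are assumptions for illustration:

```python
from itertools import cycle


def proxy_settings(server, username=None, password=None):
    """Build the dictionary passed to Playwright's `launch(proxy=...)`."""
    settings = {"server": server}
    if username is not None:
        settings["username"] = username
    if password is not None:
        settings["password"] = password
    return settings


# Placeholder endpoints and credentials; replace them with values from your provider.
PROXIES = cycle([
    proxy_settings("http://proxy1.example.com:8080", "user", "pass"),
    proxy_settings("http://proxy2.example.com:8080", "user", "pass"),
])


def launch_with_next_proxy(playwright):
    """Launch Chromium through the next proxy in the rotation."""
    return playwright.chromium.launch(headless=True, proxy=next(PROXIES))
```

Inside `with sync_playwright() as p:`, calling `browser = launch_with_next_proxy(p)` would start each new browser on a different proxy in round-robin order.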
