python selenium web-scraping selenium-chromedriver

Bypassing reCAPTCHA and logging in using Selenium


I'm working on a web-crawling project and want to scrape some news articles from the website https://www.utusan.com.my/

When logging in to the website, there is a reCAPTCHA that I want to bypass. Even when I click the CAPTCHA checkbox using Selenium, it takes me to the picture test to confirm my identity. I was wondering if there is any way to bypass that and log in. I even tried undetected-chromedriver, but it didn't work. Please help me with this.

Here is the code I tried:

    import undetected_chromedriver as uc
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Build the options with uc.ChromeOptions so they match the uc driver
    options = uc.ChromeOptions()
    options.page_load_strategy = "normal"
    options.add_argument("--window-size=1920,1080")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-notifications")
    options.add_argument("--disable-Advertisement")
    options.add_argument("--disable-popup-blocking")
    options.add_argument("--start-maximized")
    # The "excludeSwitches" experimental option is omitted: undetected_chromedriver
    # handles the automation switches itself and may reject it.

    # Pass the options to the driver so they take effect
    driver = uc.Chrome(options=options, use_subprocess=True)

    username = "<username>"
    password = "<password>"

    driver.get('https://www.utusan.com.my/akaun/')

    # Enter the username and password
    driver.find_element(By.ID, 'username').send_keys(username)
    driver.find_element(By.ID, 'password').send_keys(password)

    # Switch into the reCAPTCHA iframe and click the checkbox
    WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']")))
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//span[@id='recaptcha-anchor']"))).click()

    # Return to the main document before clicking the login button
    driver.switch_to.default_content()
    # By.CLASS_NAME accepts a single class only, so use a CSS selector for the compound class
    driver.find_element(By.CSS_SELECTOR, '.woocommerce-button.button.woocommerce-form-login__submit').click()
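
A note on the login-button locator: Selenium's `By.CLASS_NAME` matches one class name at a time, so a space-separated class attribute such as the submit button's has to be written as a CSS selector instead. Below is a minimal sketch of that conversion. The helper name `classes_to_css` is hypothetical (it is not part of Selenium), and it assumes plain class names that need no CSS escaping.

```python
def classes_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b.c" matches an element that carries all of the classes a, b and c.
    return "".join("." + name for name in names)


selector = classes_to_css("woocommerce-button button woocommerce-form-login__submit")
print(selector)
```

The resulting string can be passed to `driver.find_element(By.CSS_SELECTOR, selector)`.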
    

Solution

  • This might help. It uses the Chrome DevTools Protocol to connect (via WebSocket) to a remote Chrome instance running on the provider's servers, which handles proxies, CAPTCHA solving, and so on automatically, with no extra application code. I've only used it with Playwright, but I know it can be easily integrated into your existing Selenium script too (you can find the instructions here).
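
To illustrate the kind of connection described above, here is a rough sketch using Playwright's `connect_over_cdp`, since Playwright is what the answer reports using. The endpoint layout (`wss://user:password@host:port`), the host, the port, and the helper names `build_cdp_endpoint` and `fetch_title` are all assumptions for illustration. The real URL format comes from whichever service is used, and whether CAPTCHAs get solved depends entirely on that service.

```python
from urllib.parse import quote


def build_cdp_endpoint(user: str, password: str, host: str, port: int) -> str:
    """Build a WebSocket URL with URL-encoded credentials (layout is an assumption)."""
    return f"wss://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


def fetch_title(endpoint: str, url: str) -> str:
    """Open `url` in the remote browser over CDP and return the page title."""
    # Imported here so the URL helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to the already-running remote Chrome instead of launching one locally.
        browser = p.chromium.connect_over_cdp(endpoint)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

With a real endpoint, something like `fetch_title(build_cdp_endpoint("user", "secret", "browser.example.com", 9222), "https://www.utusan.com.my/akaun/")` would load the login page in the remote browser. The credentials, host, and port shown here are placeholders.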