ruby web-scraping nokogiri screen-scraping watir-webdriver

How do you scrape a webpage to check whether you need to solve a captcha?


I'm using 'watir', 'curb', 'nokogiri', and 'easy_captcha_solver', and I'm trying to scrape the page to find out whether a captcha has appeared, then solve it by getting the image URL. However, I'm not sure what to put in the if statement or how to scrape what I need.

    #=> SIGN IN
    browser = Watir::Browser.new :ff
    browser.goto "https://soundcloud.com/login"
    browser.text_field(:id => "site-username").set "#{name}"
    browser.text_field(:id => "site-password").set "#{pass}"
    browser.button(:id => "log-in-submit-button").click
    if browser.body(:url => "https://soundcloud.com/login?captcha=true").text.include? (:id => "recaptcha_table")
        http = Curl.get("https://soundcloud.com/login?captcha=true") do |http|
        http.headers['User-Agent'] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10; rv:33.0) Gecko/20100101 Firefox/33.0"

This if statement doesn't work: it doesn't seem to be able to read the text, and the browser just stops when a captcha appears.

      end
      puts http.form_str
      easy_c = EasyCaptchaSolver.new(image_url: "...")
      easy_c.captcha

I want to scrape the image URL, but I'm not sure how to get Nokogiri to recognize the HTML so that I can extract the image URL and pass it in.

    else
      browser.goto "http://soundcloud.com/you/sets"
    end

The captcha html looks like:

captcha code


Solution

  • 1st line: check whether the captcha exists

    2nd line: get the URL of the captcha image

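    # Only run the solver when the captcha container is on the page
    if browser.element(:id => 'recaptcha_image').exists?
        # Read the image URL straight from the <img> element's src attribute
        img_url = browser.image(:id => 'recaptcha_challenge_image').src
        easy_c = EasyCaptchaSolver.new(image_url: img_url)
        easy_c.captcha
    end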
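
The check above could also be wrapped in a small helper so the login flow only has to ask for a URL. This is a minimal sketch, not part of the original answer: the name `captcha_image_url` is hypothetical, and it assumes the two element ids used above (`recaptcha_image` and `recaptcha_challenge_image`) are the ones present on the page. It only relies on the object responding to `element` and `image` the way a Watir browser does.

```ruby
# Hypothetical helper (name is an assumption, not from the original answer).
# Returns the captcha image URL as a String, or nil when no captcha is shown.
# `browser` is expected to behave like a Watir::Browser, i.e. respond to
# #element and #image and return objects that respond to #exists? and #src.
def captcha_image_url(browser)
  # Ids taken from the answer above; adjust them if the page markup differs.
  return nil unless browser.element(:id => 'recaptcha_image').exists?

  image = browser.image(:id => 'recaptcha_challenge_image')
  # Guard against the container being present without the image itself.
  image.exists? ? image.src : nil
end
```

With a real Watir session, a non-nil result could be passed to `EasyCaptchaSolver.new(image_url: url)` as in the answer, and a nil result would mean the script can go straight on to the next page.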