ruby, nokogiri, mechanize-ruby

How can I get Mechanize objects from Mechanize::Page's search method?


I'm trying to scrape a site where I can rely only on classes and element hierarchy to find the right nodes. However, Mechanize::Page#search returns Nokogiri::XML::Element objects, which I can't use to fill in and submit forms.

I'd really like to use pure CSS selectors, although matching on classes seems straightforward enough with the various _with methods. However, expressing something like :not(.class) with them is quite verbose compared to a CSS selector, and I have no idea how to match on element hierarchy with them at all.

Is there a way to convert Nokogiri elements back to Mechanize objects or, even better, to get them straight from the search method?


Solution

  • As stated in this answer, you can simply construct a new Mechanize::Form object from the Nokogiri::XML::Element retrieved via Mechanize::Page#search or Mechanize::Page#at:

    a = Mechanize.new
    page = a.get 'https://stackoverflow.com/'
    
    # Get the search form via ID as a Nokogiri::XML::Element
    form = page.at '#search'
    
    # Convert it back to a Mechanize::Form object
    form = Mechanize::Form.new form, a, page
    
    # Use it!
    form.q = 'Foobar'
    result = form.submit
    

    Note: You have to pass both the Mechanize object and the Mechanize::Page object to the constructor to be able to submit the form. Otherwise you get a Mechanize::Form object without context.


    There seems to be no central utility function for converting Nokogiri::XML::Element objects to Mechanize elements; instead, the conversions are implemented wherever they are needed. Consequently, a method that searches the document by CSS or XPath and returns Mechanize elements where applicable would need a fairly large case statement on the node type. Not exactly what I imagined.