Scraping a webpage with Mechanize and Nokogiri and storing data in XML doc


I am trying to scrape a website and store data in XML using Mechanize and Nokogiri. I didn't set up a Rails project and I am only using Ruby and IRB.

I wrote this method:

def mechanize_club
  agent = Mechanize.new
  agent.get("http://www.rechercheclub.applipub-fft.fr/rechercheclub/")
  form = agent.page.forms.first
  form.field_with(:name => 'codeLigue').options[0].select
  form.submit
  page2 = agent.get('http://www.rechercheclub.applipub-fft.fr/rechercheclub/club.do?codeClub=01670001&millesime=2015')
  body = page2.body
  html_body = Nokogiri::HTML(body)
  codeclub = html_body.search('.form').children("tr:first").children("th:first").to_i
  @codeclubs << codeclub
  filepath = '/davidgeismar/Documents/codeclubs.xml'
  builder = Nokogiri::XML::Builder.new(encoding: 'UTF-8') do |xml|
    xml.root {
      xml.codeclubs {
        @codeclubss.each do |c|
          xml.codeclub {
            xml.code_ c.code
          }
        end
      }
    }
  end
  puts builder.to_xml
end

My first problem is that I don't know how to test my code. I run ruby webscraper.rb in my console and the file seems to be processed, but it doesn't create an XML file in the specified path. Beyond that, I am fairly sure this code is wrong, since I haven't had a chance to test it.

Basically what I am trying to do is to submit a form several times:

agent = Mechanize.new
agent.get("http://www.rechercheclub.applipub-fft.fr/rechercheclub/")
form = agent.page.forms.first
form.field_with(:name => 'codeLigue').options[0].select
form.submit

I think this code is OK, but I don't want it to select only options[0]. I want it to select an option, scrape all the data I need, go back to the page, select options[1], and so on until there are no more options (an iteration, I guess).


Solution

  • it doesn't create an XML file in the specified path.

    There is nothing in your code that creates a file. You print some output, but don't do anything to open or write a file.

    Perhaps you should read the IO and File documentation and review how you are using your filepath variable?
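
As a minimal sketch of the missing step, the string that the builder produces can be passed to File.write, which creates (or overwrites) the file. The XML literal and the temp-directory path below are stand-ins chosen for illustration; in the question's code the string would come from builder.to_xml and the path from the filepath variable.

```ruby
require 'tmpdir'

# Stand-in for builder.to_xml (assumption: the real string would come from
# the Nokogiri builder in the question).
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <root>
    <codeclubs>
      <codeclub><code>01670001</code></codeclub>
    </codeclubs>
  </root>
XML

# Hypothetical output location, used here so the sketch runs anywhere;
# the question's own filepath variable would take its place.
filepath = File.join(Dir.tmpdir, 'codeclubs.xml')

# File.write opens the file, writes the string, and closes the file again.
File.write(filepath, xml)
puts "Wrote #{File.size(filepath)} bytes to #{filepath}"
```

File.write raises an error if the directory in the path does not exist, so it is worth checking that the path in filepath points at a real directory.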

    The second problem is that you don't call your method anywhere. Ruby parses the definition, but the method body never runs unless you invoke the method:

    def mechanize_club
      ...
    end
    
    mechanize_club()
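
For the iteration the question asks about, one possible approach is to count the options once and then fetch a fresh copy of the form for each index. This is only a sketch: each_option_result is a hypothetical helper name, and it assumes the agent behaves like the Mechanize agent in the question (get, forms, field_with, options, select, submit).

```ruby
# Hypothetical helper: submits the first form found at `url` once for every
# option of the select field named `field_name`, yielding the selected option
# and the page returned by the submission.
def each_option_result(agent, url, field_name)
  # Count the options once, up front.
  option_count = agent.get(url).forms.first.field_with(name: field_name).options.size

  option_count.times do |index|
    # Re-fetch the page so every submission starts from a fresh form.
    form = agent.get(url).forms.first
    option = form.field_with(name: field_name).options[index]
    option.select
    yield option, form.submit
  end
end

# Usage sketch (needs the mechanize gem and network access):
#
#   require 'mechanize'
#   url = 'http://www.rechercheclub.applipub-fft.fr/rechercheclub/'
#   each_option_result(Mechanize.new, url, 'codeLigue') do |option, page|
#     puts "#{option.value}: #{page.title}"
#   end
```

Re-fetching the page on each pass costs one extra request per option, but it means no selection is carried over from the previous submission.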