
Storing in CSV file - ruby separator


I am trying to store the results of my scraping exercise in a CSV file.

The current CSV file gives me the following output:

Name of Movie 1

Rating 1

Name of Movie 2 

Rating 2     

I would like to get the following output:

Name of Movie 1 Rating 1 

Name of Movie 2 Rating 2 

Here is my code. I guess the problem has to do with the row/column separator:

require 'open-uri'
require 'nokogiri'
require 'csv'

array = []


for i in 1..10
  url = "http://www.allocine.fr/film/meilleurs//?page=#{i}"
  html_file = open(url).read
  html_doc = Nokogiri::HTML(html_file)


  html_doc.search('.img_side_content').each do |element|
    array << element.search('.no_underline').inner_text
    element.search('.note').each do |data|
      array << data.inner_text
    end
  end
end

puts array


csv_options = { row_sep: ',', force_quotes: true, quote_char: '"' }
filepath    = 'allocine.csv'

CSV.open(filepath, 'wb', csv_options) do |csv|
  array.each { |item| csv << [item] }
end

Solution

  • I think the problem here is that you are pushing every title and rating into your array as separate, flat elements. Basically, your array ends up looking like this:

    ['Movie 1 Title', 'Movie 1 rating', 'Movie 2 Title', 'Movie 2 rating', ...]
    

    What you actually want is an array of arrays, like so:

    [
      ['Movie 1 Title', 'Movie 1 rating'],
      ['Movie 2 Title', 'Movie 2 rating'],
      ...
    ]
    

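    Once your array has that shape, each inner array is written as one CSV row, so you don't even need to specify a row separator in your CSV options. (With row_sep: ',' every row was being terminated by a comma instead of a newline.)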

    The following should do the trick:

    require 'open-uri'
    require 'nokogiri'
    require 'csv'
    
    array = []
    
    
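    10.times do |i|
      # times yields 0..9, so add 1 to request pages 1..10 as in the original loop
      url = "http://www.allocine.fr/film/meilleurs//?page=#{i + 1}"
      # URI.open is needed on Ruby 3+, where Kernel#open no longer fetches URLs
      html_file = URI.open(url).read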
      html_doc = Nokogiri::HTML(html_file)
    
    
      html_doc.search('.img_side_content').each do |element|
        title = element.search('.no_underline').inner_text.strip
        notes = element.search('.note').map { |note| note.inner_text }
        array << [title, notes].flatten
      end
    end
    
    puts array
    
    filepath    = 'allocine.csv'
    csv_options = { force_quotes: true, quote_char: '"' }
    
    CSV.open(filepath, 'w', **csv_options) do |csv|
      array.each do |item|
        csv << item
      end
    end
    

    (I also took the liberty of changing your for loop to 10.times, which is more idiomatic Ruby ;) )
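
If you want to check the row layout without hitting the site, here is a minimal sketch that uses CSV.generate to build the CSV text in memory. The rows array is made-up sample data standing in for the scraped results (the titles and ratings are placeholders, not real values from allocine), and it assumes one rating per movie just to keep the output short.

```ruby
require 'csv'

# Hypothetical sample data: one inner array per movie, [title, rating].
rows = [
  ['Movie 1 Title', '4,5'],
  ['Movie 2 Title', '4,3']
]

# Each inner array appended with << becomes one line of the CSV.
# No row_sep is given, so the default newline is used between rows.
csv_text = CSV.generate(force_quotes: true) do |csv|
  rows.each { |row| csv << row }
end

puts csv_text
# "Movie 1 Title","4,5"
# "Movie 2 Title","4,3"
```

Note that force_quotes also keeps a rating such as 4,5 in a single column even though it contains a comma.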