I am trying to store the results of my scraping exercise in a CSV file.
The current CSV file gives me the following output:
Name of Movie 1
Rating 1
Name of Movie 2
Rating 2
I would like to get the following output instead:
Name of Movie 1 Rating 1
Name of Movie 2 Rating 2
Here is my code. I guess the problem has to do with the row / column separator:
require 'open-uri'
require 'nokogiri'
require 'csv'

array = []
for i in 1..10
  url = "http://www.allocine.fr/film/meilleurs//?page=#{i}"
  html_file = open(url).read
  html_doc = Nokogiri::HTML(html_file)

  html_doc.search('.img_side_content').each do |element|
    array << element.search('.no_underline').inner_text
    element.search('.note').each do |data|
      array << data.inner_text
    end
  end
end
puts array

csv_options = { row_sep: ',', force_quotes: true, quote_char: '"' }
filepath = 'allocine.csv'
CSV.open(filepath, 'wb', csv_options) do |csv|
  array.each { |item| csv << [item] }
end
I think the problem here is that you are not pushing the elements into your array variable correctly. Basically, your array ends up looking like this:
['Movie 1 Title', 'Movie 1 rating', 'Movie 2 Title', 'Movie 2 rating', ...]
What you actually want is an array of arrays, like so:
[
  ['Movie 1 Title', 'Movie 1 rating'],
  ['Movie 2 Title', 'Movie 2 rating'],
  ...
]
And once your array is correctly set, you don't even need to specify a row separator in your CSV options.
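To illustrate that point without hitting the network, here is a minimal sketch. The rows variable holds made-up sample data (an assumption standing in for the scraped results), and CSV.generate is used instead of CSV.open only so that the result can be printed directly:

```ruby
require 'csv'

# Hypothetical sample data standing in for the scraped results.
rows = [
  ['Movie 1 Title', 'Movie 1 rating'],
  ['Movie 2 Title', 'Movie 2 rating']
]

# Each inner array becomes one CSV row. No row_sep is given,
# so the default line separator ends each row.
output = CSV.generate(force_quotes: true) do |csv|
  rows.each { |row| csv << row }
end

puts output
```

Each movie and its rating should end up on the same line, separated by a comma.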
The following should do the trick:
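require 'open-uri'
require 'nokogiri'
require 'csv'

array = []

10.times do |i|
  # times yields 0..9, so add 1 to request pages 1..10 as the original loop did
  url = "http://www.allocine.fr/film/meilleurs//?page=#{i + 1}"
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer handles URLs
  html_file = URI.open(url).read
  html_doc = Nokogiri::HTML(html_file)

  html_doc.search('.img_side_content').each do |element|
    title = element.search('.no_underline').inner_text.strip
    notes = element.search('.note').map { |note| note.inner_text }
    # One inner array per movie: [title, rating, ...]
    array << [title, notes].flatten
  end
end

puts array

filepath = 'allocine.csv'
csv_options = { force_quotes: true, quote_char: '"' }

# The double splat passes the hash as keyword options, which Ruby 3.0+ requires
CSV.open(filepath, 'w', **csv_options) do |csv|
  array.each do |item|
    csv << item
  end
end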
(I also took the liberty of changing your for loop to a times loop, which is more Ruby-like ;) )
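If you would rather keep your original scraping loop and its flat array, one alternative is to regroup the array just before writing it. This is only a sketch, and it assumes every movie contributes exactly one title and one rating; the flat variable below is made-up sample data, not real scraped output:

```ruby
require 'csv'

# Hypothetical flat array, shaped like the one built in the question.
flat = ['Movie 1 Title', 'Movie 1 rating', 'Movie 2 Title', 'Movie 2 rating']

# each_slice(2) groups consecutive elements into [title, rating] pairs.
rows = flat.each_slice(2).to_a

csv_text = CSV.generate(force_quotes: true) do |csv|
  rows.each { |row| csv << row }
end

puts csv_text
```

If a movie has no rating, or more than one, the pairs shift out of alignment, so building one row per movie inside the loop is the safer approach.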