ruby csv rubyzip

How to read specific columns of a zipped CSV file


I used the code below to read the contents of a zipped CSV file.

Zip::ZipFile.foreach(file) do |entry|
  istream = entry.get_input_stream
  data = istream.read
  #...
end

It gives me the entire content of the text (CSV) file with headers like below:

NAME AGE GENDER
NAME1 29 MALE
NAME2 30 FEMALE

But I only need the data from specific columns. For example, I want to display only the names (the NAME column). How can I do this?


Solution

  • Though your example shows ZipFile, you're really asking a CSV question. First, check the docs at http://www.ruby-doc.org/stdlib-2.0/libdoc/csv/rdoc/CSV.html

    You'll find that if you parse your data with the :headers => true option, you get a CSV::Table object that knows how to extract a column of data, as follows. (For obvious reasons, I wouldn't code it this way -- this is for example only.)

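    require 'zip'   # rubyzip 1.0+; older releases used require 'zip/zip' and Zip::ZipFile
    require 'csv'
    
    csv_table = nil
    # Each entry overwrites csv_table, so only the last entry in the archive is kept.
    Zip::File.foreach("x.csv.zip") do |entry|
      istream = entry.get_input_stream
      data = istream.read
      # :headers => true makes CSV.parse return a CSV::Table keyed by the header row
      csv_table = CSV.parse(data, :col_sep => " ", :headers => true)
    end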
    

    With the data you gave, we need `:col_sep => " "` since you're using spaces as column separators. Now we can do:

    >> csv_table["NAME"]   # extract the NAME column
    => ["NAME1", "NAME2"]
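
The column extraction can also be tried on its own, without a zip archive. The following is a minimal sketch, assuming the entry's contents have already been read into a string. The `data` literal is a stand-in built from the sample in the question, with one record per line, and the variable names are illustrative only.

```ruby
require 'csv'

# Stand-in for what istream.read would return (assumed: one record per line).
data = "NAME AGE GENDER\nNAME1 29 MALE\nNAME2 30 FEMALE\n"

# With :headers => true, CSV.parse returns a CSV::Table.
table = CSV.parse(data, :col_sep => " ", :headers => true)

# Indexing the table with a header name pulls out that whole column.
names = table["NAME"]

# Rows can also be walked one at a time; each row is a CSV::Row.
ages = table.map { |row| row["AGE"].to_i }

# Several columns per row, selected by header name.
name_gender = table.map { |row| row.values_at("NAME", "GENDER") }

puts names.inspect
puts ages.inspect
puts name_gender.inspect
```

Indexing the table is the shortest route when one column is needed. Iterating over rows is handier when each value needs converting, since every field is parsed as a string by default.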