Elixir: Trying to Write a Map to CSV, Being Written as Stream Results


I've been scratching my head on this one for a while. I am trying to write a program which outputs the frequency of every word in a given text file to a .csv file. I have succeeded in creating functions that find the frequency of each word and output the result of this as a map, but my tocsv function writes the results as Stream results for some reason and I cannot figure out why, or how to avoid this. Here is my code:

defmodule WordFrequency do

  def wordCount(readFile) do
     readFile
     |> words
     |> count
     |> tocsv
  end

  defp words(file) do
    file
    |> File.stream!
    |> Stream.map(&String.trim_trailing(&1))
    |> Stream.map(&String.split(&1,~r{[^A-Za-z0-9_]}))
    |> Enum.to_list
    |> List.flatten

  end

  defp count(words) when is_list(words) do
    Enum.reduce(words, %{}, &update_count/2)
  end

  defp update_count(word, acc) do
    Map.update acc, String.to_atom(word), 1, &(&1 + 1)
  end

  defp tocsv(map) do
    file = File.open!("test.csv", [:write, :utf8])
    map
    |> IO.inspect
    |> Enum.map(&CSV.encode(&1))
    |> Enum.each(&IO.inspect(file, &1, []))
  end

end

The results of count (it's a test file) are:

bitterness: 1, fan: 1, respiration: 1, radiator: 1, ceiling: 1, run: 1,
  duck: 1, roundess: 1, terrorism: 1, she: 1, over: 1, equipment: 2, test: 1,
  freshness: 1, feminism: 1, bucket: 1, goodness: 1, manliness: 1,
  reflection: 1, uncomfortable: 1, tourism: 1, house: 1, ableism: 1, stairs: 1,
  heroism: 1, sadness: 1, socialism: 1, fruit: 1, dogs: 1, mechanism: 1,
  symbolism: 1, predilection: 1, up: 1, sedition: 1, faithfulness: 1,
  fruition: 1, criticism: 1, conformation: 1, extradition: 1, braveness: 1,
  ionization: 1, indigestion: 1, bubble: 1, introspection: 1, liquid: 1,
  apartment: 1, deep: 1, department: 1, centralization: 1, bitter: 1, ...}

So I know that I'm not passing a stream to my tocsv function, but something happens in tocsv that converts it to a stream and doesn't convert it to a writeable format before outputting to the csv file. Anyone have any idea how I can make a workaround to this? I am using this CSV module: https://github.com/beatrichartz/csv

Thanks!


Solution

  • There is an example of producing the CSV in the README of the CSV module you use:

    file = File.open!("test.csv", [:write, :utf8])
    table_data |> CSV.encode |> Enum.each(&IO.write(file, &1))
    

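    Note that IO.write/2 writes the given data to the device as-is, whereas IO.inspect/3 writes the inspected representation of its second argument, formatted according to the given options. Also, CSV.encode/1 expects a two-dimensional list (a list of rows, each row a list of fields) rather than a single map entry, which would explain the stream output you see in the file.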
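
    The difference between the two calls can be seen without touching a file. This is a minimal sketch that uses an in-memory StringIO device; the sample row is a made-up stand-in for one encoded CSV line, not output taken from the CSV module:

    ```elixir
    # StringIO gives an in-memory IO device, so nothing is written to disk.
    {:ok, dev} = StringIO.open("")

    # Hypothetical sample row, standing in for one already-encoded CSV line.
    line = "equipment,2\r\n"

    # IO.write/2 writes the data exactly as given.
    IO.write(dev, line)

    # IO.inspect/3 writes the inspected term: quoted, with escapes spelled out,
    # followed by a newline.
    IO.inspect(dev, line, [])

    # StringIO.contents/1 returns {remaining_input, written_output}.
    {_input, output} = StringIO.contents(dev)
    IO.puts(inspect(output))
    ```

    The first write lands in the device as a plain CSV line, while the second shows up as a quoted Elixir string literal, which is not what you want in a .csv file.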

    That said, you should probably stick with IO.write/2 as in the README example, and produce a 2D list in count rather than a map:

    defp count(words) when is_list(words) do
      words
      |> Enum.reduce(%{}, &update_count/2)
      |> Enum.reduce([], fn {k, v}, acc -> [[k, v] | acc] end)
    end
    
    # count now returns a list of [word, count] rows, not a map
    defp tocsv(rows) do
      file = File.open!("test.csv", [:write, :utf8])
    
      rows
      |> IO.inspect
      |> CSV.encode
      |> Enum.each(&IO.write(file, &1))
    end
    

    In such a simple case, though, I would just use bare Elixir to produce the file (assuming count returns a map, as in your original code):

    defp tocsv(map) do
      File.open("test.csv", [:write, :utf8], fn(file) ->
        Enum.each(map, &IO.write(file, Enum.join(Tuple.to_list(&1), ",") <> "\n"))
      end)
    end
    

    Or, even simpler:

    defp tocsv(map) do
      File.write!("test.csv", 
         map
         |> Enum.map(&Enum.join(Tuple.to_list(&1), ","))
         |> Enum.join("\n"))
    end
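
    The formatting step can also be checked on its own, before any file is written. This is a minimal sketch, assuming count returns a map as in the question; the module name WordFrequencySketch and the function to_csv_string are made up for illustration, and the explicit sort is only there to make the line order predictable:

    ```elixir
    defmodule WordFrequencySketch do
      # Hypothetical helper: turns a map of word => count into CSV text,
      # one "word,count" line per entry, joined with newlines.
      def to_csv_string(counts) when is_map(counts) do
        counts
        # Sorting the {word, count} tuples gives a stable line order.
        |> Enum.sort()
        # Enum.join/2 needs a binary joiner, so "," is used rather than ?,
        |> Enum.map(&Enum.join(Tuple.to_list(&1), ","))
        |> Enum.join("\n")
      end
    end

    counts = %{equipment: 2, test: 1}
    IO.puts(WordFrequencySketch.to_csv_string(counts))
    ```

    The resulting string can then be passed to File.write!/2 as in the last version of tocsv above. This hand-rolled join does no quoting or escaping, which is fine here because the words were split on non-word characters and cannot contain commas.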