I have the following function that converts CSV files to a specific txt schema (expected by the CNTKTextFormat Reader):
open System.IO
open FSharp.Data;
open Deedle;

let convert (inFileName : string) =
    let data = Frame.ReadCsv(inFileName)
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    use outFile = new StreamWriter(outFileName, false)
    data.Rows.Observations
    |> Seq.map(fun kvp ->
        let row = kvp.Value |> Series.observations |> Seq.map(fun (k,v) -> v) |> Seq.toList
        match row with
        | label::data ->
            let body = data |> List.map string |> String.concat " "
            outFile.WriteLine(sprintf "|labels %A |features %s" label body)
            printf "%A" label
        | _ ->
            failwith "Bad data."
    )
    |> ignore
Strangely, the output file is empty after running this in the F# Interactive panel, and the printf call prints nothing at all.
If I remove the ignore to make sure that there are actual rows being processed (evidenced by returning a seq of nulls), instead of an empty file I get:
val it : seq<unit> = Error: Cannot write to a closed TextWriter.
Before, I was declaring the StreamWriter using let and disposing of it manually, but that also produced empty files or files with just a few lines (say 5 out of thousands).
What is happening here, and how do I fix the file writing?
Seq.map returns a lazy sequence, which is not evaluated until it is iterated over. You are not currently iterating over it within convert, so no rows are processed. If you return a Seq<unit> and iterate over it outside convert, outFile will already be closed, which is why you see the exception.
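To see the laziness in isolation, here is a minimal sketch that needs no CSV file or Deedle. The names seen and pending are made up for illustration; the point is only that the function passed to Seq.map does not run until something enumerates the result.

```fsharp
// Records side effects so we can see when the mapping function actually runs.
let seen = System.Collections.Generic.List<int>()

// Seq.map only describes the work; nothing is executed at this point.
let pending = [1; 2; 3] |> Seq.map (fun x -> seen.Add(x))

// ignore discards the sequence without enumerating it, so seen stays empty.
pending |> ignore
printfn "%d" seen.Count   // prints 0

// Enumerating the sequence (here via Seq.toList) finally runs the function.
pending |> Seq.toList |> ignore
printfn "%d" seen.Count   // prints 3
```

In your convert function, the enumeration either never happens (with ignore) or happens after the use binding has already disposed the writer.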
You should use Seq.iter instead:
data.Rows.Observations
|> Seq.iter (fun kvp -> ...)
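For a version you can run without Deedle, here is a rough sketch of the same pattern using only System.IO. The rows value and the writeRows name are hypothetical stand-ins for your parsed CSV rows and your convert function, and the label is assumed to be a string already (so %s is used instead of %A).

```fsharp
open System.IO

// Hypothetical stand-in for the parsed CSV rows: label first, then features.
let rows = [ ["0"; "1.5"; "2.5"]; ["1"; "3.0"; "4.0"] ]

let writeRows (outFileName : string) (rows : string list list) =
    use outFile = new StreamWriter(outFileName, false)
    rows
    |> Seq.iter (fun row ->
        match row with
        | label :: features ->
            let body = features |> String.concat " "
            outFile.WriteLine(sprintf "|labels %s |features %s" label body)
        | [] ->
            failwith "Bad data.")
    // outFile is disposed here, after Seq.iter has written every row.

// Write to a temporary file and echo the result back.
let path = Path.GetTempFileName()
writeRows path rows
File.ReadAllLines path |> Array.iter (printfn "%s")
```

Because Seq.iter is eager, all the writes happen while outFile is still open, and the use binding then flushes and closes the file when the function returns.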