
Why can't I go twice through the rows of a CSV provider?


In some languages, a lazy sequence is exhausted once it has been iterated. That is not the case in F#:

let mySeq = seq [1..5]

mySeq |> Seq.iter (fun x -> printfn "%A" <| x)
mySeq |> Seq.iter (fun x -> printfn "%A" <| x)

1
2
3
4
5
1
2
3
4
5

However, it looks like one can go only once through the rows of a CSV provider:

open FSharp.Data

[<Literal>]
let foldr = __SOURCE_DIRECTORY__ + @"\data\"

[<Literal>]
let csvPath = foldr + @"AssetInfoFS.csv"

type AssetsInfo = CsvProvider<Sample=csvPath,
                              HasHeaders=true,
                              ResolutionFolder=csvPath,
                              AssumeMissingValues=false,
                              CacheRows=false>

let assetInfo = AssetsInfo.Load(csvPath)
assetInfo.Rows |> Seq.iter (fun x -> printfn "%A" <| x) // Works fine 1st time
assetInfo.Rows |> Seq.iter (fun x -> printfn "%A" <| x) // 2nd time exception

Why does that happen?


Solution

  • According to the FSharp.Data documentation for the CSV Parser, the CSV Type Provider is built on top of the CSV Parser. The CSV Parser works in streaming mode: it reads rows lazily from the underlying input, and once that input has been consumed it most likely cannot be rewound, so enumerating the rows a second time throws an exception. The CSV Parser also has a Cache method. Try setting CacheRows=true (or leaving it out of the declaration, since its default value is true) to avoid this issue:

    type AssetsInfo = CsvProvider<Sample=csvPath,
                                  HasHeaders=true,
                                  ResolutionFolder=csvPath,
                                  AssumeMissingValues=false,
                                  CacheRows=true>
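
    The effect can be reproduced without FSharp.Data. The following is only a minimal sketch using the standard library: linesOf is a hypothetical helper, not part of the CSV Parser, and the StringReader stands in for the CSV file. It does not claim to show how the parser is implemented. It only shows that a sequence backed by one shared reader is single-pass, and that caching makes it repeatable. One difference to note: here the second pass comes back empty, whereas the CSV provider throws.

```fsharp
open System.IO

// Hypothetical helper (not part of FSharp.Data): exposes the lines of a
// TextReader as a lazy sequence. Every enumeration pulls from the same
// reader, and the reader cannot be rewound.
let linesOf (reader: TextReader) : seq<string> =
    Seq.unfold (fun () ->
        match reader.ReadLine() with
        | null -> None
        | line -> Some (line, ())) ()

// Streaming, comparable in spirit to CacheRows=false.
let streamed = linesOf (new StringReader("a,1\nb,2\nc,3"))
let firstPass = streamed |> Seq.toList    // all three lines
let secondPass = streamed |> Seq.toList   // empty: the reader is already at the end

// Cached, comparable in spirit to CacheRows=true.
// Seq.cache remembers the items produced during the first enumeration.
let cached = linesOf (new StringReader("a,1\nb,2\nc,3")) |> Seq.cache
let cachedFirst = cached |> Seq.toList    // all three lines
let cachedSecond = cached |> Seq.toList   // all three lines again, served from the cache

printfn "streamed: %d rows, then %d rows" firstPass.Length secondPass.Length
printfn "cached:   %d rows, then %d rows" cachedFirst.Length cachedSecond.Length
```

    If CacheRows=false has to stay (for example because the file is very large), another option is to materialize the rows once, e.g. assetInfo.Rows |> Seq.toArray, and iterate over that array as many times as needed. This keeps all rows in memory, which is the same trade-off that CacheRows=true makes.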