Search code examples
f#type-providers

What's the use of cache in csv type provider?


I am a bit confused about Cache and CacheRows.

It seems MyCsvType.Load(path).Take(30000).Cache() doesn't actually read the 30000 rows immediately. (unlike Seq.cache)

Then, why do we need Cache given we have already CacheRows

Additionally, if I am only interested in the first 30000 rows, should I use MyCsvType.Load(path).Take(30000) or MyCsvType.Load(path).Rows |> Seq.take 30000


Solution

  • If you look at F# Data source code, you can see that Cache, Take and other operators are just calling the corresponding Seq.xyz operations under the cover (this is in CsvRuntime.fs).

    The key difference is that when you create a type provider without specifying CacheRows=false, it will actually call Cache by default. So, the trick is to create a type provider using CacheRows=false and then you can use Seq.cache or the Cache method (and other operations) interchangeably.

    let stocks = CsvProvider<"sample.csv", CacheRows=false>.GetSample()
    stocks.Take(10).Cache()             // Using methods is now exactly
    stocks |> Seq.take 10 |> Seq.cache  // the same as using functions