I am a bit confused about Cache and CacheRows.
It seems MyCsvType.Load(path).Take(30000).Cache() doesn't actually read the 30000 rows immediately (unlike Seq.cache).
Then why do we need Cache, given that we already have CacheRows?
Additionally, if I am only interested in the first 30000 rows, should I use MyCsvType.Load(path).Take(30000) or MyCsvType.Load(path).Rows |> Seq.take 30000?
If you look at the F# Data source code, you can see that Cache, Take and the other operators just call the corresponding Seq.xyz operations under the covers (this is in CsvRuntime.fs).
The key difference is that when you create a type provider without specifying CacheRows=false, it calls Cache for you by default. So the trick is to create the type provider with CacheRows=false; then you can use Seq.cache or the Cache method (and the other operations) interchangeably.
open FSharp.Data

let stocks = CsvProvider<"sample.csv", CacheRows=false>.GetSample()
stocks.Take(10).Cache()                  // Using methods on the file is now equivalent
stocks.Rows |> Seq.take 10 |> Seq.cache  // to using Seq functions on its rows
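To see the laziness for yourself, here is a minimal sketch that uses a plain sequence as a stand-in for the CSV rows, so it runs without the type provider or a sample file. The names reads, rows and firstTen are made up for this illustration and are not part of F# Data. It relies only on the fact that Seq.take and Seq.cache pull items on demand, and that Seq.cache remembers what it has already pulled.

```fsharp
// Counter that records how many items were actually pulled from the source.
let reads = ref 0

// Stand-in for CSV rows: every item produced bumps the counter.
let rows =
    seq {
        for i in 1 .. 100 do
            reads.Value <- reads.Value + 1
            yield i
    }

// Building the pipeline does not enumerate anything yet.
let firstTen = rows |> Seq.take 10 |> Seq.cache
printfn "after building the pipeline: %d reads" reads.Value

// The first pass pulls the items from the source and stores them in the cache.
firstTen |> Seq.iter ignore
printfn "after the first pass: %d reads" reads.Value

// The second pass is served from the cache, so the counter should not move.
firstTen |> Seq.iter ignore
printfn "after the second pass: %d reads" reads.Value
```

If you drop the Seq.cache step, the second pass enumerates the source again, which is the situation CacheRows=false leaves you in unless you cache explicitly.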