Suppose I have a two-step process: first, data collection and cleaning; second, some operation on the cleaned data.
For example:
#r "nuget: Deedle"
open System
open Deedle
type Person =
    { Name: string; Birthday: DateTime }
// Treat birthdays after 2023-01-01 as invalid (missing)
let fixB b =
    if b > DateTime(2023, 01, 01) then OptionalValue.Missing else OptionalValue b
let peopleRecds =
    [ { Name = "Joe"; Birthday = DateTime(9999, 12, 31) }
      { Name = "Jim"; Birthday = DateTime(2000, 12, 31) } ]
let df = Frame.ofRecords peopleRecds
// Step 1: clean the Birthday column, then save it and load it back
let step1 = df.Clone()
step1.ReplaceColumn("Birthday", df |> Frame.mapRowValues (fun row -> fixB (row.GetAs<DateTime>"Birthday")))
step1.SaveCsv(System.IO.Path.Combine(__SOURCE_DIRECTORY__, "step1.csv"))
let step1' = Frame.ReadCsv(System.IO.Path.Combine(__SOURCE_DIRECTORY__, "step1.csv"))
step1.Print()
Name Birthday
0 -> Joe <missing>
1 -> Jim 12/31/2000 12:00:00 AM
Whether I save it (step1') or not (step1), I would like to continue with step 2 without having to handle the two cases differently.
let payout b =
    match b with
    | OptionalValue.Present c -> if c > DateTime(2000, 01, 01) then 100 else 0
    | OptionalValue.Missing -> 0
// Step 2: add a payout column computed from the (possibly missing) birthday
let step2 = step1.Clone()
step2.AddColumn("Payout", step1 |> Frame.mapRowValues (fun row -> payout (row.TryGetAs<DateTime>"Birthday")))
Error: System.InvalidCastException: Object must implement IConvertible.
The first issue is that the way you use mapRowValues introduces OptionalValue<DateTime> values into the data frame (Deedle often eliminates these automatically, but it seems not in this case). OptionalValue<'T> does not implement IConvertible, so the later call to TryGetAs<DateTime> fails. You can solve this by calculating the birthday column with Series.mapAll and ordinary F# options instead:
let fixB b =
    if b > DateTime(2023, 01, 01) then None else Some b
// Series.mapAll passes each value as an option; returning None marks it as missing
let bday =
    df.Columns.["Birthday"].As<DateTime>()
    |> Series.mapAll (fun _ v -> Option.bind fixB v)
step1.ReplaceColumn("Birthday", bday)
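For reference, "Object must implement IConvertible." is the message .NET's Convert.ChangeType raises when asked to convert a value that does not implement IConvertible. The script below is a minimal sketch that reproduces it with an OptionalValue<DateTime>; it assumes (not confirmed here) that Deedle's TryGetAs conversion goes through a similar path when it meets the wrapped value.

```fsharp
#r "nuget: Deedle"
open System
open Deedle

// A DateTime wrapped in Deedle's OptionalValue, similar to what the
// mapRowValues version of step 1 stored in the frame (assumption)
let wrapped : obj = box (OptionalValue(DateTime(2000, 12, 31)))

// A plain DateTime implements IConvertible, so this conversion succeeds
let plain = Convert.ChangeType(box (DateTime(2000, 12, 31)), typeof<DateTime>) :?> DateTime
printfn "plain: %O" plain

// OptionalValue<DateTime> does not implement IConvertible, so this throws
let failed =
    try
        Convert.ChangeType(wrapped, typeof<DateTime>) |> ignore
        false
    with :? InvalidCastException as e ->
        printfn "InvalidCastException: %s" e.Message
        true
```

This is why storing plain DateTime values with genuine missing entries, as the Series.mapAll version does, avoids the error.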
The second issue, with saving and loading the data frame, is that the CSV parser does not seem to infer automatically that Birthday is a DateTime column. You can solve this by specifying an explicit schema, here listing the column types in order (you can also disable saving of the row keys, so that the frame you load is exactly the same as the one you saved):
step1.SaveCsv(System.IO.Path.Combine(__SOURCE_DIRECTORY__, "step1.csv"), includeRowKeys=false)
let step1' = Frame.ReadCsv(System.IO.Path.Combine(__SOURCE_DIRECTORY__, "step1.csv"), schema="string,date")
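Putting both fixes together, step 2 can then be written once and applied to either frame. The script below is a sketch of that end-to-end flow; cleanBirthdays, addPayout and path are hypothetical names introduced for illustration, and the rest reuses the calls shown above.

```fsharp
#r "nuget: Deedle"
open System
open System.IO
open Deedle

type Person = { Name: string; Birthday: DateTime }

// Cleaning rule from step 1: birthdays after 2023-01-01 become missing
let fixB (b: DateTime) =
    if b > DateTime(2023, 1, 1) then None else Some b

// Hypothetical helper: step 1, storing plain DateTime values (no OptionalValue<_>)
let cleanBirthdays (df: Frame<int, string>) =
    let step1 = df.Clone()
    let bday =
        df.Columns.["Birthday"].As<DateTime>()
        |> Series.mapAll (fun _ v -> Option.bind fixB v)
    step1.ReplaceColumn("Birthday", bday)
    step1

// Payout rule from the question
let payout b =
    match b with
    | OptionalValue.Present (c: DateTime) -> if c > DateTime(2000, 1, 1) then 100 else 0
    | OptionalValue.Missing -> 0

// Hypothetical helper: step 2, written once for any frame with a DateTime Birthday column
let addPayout (frame: Frame<int, string>) =
    let step2 = frame.Clone()
    let payouts =
        frame |> Frame.mapRowValues (fun row -> payout (row.TryGetAs<DateTime>("Birthday")))
    step2.AddColumn("Payout", payouts)
    step2

let df =
    Frame.ofRecords
        [ { Name = "Joe"; Birthday = DateTime(9999, 12, 31) }
          { Name = "Jim"; Birthday = DateTime(2000, 12, 31) } ]

let step1 = cleanBirthdays df

// Round-trip through CSV without row keys and with an explicit schema
let path = Path.Combine(__SOURCE_DIRECTORY__, "step1.csv")
step1.SaveCsv(path, includeRowKeys = false)
let step1' = Frame.ReadCsv(path, schema = "string,date")

// The same step-2 code runs on the in-memory frame and on the reloaded one
let fromMemory = addPayout step1
let fromCsv = addPayout step1'
fromMemory.Print()
fromCsv.Print()
```

With the sample data, Joe should get a payout of 0 (his birthday is missing after cleaning) and Jim 100, in both frames.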