I'm facing trouble when I try to create missing values in a Frame and later perform operations with them. Here is a "working" sample:
open Deedle
open System.Text.RegularExpressions
do fsi.AddPrinter(fun (printer:Deedle.Internal.IFsiFormattable) -> "\n" + (printer.Format()))
module Frame = let mapAddCol col f frame = frame |> Frame.addCol col (Frame.mapRowValues f frame)
[ {|Desc = "A - 1.50ml"; ``Price ($)`` = 23.|}
  {|Desc = "B - 2ml"; ``Price ($)`` = 18.5|}
  {|Desc = "C"; ``Price ($)`` = 25.|} ]
|> Frame.ofRecords
(*
     Desc       Price ($)
0 -> A - 1.50ml 23
1 -> B - 2ml    18.5
2 -> C          25
*)
|> Frame.mapAddCol "Volume (ml)" (fun row ->
    match Regex.Match(row.GetAs<string>("Desc"),"[\d\.]+").Value with
    | "" -> OptionalValue.Missing
    | n -> n |> float |> OptionalValue)
(*
     Desc       Price ($) Volume (ml)
0 -> A - 1.50ml 23        1.5
1 -> B - 2ml    18.5      2
2 -> C          25        <missing>
*)
|> fun df -> df?``Price ($/ml)`` <- df?``Price ($)`` / df?``Volume (ml)``
//error message: System.InvalidCastException: Object must implement IConvertible.
What is wrong with this approach?
Deedle internally stores a flag indicating whether each value is present or missing. This is typically exposed via the OptionalValue type, but the internal representation does not actually use this type.
When you use a function such as mapRowValues to generate new data, Deedle needs to recognize which data is missing, and it does so only in a limited number of cases. When you return OptionalValue<float>, Deedle actually produces a series whose values have type OptionalValue<float> rather than float (the type system does not let it do anything else). The ? operator then tries to convert those values to float, which is where the InvalidCastException comes from.
For float values, the solution is just to return nan as your missing value:
|> Frame.mapAddCol "Volume (ml)" (fun row ->
    match Regex.Match(row.GetAs<string>("Desc"),"[\d\.]+").Value with
    | "" -> nan
    | n -> n |> float )
This will create a new series of float values, which you can then access using the ? operator.
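For reference, here is a rough end-to-end sketch of the question's script with the nan change applied. It assumes you run it in F# Interactive and that Deedle can be loaded with the `#r "nuget: Deedle"` directive (adjust the reference to however you normally load Deedle). The frame is bound to a name, df, instead of being passed through the final lambda, so the new column can be inspected afterwards; that binding is my own choice and not part of the original code.

```fsharp
#r "nuget: Deedle" // assumption: Deedle is loaded from NuGet in an .fsx script

open System.Text.RegularExpressions
open Deedle

// Helper from the question: compute a value per row and append it as a column.
module Frame =
    let mapAddCol col f frame =
        frame |> Frame.addCol col (Frame.mapRowValues f frame)

let df =
    [ {| Desc = "A - 1.50ml"; ``Price ($)`` = 23.0 |}
      {| Desc = "B - 2ml";    ``Price ($)`` = 18.5 |}
      {| Desc = "C";          ``Price ($)`` = 25.0 |} ]
    |> Frame.ofRecords
    |> Frame.mapAddCol "Volume (ml)" (fun row ->
        // Same pattern as in the question, written as a verbatim string.
        match Regex.Match(row.GetAs<string>("Desc"), @"[\d\.]+").Value with
        | "" -> nan        // no number in Desc: nan stands in for a missing value
        | n -> float n)    // otherwise parse the matched text as a float

// Both columns now hold plain floats, so the ? operator can read them.
// Row 2 has a nan volume, so its price per ml is expected to come out as missing.
df?``Price ($/ml)`` <- df?``Price ($)`` / df?``Volume (ml)``
```

Because the lambda now returns float in both branches, mapRowValues yields a series of float and the division no longer has to convert OptionalValue<float> values.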