Deedle OptionalValue.Missing can't be dropped by Series.dropMissing


This is the code example from http://bluemountaincapital.github.io/Deedle/reference/deedle-seriesmodule.html

open System
open Deedle
let s1 = series [ 1 => 1.0; 2 => Double.NaN ]
s1 |> Series.dropMissing

The missing value is dropped as expected. However, if I change this to

let s2 = series [ 1 => OptionalValue(1.0); 2 => OptionalValue.Missing ]
s2 |> Series.dropMissing 

the missing value is not dropped.

I noticed that s2 is a

Series<int,OptionalValue<float>> 

type while s1 is

Series<int,float>

Is this behaviour by design?

The reason I ask is that I have this code from the answer to "Deedle moving window stats calculation with a dynamic condition and Boundary.AtEnding":

let lastKey = ref None
let r = 
  ts |> Series.aggregateInto
      (WindowWhile(fun d1 d2 -> d1.AddMonths(1) >= d2)) (fun seg -> seg.Data.LastKey())
      (fun ds -> 
         match lastKey.Value, ds.Data.LastKey() with 
         | Some lk, clk when lk = clk -> OptionalValue.Missing
         | _, clk -> lastKey := Some clk; OptionalValue(ds.Data))
     |> Series.dropMissing

Series.aggregateInto can somehow return a series whose value type is not OptionalValue but which still contains missing values. If I want to use OptionalValue.Missing in a series I create, so that the missing values are properly ignored by Stats.mean, what is the right way to do it?

Also, when writing a Series/Frame with missing values to CSV, Deedle puts a blank in the output. However, if the Series/Frame contains OptionalValue values, Deedle writes each one out as a string instead. Is this by design?


Solution

  • The OptionalValue type in Deedle is the internal representation of optional values inside a series. So, if you have Series<Date, float>, it actually stores its data as OptionalValue<float>. We do not completely hide this from users: sometimes (as in the aggregation sample), Deedle takes an OptionalValue and uses it directly in the internal representation to make things faster.

    However, you probably never want to use Series<K, OptionalValue<T>>, because that is an odd kind of series (Deedle handles missing values automatically, so there is no need for this).

    If you want to specify missing values when creating a series, you can use:

    let s2 = Series.ofOptionalObservations [ 1 => Some(1.0); 2 => None ]
    s2 |> Series.dropMissing 
    

    The F# API generally prefers the standard F# option type, so this is what ofOptionalObservations takes. The dropMissing function works fine on a series created in this way.
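
To make the difference concrete, here is a minimal sketch that contrasts the two kinds of series and shows one possible way to repair a Series<K, OptionalValue<T>> that already exists. It assumes the Deedle package is referenced (for example with `#r "nuget: Deedle"` in F# Interactive). The names good, odd and repaired are made up for illustration, and unwrapping through Series.mapAll is only one option, not necessarily the recommended one.

```fsharp
// Assumption: the Deedle package is already referenced in this script or project.
open Deedle

// Missing values expressed as F# options: None becomes a real missing value.
let good = Series.ofOptionalObservations [ 1 => Some 1.0; 2 => None; 3 => Some 3.0 ]

// dropMissing removes key 2, and Stats.mean averages only the present values.
let dropped = good |> Series.dropMissing
let mean = Stats.mean good

// The "odd" series from the question: its values are OptionalValue<float>,
// so Deedle treats both entries as present values.
let odd = series [ 1 => OptionalValue(1.0); 2 => OptionalValue.Missing ]

// One way to repair it: unwrap the nested OptionalValue into an option,
// so that Deedle itself records key 2 as missing.
let repaired : Series<int, float> =
    odd
    |> Series.mapAll (fun _ v ->
        v |> Option.bind (fun ov -> if ov.HasValue then Some ov.Value else None))

printfn "good: %d keys, %d values" good.KeyCount good.ValueCount
printfn "repaired: %d keys, %d values" repaired.KeyCount repaired.ValueCount
```

After the repair, Series.dropMissing and Stats.mean should behave on repaired just as they do on good, because the series now has type Series<int, float> and the missing entry lives in Deedle's own representation.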