Tags: dataframe, f#, deedle

Why is Deedle casting a DataFrame boolean column into a float Series?


When I run the code below, I get a DataFrame with one bool column and two double columns. However, when I extract the bool column as a Series, the result is a Series whose key type is DateTime and whose value type is float.

It looks like Deedle "casts" the column to another type.

Why is this happening?

open System
open Deedle
let dates  = 
      [ DateTime(2013,1,1); 
        DateTime(2013,1,4); 
        DateTime(2013,1,8) ]

let values = [ 10.0; 20.0; 30.0 ]
let values2 = [ 0.0; -1.0; 1.0 ]


let first = Series(dates, values)
let second = Series(dates, values2)
let third: Series<DateTime,bool> = Series.map (fun k v -> v > 0.0) second

let df1 = Frame(["first"; "second"; "third"], [first; second; third])

let sb = df1.["third"]

df1;;
val it : Frame<DateTime,string> =
  Deedle.Frame`2[System.DateTime,System.String]
    {ColumnCount = 3;
     ColumnIndex = Deedle.Indices.Linear.LinearIndex`1[System.String];
     ColumnKeys = seq ["first"; "second"; "third"];
     ColumnTypes = seq [System.Double; System.Double; System.Boolean];
     ...

sb;;
val it : Series<DateTime,float> = ...

Solution

  • As the existing answer points out, GetColumn is the way to go. You can specify the generic parameter directly when calling GetColumn and avoid the type annotation to make the code nicer:

    let sb = df1.GetColumn<bool>("third")
    

    A Deedle frame does not statically keep track of the types of its columns, so when you want to get a column as a typed series, you need to specify the type in some way.

    We did not want to force people to write type annotations, because they tend to be long and ugly, so the primary way of getting a column is GetColumn, where you can specify the type argument as in the example above.

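    The other ways of accessing a column, such as df?third and df.["third"], are shorthands that assume the column type is float, because that happens to be a very common scenario (at least for the most common uses of Deedle, in finance). These two notations give you a simpler syntax that "often works nicely", but they always return a float series: this is why df1.["third"] in the question comes back as a Series<DateTime,float> even though the stored column is Boolean.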
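The explanation above can be checked with a small script. This is a sketch, assuming it is run as an F# script with `dotnet fsi` and that Deedle is pulled from NuGet with `#r "nuget: Deedle"` (an environment assumption, not part of the original answer); names such as `columnTypes` and `thirdViaTypeArg` are illustrative. It rebuilds the frame from the question, shows that the column types are only available at run time, and contrasts typed access through GetColumn with the float-only shorthands.

```fsharp
// Assumption: run as an F# script with `dotnet fsi`, fetching Deedle from NuGet.
#r "nuget: Deedle"

open System
open Deedle

let dates = [ DateTime(2013, 1, 1); DateTime(2013, 1, 4); DateTime(2013, 1, 8) ]
let first = Series(dates, [ 10.0; 20.0; 30.0 ])
let second = Series(dates, [ 0.0; -1.0; 1.0 ])
let third = Series.map (fun _ v -> v > 0.0) second

// Statically this is just Frame<DateTime, string>: the fact that "third"
// holds booleans is only available at run time, through ColumnTypes.
let df = Frame([ "first"; "second"; "third" ], [ first; second; third ])
let columnTypes = List.ofSeq df.ColumnTypes

// Typed access: the caller supplies the value type, either as an explicit
// type argument or through an annotation that drives type inference.
let thirdViaTypeArg = df.GetColumn<bool>("third")
let thirdViaAnnotation : Series<DateTime, bool> = df.GetColumn("third")

// Shorthand access: both forms are fixed to float values, so they return
// Series<DateTime, float> whatever type the column actually stores.
let firstViaIndexer = df.["first"]
let firstViaDynamic = df?first
let thirdAsFloat = df.["third"]   // same situation as `sb` in the question

printfn "%A" columnTypes
printfn "%A" (Series.values thirdViaTypeArg |> List.ofSeq)
```

The trade-off is visible side by side: GetColumn costs a type argument but preserves the stored type, while df.["..."] and df?... read more naturally but always hand back a float series.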