
Deedle frame indexRowsDate no longer sorting the rows?


I was trying out Deedle (downloaded from GitHub, 20150407) to test the windowInto function on a data frame. However, I noticed the following behaviour:

#I "../../bin/"
#r "Deedle.dll"

open System
open System.Data
open System.Dynamic
open System.Collections.Generic
open Deedle


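// Read the CSV, keep the first five rows, and use the "Date" column as the row key.
let df1 = Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/data/MSFT.csv", inferRows=10)
           |> Frame.take 5 |> Frame.indexRowsDate "Date"
df1.Print()
// Sort the rows explicitly by their DateTime key.
let df2 = df1 |> Frame.sortRowsByKey
df2.Print()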

                          Open  High  Low   Close Volume   Adj Close 
27/01/2012 12:00:00 AM -> 29.45 29.53 29.17 29.23 44187700 29.23     
26/01/2012 12:00:00 AM -> 29.61 29.70 29.40 29.50 49102800 29.50     
25/01/2012 12:00:00 AM -> 29.07 29.65 29.07 29.56 59231700 29.56     
24/01/2012 12:00:00 AM -> 29.47 29.57 29.18 29.34 51703300 29.34     
23/01/2012 12:00:00 AM -> 29.55 29.95 29.35 29.73 76078100 29.73     
                          Open  High  Low   Close Volume   Adj Close 
23/01/2012 12:00:00 AM -> 29.55 29.95 29.35 29.73 76078100 29.73     
24/01/2012 12:00:00 AM -> 29.47 29.57 29.18 29.34 51703300 29.34     
25/01/2012 12:00:00 AM -> 29.07 29.65 29.07 29.56 59231700 29.56     
26/01/2012 12:00:00 AM -> 29.61 29.70 29.40 29.50 49102800 29.50     
27/01/2012 12:00:00 AM -> 29.45 29.53 29.17 29.23 44187700 29.23     
val df1 : Frame<DateTime,string>
val df2 : Frame<DateTime,string>
val it : unit = ()

After indexRowsDate, the rows of the data frame are no longer sorted in ascending order. This causes index-based operations such as windowInto to fail.

To make it work, the data frame has to be sorted first.

Is this a new behaviour or a bug?

Thanks


Solution

  • This is by design:

    • The ReadCsv function reads the data in the order in which they appear in the CSV file (for Yahoo stock prices, this has the most recent price at the top)

    • The indexRowsDate function does not change the order - it just replaces the key with values from the specified column.

    As far as I know, the snippet you posted has always behaved this way (but I may be missing something). If you want an ordered frame, you need to call Frame.sortRowsByKey (as you did) or, if you're reading data from Yahoo, you can probably just use Frame.rev.
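
The two options can be sketched side by side as follows. This is a minimal sketch that reuses the references and the MSFT.csv path from the question, so the paths are assumptions about your layout. The RowIndex.IsOrdered check is just one way to confirm whether the frame's row index is ordered before calling windowing functions.

```fsharp
// Same references and data file as in the question; adjust the paths to your setup.
#I "../../bin/"
#r "Deedle.dll"

open Deedle

// Rows keep the order of the CSV file (for Yahoo exports, most recent price first).
let raw =
    Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/data/MSFT.csv", inferRows = 10)
    |> Frame.indexRowsDate "Date"

// Option 1: sort by row key. This works whatever order the file is in.
let byKey = raw |> Frame.sortRowsByKey

// Option 2: reverse the rows. This only yields ascending keys
// if the file is consistently newest-first.
let reversed = raw |> Frame.rev

// Report whether each row index is ordered.
printfn "raw ordered:    %b" raw.RowIndex.IsOrdered
printfn "sorted ordered: %b" byKey.RowIndex.IsOrdered
```

Frame.sortRowsByKey is the safer choice when the order of the file is not guaranteed. Frame.rev avoids a sort, but it relies on the input already being in strictly descending date order.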