I have created a data frame using:

```fsharp
open Deedle

// Create a data frame containing the Open, High, Low, Close and Volume columns
let ohlc =
    cl
    |> Frame.sliceCols ["Open"; "High"; "Low"; "Close"; "Volume"]
```
with the resulting output:

```
                          Open   High   Low    Close  Volume
12/28/2014 8:00:00 PM  -> 62.13  62.67  62.13  62.27  3206
12/28/2014 9:00:00 PM  -> 62.27  62.42  62.14  62.39  1620
12/28/2014 10:00:00 PM -> 62.4   62.41  62.16  62.21  1275
12/28/2014 11:00:00 PM -> 62.21  62.32  61.96  62.19  2791
12/29/2014 12:00:00 AM -> 62.17  62.25  62.08  62.23  1233
12/29/2014 1:00:00 AM  -> 62.23  62.41  62.21  62.31  1186
12/29/2014 2:00:00 AM  -> 62.32  62.32  62.07  62.21  1446
12/29/2014 3:00:00 AM  -> 62.22  62.35  62.17  62.28  1335
```
I now want to generate a higher time frame (daily) from the hourly sample above. I start off with:

```fsharp
ohlc
|> Frame.rows
|> Series.resampleEquiv (fun d -> d.Date)
```

which returns a `Series<DateTime,Series<DateTime,ObjectSeries<string>>>`.
I want to create a new data frame with Date as the row key and the columns Open, High, Low, Close and Volume, computed from each day's chunk as follows:

- Open is the Open of the first row in the chunk.
- High is the maximum High in the chunk.
- Low is the minimum Low in the chunk.
- Close is the Close of the last row in the chunk.
- Volume is the sum of Volume in the chunk.
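For reference, these rules can be sketched in plain F# without Deedle. This is an illustrative sketch, not Deedle code: the `Bar` record, the `bar` helper and the `toDaily` function are hypothetical names, and the sample data is the eight hourly rows shown above. Grouping by `Time.Date` plays the role of `resampleEquiv (fun d -> d.Date)`, and the earliest Open and latest Close are picked by timestamp, so the result does not depend on the input order.

```fsharp
open System

// Hypothetical record for one OHLCV bar (plain F#, not a Deedle type).
type Bar =
    { Time: DateTime
      Open: float
      High: float
      Low: float
      Close: float
      Volume: float }

// Aggregate bars into one bar per calendar date:
// first Open, max High, min Low, last Close, summed Volume.
let toDaily (bars: Bar list) : Bar list =
    bars
    |> List.groupBy (fun b -> b.Time.Date)          // one chunk per date
    |> List.map (fun (day, chunk) ->
        { Time = day
          Open = (chunk |> List.minBy (fun b -> b.Time)).Open    // earliest bar
          High = chunk |> List.map (fun b -> b.High) |> List.max
          Low = chunk |> List.map (fun b -> b.Low) |> List.min
          Close = (chunk |> List.maxBy (fun b -> b.Time)).Close  // latest bar
          Volume = chunk |> List.sumBy (fun b -> b.Volume) })
    |> List.sortBy (fun b -> b.Time)                // days in chronological order

// Hypothetical helper to build a bar from positional values.
let bar t o hi lo c v =
    { Time = t; Open = o; High = hi; Low = lo; Close = c; Volume = v }

// The hourly rows from the frame above.
let hourly =
    [ bar (DateTime(2014, 12, 28, 20, 0, 0)) 62.13 62.67 62.13 62.27 3206.0
      bar (DateTime(2014, 12, 28, 21, 0, 0)) 62.27 62.42 62.14 62.39 1620.0
      bar (DateTime(2014, 12, 28, 22, 0, 0)) 62.40 62.41 62.16 62.21 1275.0
      bar (DateTime(2014, 12, 28, 23, 0, 0)) 62.21 62.32 61.96 62.19 2791.0
      bar (DateTime(2014, 12, 29, 0, 0, 0)) 62.17 62.25 62.08 62.23 1233.0
      bar (DateTime(2014, 12, 29, 1, 0, 0)) 62.23 62.41 62.21 62.31 1186.0
      bar (DateTime(2014, 12, 29, 2, 0, 0)) 62.32 62.32 62.07 62.21 1446.0
      bar (DateTime(2014, 12, 29, 3, 0, 0)) 62.22 62.35 62.17 62.28 1335.0 ]

let daily = toDaily hourly
```

On this sample the sketch gives two daily bars: 12/28 (Open 62.13, High 62.67, Low 61.96, Close 62.19, Volume 8892) and 12/29 (Open 62.17, High 62.41, Low 62.07, Close 62.28, Volume 5200).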
So something like:

```fsharp
ohlc
|> Frame.rows
|> Series.resampleEquiv (fun d -> d.Date)
|> ??
|> ??
```
Instead of trying to do this at the frame level using rows, would I be better off working with the frame's columns?

UPDATE: Here is the finished code:
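```fsharp
ohlc
|> Frame.rows                                  // series of rows (ObjectSeries<string>)
|> Series.resampleEquiv (fun d -> d.Date)      // chunk the rows by calendar date
|> Series.mapValues (fun s ->
    // Turn the chunk of rows back into a frame so columns can be accessed with ?
    let temp = Frame.ofRows s
    // Build one daily row from the hourly rows of this chunk
    series [ "Open" => Series.firstValue temp?Open
             "High" => defaultArg (Stats.max temp?High) nan
             "Low" => defaultArg (Stats.min temp?Low) nan
             "Close" => Series.lastValue temp?Close
             "Volume" => defaultArg (Some(Stats.sum temp?Volume)) nan ])
|> Frame.ofRows                                // daily rows -> new data frame
```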
I was not able to use:

```fsharp
"Volume" => defaultArg (Stats.sum temp?Volume) nan
```

as this gave me the error message "This expression was expected to have type float option but here has type float". I had to wrap the result in `Some()`. I'm not sure why `Stats.sum` requires this when `Stats.max` and `Stats.min` do not.
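The error message already points to the cause. `Stats.sum temp?Volume` has type `float`, but `defaultArg` expects a `float option`, so `Stats.max` and `Stats.min` must be the ones returning options. A plausible reason is that an empty series has no maximum or minimum, while a sum over no values can simply be 0. In any case, the Volume entry can be written directly as `"Volume" => Stats.sum temp?Volume`. The plain-F# sketch below mirrors the two return shapes. `tryMax` and `total` are hypothetical stand-ins, not Deedle functions.

```fsharp
// Hypothetical stand-ins (plain F#, not Deedle) with the same return-type
// shapes as Stats.max and Stats.sum in the question.

// A maximum may not exist (empty input), so the result is an option.
let tryMax (xs: float list) : float option =
    if List.isEmpty xs then None else Some (List.max xs)

// A sum always exists: an empty list sums to 0.0, so a plain float is returned.
let total (xs: float list) : float = List.sum xs

// defaultArg : 'T option -> 'T -> 'T, so it accepts only the option result.
let high = defaultArg (tryMax [ 62.67; 62.42; 62.41; 62.32 ]) nan

// The float result is used as-is. Wrapping it in Some only to unwrap it
// again with defaultArg would change nothing.
let volume = total [ 3206.0; 1620.0; 1275.0; 2791.0 ]

// With no values there is no maximum, so the fallback nan is used.
let emptyHigh = defaultArg (tryMax []) nan
```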
After calling `resampleEquiv`, you end up with a series (representing the chunks with the same date) of series (representing values with different times but the same date) of object series (representing the different columns of the original frame).
You can iterate over the top-level series and turn each of the series of object series (each chunk) back into a frame. Then you can do the aggregations over the frame and return a new row:
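```fsharp
ohlc
|> Frame.rows
|> Series.resampleEquiv (fun d -> d.Date)      // one chunk per calendar date
|> Series.mapValues (fun s ->
    // Turn the chunk (a series of rows) back into a frame
    let temp = Frame.ofRows s
    // Aggregate the columns and return the result as a new row
    series [ "Open" => Series.firstValue temp?Open
             "High" => defaultArg (Stats.max temp?High) nan ])
|> Frame.ofRows
```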
I did it just for Open and High, but you can see the idea :-). Calling `Frame.ofRows` on each of the chunks should also be reasonably fast, because Deedle knows that all the rows in a chunk share the same index (the column keys of the original frame). Alternatively, you could iterate over the individual rows, but that would make the code longer.
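The row-by-row alternative can be sketched as a single fold over one chunk. This is plain F#, not Deedle code. It assumes each row of the chunk has already been extracted into a hypothetical `Row` tuple of (time, open, high, low, close, volume) and that the rows are in time order. `aggregateChunk` is an illustrative name. Instead of building a frame per chunk, it makes one pass over the rows and keeps a running accumulator, at the cost of spelling out every column by hand.

```fsharp
open System

// Hypothetical row shape: (time, open, high, low, close, volume).
type Row = DateTime * float * float * float * float * float

// Walk the rows of one chunk once, keeping the first Open, the running max
// High, the running min Low, the latest Close and the running Volume total.
// Assumes the rows are sorted by time. Returns None for an empty chunk.
let aggregateChunk (rows: Row list) : (float * float * float * float * float) option =
    match rows with
    | [] -> None
    | (_, o, h, l, c, v) :: rest ->
        let step (hiAcc, loAcc, _, volAcc) ((_, _, h, l, c, v): Row) =
            (max hiAcc h, min loAcc l, c, volAcc + v)
        let (hi, lo, cl, vol) = List.fold step (h, l, c, v) rest
        Some (o, hi, lo, cl, vol)

// The 12/28/2014 rows from the sample frame.
let day1 : Row list =
    [ (DateTime(2014, 12, 28, 20, 0, 0), 62.13, 62.67, 62.13, 62.27, 3206.0)
      (DateTime(2014, 12, 28, 21, 0, 0), 62.27, 62.42, 62.14, 62.39, 1620.0)
      (DateTime(2014, 12, 28, 22, 0, 0), 62.40, 62.41, 62.16, 62.21, 1275.0)
      (DateTime(2014, 12, 28, 23, 0, 0), 62.21, 62.32, 61.96, 62.19, 2791.0) ]

let day1Summary = aggregateChunk day1
```

For these four rows the fold yields `Some (62.13, 62.67, 61.96, 62.19, 8892.0)`, which matches the per-frame aggregation.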