Can someone please help in translating the following F# pseudo-code into working code using Deedle?
open Deedle
..
//df = Frame<DateTime, string>(...), int values, keys ordered
let mutable leftKey = leftmostKey
seq {
for row in df.Rows do
let columnKeyMax = row[leftKey..] |> findColumnKeyContainingMaxValue
if columnKeyMax <> leftKey then
leftKey <- columnKeyMax
yield (row.Key, columnKeyMax)
}
Given the following dataframe, I am looking to get the coordinates of the marked cells:
["1/11/2010 22:00", "GCA2010Dec"; "29/11/2010 23:00", "GCA2011Feb";...]
let mutable currentColumnKey, currentMax = find max in first row //"GCA2010Dec, 113517"
for each row after first
newColumnKey, newMax = try find the first column on the right of currentColumnKey with value > row[currentColumnKey]
if found then yield (row.Key, newColumnKey)
currentColumnKey <- newColumnKey
currentMax <- row[newColumnKey]
columns keys are future contracts row keys are trading sessions, content is daily volume. The algorithm aims to find roll pairs (date, contract). At the beginning there is no current contract so the maximum volume in the first row is taken => first roll is ("1/11/2010 22:00", "GCA2010Dec"). Then rows are parsed until a contract is found right of the current and with greater volume. 29/11/2010 is the first date where on the right of current contract (GCA2010Dec) there is another contract (GCA2011Feb) with a greater volume (110942 > 27451). This generates a new roll pair ("29/11/2010 23:00", "GCA2011Feb"). So on until all rows are parsed
I think there is not much clever you can do here with Deedle. It ends up being an iteration over the data in both dimensions (and it may be easier to do this using a plain 2D array). However, here is a Deedle-based solution.
First, I loaded the data, indexed it by keys and parsed into int
so that Deedle handles <missing>
correctly for me:
let df =
Frame.ReadCsv(@"C:\Users\tomas\Downloads\df.csv")
|> Frame.indexRowsDate "Key"
|> Frame.mapValues (fun (o:string) ->
if String.IsNullOrEmpty o then Nullable() else Nullable(Int32.Parse o ))
The key function is then a recursive one (to implement iteration over rows). One clever thing you can do is to notice that the column keys are also ordered and so you can easily get a sub-frame with only columns on the right using subdf.Columns.[ck ..]
:
let rec loop ck (rowKeys:DateTime list) (subdf:Frame<_, _>)=
match rowKeys with
| [] -> ()
| rk::rowKeys ->
// Get all the non-missing values in the current row
let obs = subdf.Rows.[rk].As<int>().Observations
let (KeyValue(_, current)) = obs |> Seq.head
printfn "Current date '%A', column '%s' has: %A" rk ck current
// See if there is a greater one in later column
let found = obs |> Seq.tail |> Seq.tryFind (fun (KeyValue(_, v)) -> v > current)
// If no, stay on the current column. If yes, take a sub-frame
// and continue scanning that recursively
match found with
| None -> loop ck rowKeys subdf
| Some(KeyValue(ck, v)) ->
printfn "---> Found new CK: %s with value %A greater than %A" ck v current
loop ck rowKeys subdf.Columns.[ck ..]
To start looking at the range from your screenshot:
let df2010 = df.Rows.[DateTime(2010,11,1) .. DateTime(2011,12,3)]
let rowKeys = df2010.RowKeys |> List.ofSeq
let ck = "GCA201012"
loop ck rowKeys df2010.Columns.[ck ..]