I am trying to get the difference between the first and last values within groups in a deedle data frame. I am stuck at trying to get the max and min values for each point. I have a dataframe sorted by "TimeStamp", grouped by "Test" and with a column called "Value" of type float. For example with the following data:
Test1,2018/01/01 => 3
Test1,2018/01/02 => 5
Test1,2018/01/03 => 15
Test1,2018/01/04 => 25
Test1,2018/01/05 => 35
Test2,2018/01/01 => 5
Test2,2018/01/02 => 15
Test2,2018/01/03 => 15
Test2,2018/01/04 => 25
Test2,2018/01/05 => 30
I expect a result of
Test1 => 32 (35-3)
Test2 => 25 (30-5)
I assume I can use something like
input |> applyLevel fst Stats.maxBy |> Frame.ofRows
input |> applyLevel fst Stats.minBy |> Frame.ofRows
but I can not seem to get it to work no matter the combinations.
Thank you in advance
To test this, I saved your sample data in a CSV file in the following format:
Category, Timestamp, Value
Test1, 2018/01/01, 3
Test1, 2018/01/02, 5
Test1, 2018/01/03, 15
Test1, 2018/01/04, 25
Test1, 2018/01/05, 35
Test2, 2018/01/01, 5
Test2, 2018/01/02, 15
Test2, 2018/01/03, 15
Test2, 2018/01/04, 25
Test2, 2018/01/05, 30
Now, I think the issue with using Stats.max
and Stats.min
is that they return option which is None
when the series is empty - this sadly makes it impossible to pass them directly to applyLevel
(I admit this is a bit inconvenient design)
However, you can pass a custom function to applyLevel
, which also allows you to do both min
and max
at the same time:
Frame.ReadCsv("sample.csv")
|> Frame.groupRowsByString "Category"
|> Frame.applyLevel fst (fun s ->
match Stats.min s, Stats.max s with
| Some lo, Some hi -> hi - lo
| _ -> nan)
This reads the CSV file (which will introduce a new ordinal key), then it groups the data by category (which gets you a data frame with row key string * int
) and then we aggregate all rows with the same string
key using applyLevel fst
. The custom function gets minimum and maximum of the value series and subtracts it (the case with nan
would only happen if you had some missing values in s
).