Frame.aggregateRowsBy
is used to perform the aggregation.
In this case, the third argument, aggFunc
, will be an aggregate function such as Stats.max
.
However, most of the functions defined in the Stats
module seem to support only float
.
How can I aggregate the maximum and minimum values as DateTime
or decimal
?
The following is sample code that results in a runtime error.
Frame([ "id"; "date"; "quantity"], [Series.ofValues [1; 1; 2]; Series.ofValues [ DateTime(2023, 4, 13); DateTime(2023, 4, 10); DateTime(2023, 4, 16) ]; Series.ofValues [ 2.7M; 1.2M; 1.5M; ]])
|> Frame.aggregateRowsBy ["id"] ["date"; "quantity"] Stats.max
Error: System.InvalidCastException: Invalid cast from 'DateTime' to 'Double'.
There are two things to accomplish:
float
values (e.g. DateTime
, IComparable
)Frame
should respect the value type of the pre-aggregation. (The result of the aggregation must not be a float
.)One limitation of the aggregateRowsBy
function is that you can only use this to aggregate columns where the values can be converted to one shared type. You can always make that obj
, but then it will be very hard to do anything useful with the values. In your case, you can actually use IComparable
, which will let you pass the values of the series to Seq.max
:
df
|> Frame.aggregateRowsBy ["id"] ["date"; "quantity"]
(fun (ds:Series<_, IComparable>) -> ds.Values |> Seq.max)
In more complex cases, this probably won't work. Then it probably will be easier to run aggregateRowsBy
multiple times (on different types of columns) - something like:
let dq = df |> Frame.aggregateRowsBy ["id"] ["quantity"] Stats.max
let dd = df |> Frame.aggregateRowsBy ["id"] ["date"] (fun (ds:Series<_, DateTime>) -> Seq.max ds.Values)
dq.AddColumn("date", dd.Columns.["date"])
dq.Print()
For reference, the obj
-based version would be:
df |> Frame.aggregateRowsBy ["id"] ["date"; "quantity"] (fun (ds:Series<_, obj>) ->
let asFloat = ObjectSeries(ds).TryAs<float>()
if asFloat.HasValue then box (Stats.max asFloat.Value) else
let asDt = ObjectSeries(ds).TryAs<DateTime>()
if asDt.HasValue then box (Seq.max (asDt.Value.Values))
else failwith "Wrong column type!")