c# pandas deedle

Getting Max across 3 columns in C# Deedle Dataframe


I'm attempting to convert some pandas code over to Deedle and C#.

First, the data frame is a Frame<DateTime, string>, since its rows are indexed by date.

Frame.FromRecords(fetchOhlcVsResults).IndexRows<DateTime>("datetime").SortRowsByKey()

Next, I successfully add several other computed columns to the frame. The part I'm stuck on translating is this snippet from pandas:

tr = df[['high-low', 'high-pc', 'low-pc']].max(axis=1)

This just takes the maximum of the three columns within each row. Here's what I've tried:

var tr = df.Columns[new[] { "high_low", "high_pc", "low_pc" }].IndexRows<DateTime>("datetime").SortRowsByKey();

This gives me a new data frame, but I'm not sure where to go from here.

Ultimately I'd like to add the resulting series as a new column to the original dataframe.

df.AddColumn("tr", tr);

Solution

  • If I understand the problem, the question is how to calculate the maximum across (some of) the columns within each row, rather than down the rows of each column. That is what axis=1 specifies in pandas.

    In this case, I would use the Select operation on the rows of the data frame (df.Rows) to apply a function to each row. Inside that function, you can pick the columns you need and calculate the maximum over those.

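    using System;
    using Deedle;

    // Two sample records; Date becomes the row key
    var data = new[] {
      new { Date=new DateTime(2020,1,1), Hi1=10, Hi2=15 },
      new { Date=new DateTime(2020,1,2), Hi1=12, Hi2=11 } };
    var df = Frame.FromRecords(data).IndexRows<DateTime>("Date");
    // For each row, keep only Hi1 and Hi2 and take their maximum
    var max = df.Rows.Select(r => r.Value[new[] { "Hi1", "Hi2" }].Max());
    // Attach the resulting series as a new column named "Hi"
    df.AddColumn("Hi", max);
    df.Print();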
    

    This calculates the "Hi" column in the following data frame:

                  Hi1 Hi2 Hi
    01/01/2020 -> 10  15  15
    02/01/2020 -> 12  11  12
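
    Applied to the three columns from the question, the same row-wise idea might look like the sketch below. The sample records are hypothetical stand-ins for fetchOhlcVsResults, and this variant reads each value with GetAs<double> and combines them with Math.Max instead of calling Max on a sub-series. It assumes all three columns are numeric and have no missing values in any row.

    ```csharp
    using System;
    using Deedle;

    class Program
    {
        static void Main()
        {
            // Hypothetical sample rows standing in for fetchOhlcVsResults.
            var data = new[] {
                new { datetime = new DateTime(2020, 1, 2), high_low = 1.5, high_pc = 2.0, low_pc = 0.5 },
                new { datetime = new DateTime(2020, 1, 1), high_low = 3.0, high_pc = 1.0, low_pc = 2.5 } };

            // Same indexing and sorting steps as in the question.
            var df = Frame.FromRecords(data).IndexRows<DateTime>("datetime").SortRowsByKey();

            // For each row, read the three columns as doubles and keep the largest.
            var tr = df.Rows.Select(r => Math.Max(
                r.Value.GetAs<double>("high_low"),
                Math.Max(r.Value.GetAs<double>("high_pc"), r.Value.GetAs<double>("low_pc"))));

            // Add the row-wise maximum back to the original frame as "tr".
            df.AddColumn("tr", tr);
            df.Print();
        }
    }
    ```

    Note that this works on the original frame directly, so there is no need to build a separate three-column frame and re-index it first.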