I am trying to combine two deedle dataframes in a way similar to merge
in pandas.
I will have one bigger frame where there is a year
and quarter
column with other data beside it, and then a smaller frame with a year
, quarter
, and ad partner
column. I want to add the ad partner to every instance of the same year and quarter in the larger table.
There is some sample data at the link below: https://datascienceparichay.com/article/pandas-merge-dataframes-on-multiple-columns/
Any help would be greatly appreciated!
Thanks!
EDIT
I now have this code, although it throws a keyNotFound
exception if there is a key in df1
that is not in df2
, and does not return a table.
var df2ByYQ = df2.IndexRowsUsing(row => (row.GetAs<string>("Year".Trim()), row.GetAs<string>("Quarter".Trim())));
var bCol = df1.Rows.SelectOptional(kvp => df2ByYQ.Rows[(kvp.Value.Value.GetAs<string>("Year".Trim()), kvp.Value.Value.GetAs<string>("Quarter".Trim()))].TryGetAs<int>("b"));`
You can do this by reindexing the two data frames so that they both have the same row index formed by a tuple consisting of the year and quarter. The easiest way to do this is to use the Frame.indexRowsUsing
function. For example, given two frames with a
column in one and b
in the other:
let df1 =
Frame.ofRecords [
{|year=1999; quarter="q1"; a=1|}
{|year=1999; quarter="q2"; a=2|} ]
let df2 =
Frame.ofRecords [
{|year=1999; quarter="q1"; b=3|}
{|year=1999; quarter="q2"; b=4|} ]
let df1ByYQ =
df1 |> Frame.indexRowsUsing (fun row ->
row.GetAs<int>("year"), row.GetAs<string>("quarter"))
let df2ByYQ =
df2 |> Frame.indexRowsUsing (fun row ->
row.GetAs<int>("year"), row.GetAs<string>("quarter"))
Now you can use join
to join the two frames - you just have to drop the columns that are shared in both of the frames (the operation does not allow overlapping columns):
let res =
df1ByYQ
|> Frame.dropCol "year"
|> Frame.dropCol "quarter"
|> Frame.join JoinKind.Inner df2ByYQ
EDIT: In the scenario discussed in the comments, where df1
has rows with multiple records per year/quarter and we want to add information from a unique row in another data frame, I would reindex the other data frame and then use the map
operation to find the corresponding matching row in the other data frame (this will be quite fast, because lookup by index is done via a hashtable):
let df1 =
Frame.ofRecords [
{|year=1999; quarter="q1"; month="Jan"; a=1|}
{|year=1999; quarter="q1"; month="Feb"; a=2|}
{|year=1999; quarter="q2"; month="Jan"; a=3|} ]
let df2 =
Frame.ofRecords [
{|year=1999; quarter="q1"; b=3|}
{|year=1999; quarter="q2"; b=4|} ]
let df2ByYQ =
df2 |> Frame.indexRowsUsing (fun row ->
row.GetAs<int>("year"), row.GetAs<string>("quarter"))
let bColumn = df1 |> Frame.mapRows (fun _ row ->
df2ByYQ.Rows.[(row.GetAs "year", row.GetAs "quarter")].GetAs<int>("b") )
df1.AddColumn("b", bColumn)