Search code examples
pythonpython-3.xpandasdataframeabsolute-value

Consolidating panda dataframes: Choose the smaller absolute value


I have two panda DataFrames:

Dataframe A:

date        ticker  return
2017-01-03  CRM     0.018040121229614625
2017-01-03  MSFT    -0.0033444816053511683
2017-01-04  CRM     0.024198086662915008
2017-01-04  MSFT    -0.0028809218950064386
2017-01-05  CRM     -0.0002746875429199269
2017-01-05  MSFT    0.0017687731146487362

Dataframe B:

date        ticker  return
2017-01-03  CRM     0.018040120991250852
2017-01-03  MSFT    -0.003344466975803595
2017-01-04  CRM     0.024198103213211475
2017-01-04  MSFT    -0.0028809268004892363
2017-01-05  CRM     -0.00027464144673694513
2017-01-05  MSFT    0.0017687829680113065

Now I need a third 'consolidated' dataframe:

  • Same column names
  • For each row I have to choose the 'return' data from DataframeA or DataframeB with the smaller absolute value

Any suggestions?


Solution

  • Try concat+groupby on return and return the min with key=abs :

    (pd.concat((A,B),ignore_index=True)
       .groupby(['date','ticker'])['return'].min(key=abs).reset_index())
    

             date ticker    return
    0  2017-01-03    CRM  0.018040
    1  2017-01-03   MSFT -0.003344
    2  2017-01-04    CRM  0.024198
    3  2017-01-04   MSFT -0.002881
    4  2017-01-05    CRM -0.000275
    5  2017-01-05   MSFT  0.001769