Search code examples
pythonpython-3.xpandaspython-2.7finance

Matching data frame columns based on multiple constraints in Pandas


I am working on a project with financial data. In this case I have two data frames, one containing fundamental values and the other with prices. Because I had to use two different databases the dfs differ in size, mainly due to mismatches in companies ('tic') and/or dates ('year').

Here is an example for fund_df:

        tic       assets      year
0      AAPL       123.99      1999
1      AAPL       143.20      2000
..     ...          ...       ...
200    GMCF         9.56      2013
201    GMCF        11.21      2014
..     ...          ...       ...
1543   TSLA       201.23     2015
1544   TSLA       233.49     2016

Whereas the prices_df looks like this:

        tic       prices      year
0      MPRD       56.789      2000
1      MPRD       48.222      2001
..     ...          ...       ...
200    GM         87.991      2012
201    GM        102.334      2013
..     ...          ...       ...
1543   ZZ         34.567      2017
1544   ZZ         29.887      2018

My goal is to unify these two dfs so that I can work with both fundamental and price values. However, because I am afraid that the datasets my differ both under a tic and year point of view, I wish to unify these values based on a match that needs to be at both levels, so that the fundamental values of TSLA in 2015 match the price of TSLA in 2015.

I have tried the following piece of code:

merged_df = merge(fund_df, prices_df, by.fund_df=['tic', 'year'], by.prices_df['tic', 'year'], all = TRUE)

However I keep on receiving this error message:

SyntaxError: keyword can't be an expression

Can anyone help me find a solution to merge the two dfs based on both a tic and year constraint?

Thank you in advance for your help, I appreciate it!


Solution

  • As per the documentation, the syntax for pandas.merge is:

    merged_df = merge(
        fund_df,                     # left dataframe
        prices_df,                   # right dataframe
        left_on=['tic', 'year'],     # columns to join on in left dataframe
        right_on=['TICKER', 'year'], # columns to join on in right dataframe
        how='outer')                 # type of join (e.g. inner, outer, left, etc.)