I am working on a project with financial data. In this case I have two data frames, one containing fundamental values and the other with prices. Because I had to use two different databases the dfs differ in size, mainly due to mismatches in companies ('tic') and/or dates ('year').
Here is an example for fund_df:
tic assets year
0 AAPL 123.99 1999
1 AAPL 143.20 2000
.. ... ... ...
200 GMCF 9.56 2013
201 GMCF 11.21 2014
.. ... ... ...
1543 TSLA 201.23 2015
1544 TSLA 233.49 2016
Whereas the prices_df looks like this:
tic prices year
0 MPRD 56.789 2000
1 MPRD 48.222 2001
.. ... ... ...
200 GM 87.991 2012
201 GM 102.334 2013
.. ... ... ...
1543 ZZ 34.567 2017
1544 ZZ 29.887 2018
My goal is to unify these two dfs so that I can work with both fundamental and price values. However, because I am afraid that the datasets my differ both under a tic and year point of view, I wish to unify these values based on a match that needs to be at both levels, so that the fundamental values of TSLA in 2015 match the price of TSLA in 2015.
I have tried the following piece of code:
merged_df = merge(fund_df, prices_df, by.fund_df=['tic', 'year'], by.prices_df['tic', 'year'], all = TRUE)
However I keep on receiving this error message:
SyntaxError: keyword can't be an expression
Can anyone help me find a solution to merge the two dfs based on both a tic and year constraint?
Thank you in advance for your help, I appreciate it!
As per the documentation, the syntax for pandas.merge
is:
merged_df = merge(
fund_df, # left dataframe
prices_df, # right dataframe
left_on=['tic', 'year'], # columns to join on in left dataframe
right_on=['TICKER', 'year'], # columns to join on in right dataframe
how='outer') # type of join (e.g. inner, outer, left, etc.)