I have three time series dataframes
df_list=[px, SC, SMA]
Each dataframe has identical columns and identical dates. I also have a list of securities (condensed for the purpose of the example):
securities3=['SPX Index','BIL US Equity']
I'm trying to do something like the below, which does not have the correct syntax:
for df in df_list:
for col in df.columns:
if col==securities3:
***create new dataframe here***
In words, I want to iterate over each dataframe in df_list, within each column of df, when the column matches the components of securities3, I want a new dataframe to be formed with those three columns (one column from each dataframe, matching securities3 list.
To provide more detail, please see below for sample data:
import pandas as pd
px_data = {'Date': ['8/11/18', '8/12/18', '8/13/18', '8/14/18'],
'SPX Index': [58.63, 21.25, 19.17, 18.8],
'BIL US Equity': [35,105,27,98]}
SC_data = {'Date': ['8/11/18', '8/12/18', '8/13/18', '8/14/18'],
'SPX Index': [20.50, 6, 82, 74.6],
'BIL US Equity': [74,62,8,99]}
SMA_data = {'Date': ['8/11/18', '8/12/18', '8/13/18', '8/14/18'],
'SPX Index': [2, 95.3, 39, 68.27],
'BIL US Equity': [58,37,74,11]}
px = pd.DataFrame(px_data)
SC = pd.DataFrame(SC_data)
SMA = pd.DataFrame(SMA_data)
The target output is:
SPX_data = {'Date': ['8/11/18', '8/12/18', '8/13/18', '8/14/18'],
'SPX Index': [58.63, 21.25, 19.17, 18.8],
'SPX Index': [20.50, 6, 82, 74.6],
'SPX Index': [2, 95.3, 39, 68.27]}
SPX = pd.DataFrame(SPX_data)
I'm trying to create the SPX dataframe (for example) in a loop, because the list (securities3) can change. Also, as a point of reference, because this is part of a larger process, the columns in the dataframes will always be identical to the strings (and positions) within the list, as they are in this example.
Thanks for the help.
Assuming all dataframes have the same dates in the same order, you can concatenate the dates + the columns with values on axis=1
:
date = df_list[0]["Date"]
out = [pd.concat([date] + [df[sec] for df in df_list], axis=1) for sec in securities3]
for df in out:
print(df)
Date SPX Index SPX Index SPX Index
0 8/11/18 58.63 20.5 2.00
1 8/12/18 21.25 6.0 95.30
2 8/13/18 19.17 82.0 39.00
3 8/14/18 18.80 74.6 68.27
Date BIL US Equity BIL US Equity BIL US Equity
0 8/11/18 35 74 58
1 8/12/18 105 62 37
2 8/13/18 27 8 74
3 8/14/18 98 99 11
Update:
If 'Date'
is the index, remove the date from the concat
call.
If you want each dataframe in a variable, remove the outer loop in the list comprehension.
for sec in securities3:
df = pd.concat([df[sec] for df in df_list], axis=1)