I have several csv files downloaded from Yahoo finance (each has same number of columns with the same name and same number of rows) in my folder, and I tried to read them into python in one go. I tried with 12 files.
Each file has columns Date
, High
, Low
, Close
, Adj Close
, Volume
.
I searched online, and my code is as follows:
csvs = [x for x in os.listdir('.') if x.endswith('.csv')]
fns = [os.path.splitext(os.path.basename(x))[0] for x in csvs]
d = {}
for i in range(len(fns)):
d[fns[i]] = pd.read_csv(csvs[i])
print(d)
Then I get d
which consists of 12 dataframes (each has columns Date
and Close
and other columns with the same name.
Now I am struggling with how to extract the 'Date' and 'Close' of each dataframes out of the dict d
and join as a new dataframe (one column as Date
and 12 columns as the Close
, the Dates are same), and keep the Close
column name as the name of the dataframe in d
?
I tried creating a list like this
df_list = [d['AAPL'], d['AMD'], d['BIDU'], d['GOOGL'],d['MSFT'], d['NVDA'], d['NXPI'], d['QCOM'], d['SWKS'], d['TXN'], d['^IXIC'], d['^NDXT']]
and then step by step like this
aapl = df_list[0]
amd = df_list[1]
bidu = df_list[2]
googl = df_list[3]
msft = df_list[4]
nvda = df_list[5]
nxpi = df_list[6]
qcom = df_list[7]
swks = df_list[8]
txn = df_list[9]
ixic = df_list[10]
ndxt = df_list[11]
mydf = pd.concat([aapl[['Date', 'Close']], amd[['Close']]], axis = 1)
mydf = pd.concat([mydf, bidu[['Close']]], axis = 1)
mydf = pd.concat([mydf,googl[['Close']]], axis = 1)
mydf = pd.concat([mydf,msft[['Close']]], axis = 1)
mydf = pd.concat([mydf,nvda[['Close']]], axis = 1)
mydf = pd.concat([mydf,nxpi[['Close']]], axis = 1)
mydf = pd.concat([mydf,qcom[['Close']]], axis = 1)
mydf = pd.concat([mydf,swks[['Close']]], axis = 1)
mydf = pd.concat([mydf,txn[['Close']]], axis = 1)
mydf = pd.concat([mydf,ixic[['Close']]], axis = 1)
mydf = pd.concat([mydf,ndxt[['Close']]], axis = 1)
And then I got my dataframe with one column as Date
and 12 columns as Close
, but the labels of the column are all Close
.
The dataframe I got is something like this:
Date | Close | Close | Close | Close | Close | Close | Close | Close | Close | Close | Close | Close |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2011-06-02 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
2011-06-03 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
2011-06-04 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
2011-06-05 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-05-28 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
There are 2515 rows, and the number 1/2 are just for example.
I am wondering that
^IXIC
and ^NDXT
):Date | AAPL | AMD | BIDU | GOOGL | MSFT | NVDA | NXPI | QCOM | SWKS | TXN | IXIC | NDXT |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2011-06-02 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
2011-06-03 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
2011-06-04 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
2011-06-05 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-05-28 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
I am new to Python and playing with dataframes. Hope I have explained my question clearly, and any help would be greatly appreciated.
You should read these files in one go and unstack them. A sample code (since I do not have your input files) to sketch the idea...
from glob import glob
import pandas as pd
def read_file(f):
df = pd.read_csv(f)
df['ticker'] = f.split('.')[0].strip('^')
return df
df = pd.concat([read_file(f) for f in glob('*.csv')])
df = df.set_index(['Date','ticker'])[['Close']].unstack()