I'm following the finance tutorials on PythonProgramming.net and have run into an issue when I try to combine several dataframes into one large dataframe. I created a function to do this:
def compile_data():
with open ("sp500tickers.pickle", "rb") as f:
tickers = pickle.load(f)
main_df = pd.DataFrame()
for count,ticker in enumerate(tickers):
try:
df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
df.set_index('Date', inplace=True)
df.rename(columns={'Close':ticker}, inplace=True)
df.drop(['Open','High','Low','Volume'], 1, inplace=True)
if main_df.empty:
main_df = df
else:
main_df.join(df, how='outer')
print(main_df.head())
if count % 10 == 0:
print(count)
except Exception:
pass
print(main_df.head())
main_df.to_csv('sp500joinedcloses.csv')
(The reason I used the try/except in the above code was because I have a list of all the tickers for the S&P500, but wasn't able to grab data from Google Finance API for all of them... so this way, if it tries to find a csv that I don't have, it will still combine the ones I do have without throwing an error.)
When I run this function, it creates a CSV called sp500joinedcloses.csv, but it only contains the data for one ticker, namely ABBV. I know it is iterating through the tickers properly, because if I add a print(ticker) in the for loop, all the correct tickers are printed.
It's also worth noting that ABBV isn't the first csv that I have that should be included in the dataframe. They first ticker that should have a file is AAPL, and then ABBV. No idea why it seems to skip AAPL.
I would appreciate any help. I am a beginner to pandas, and really want to learn everything I can about it.
IIUC:
You don't want to use join
and you couldn't anyway if you start with an empty dataframe. Use pd.concat
instead:
main_df = pd.concat([main_df, df], axis=1)
However, I would recommend this to replace your whole process:
def read_file(ticker):
df = pd.read_csv('stock_dfs/{}.csv'.format(ticker)).set_index('Date')
return df.Close.rename(ticker)
with open ("sp500tickers.pickle", "rb") as f:
tickers = pickle.load(f)
main_df = pd.concat([read_file(t) for t in tickers], axis=1)