Hi, I am building a tool to download stock data from yfinance. I want to download the historical data once, save it to CSV files in a directory, and then every day add just the last trading day (one row) and save again. The problem is that the CSV files are overwritten from scratch, so if I download only the last trading day, that single row is all I end up with for each symbol.

The code I work with is in two parts: the first creates the DataFrame in pandas and saves it with all my new function columns included, and the second loads the DataFrame, from the directory it was saved to, in any other project where I need it.

I also have a second question: is it good practice to work with CSV files and pandas DataFrames instead of an SQLite database, and what are the pros and cons of the different methods?
Part 1: Create the directory and CSV files
import yfinance as yf
import os
import pandas as pd

with open(r'C:\zPythonFilesDir\yfinance\symbolstest.csv') as f:
    lines = f.read().splitlines()
    #print(lines)
    for symbol in lines:
        print(symbol)
        data = yf.download(symbol, start="2023-07-01", end="2023-07-30")
        #print(data)
        data.to_csv(r'C:\zPythonFilesDir\yfinance\datasetstest\{}.csv'.format(symbol))

for filename in os.listdir(r'C:\zPythonFilesDir\yfinance\datasetstest'):
    #print(filename)
    symbol = filename.split(".")[0]
    print(symbol)
    df = pd.read_csv(r'C:\zPythonFilesDir\yfinance\datasetstest/{}'.format(filename))
    # new columns and functions here
    x = 2 * round(df['High'],2)
    df['new_val'] = x
    # write new datasets with functions columns in different directory
    df.to_csv(r'C:\zPythonFilesDir\yfinance\newdatasetstest\{}.csv'.format(symbol))
    print(df)
Part 2: Get the data from the CSV files
import os
import pandas as pd

for filename in os.listdir(r'C:\zPythonFilesDir\yfinance\newdatasetstest'):
    #print(filename)
    symbol = filename.split(".")[0]
    print(symbol)
    df1 = pd.read_csv(r'C:\zPythonFilesDir\yfinance\newdatasetstest/{}'.format(filename))
    print(df1)

df2 = pd.read_csv(r'C:\zPythonFilesDir\yfinance\newdatasetstest/AAPL.csv')
print(df2)
symbolstest.csv contains the symbols of interest, one per line.
DataFrame.to_csv opens the file in write mode by default, so every call overwrites the existing file. What you need to do is change the mode to append, which can be done with:
SAVE_TO_PATH = r'C:\zPythonFilesDir\yfinance\newdatasetstest\{}.csv'.format(symbol)
df.to_csv(SAVE_TO_PATH, mode='a', header=False)
I also moved the path out of the function arguments for readability, and set header=False because we don't want to write the header row again on every daily append.
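With header=False on every call, the very first write to a new file would leave it without a header row. A minimal sketch of one way around that, assuming the date is an ordinary column rather than the index: write the header only when the file does not exist yet. The helper name append_rows, the DEMO.csv file and the sample values are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def append_rows(df, path):
    """Append df to the CSV at path; write the header only when creating the file."""
    file_exists = os.path.isfile(path)
    # mode="a" appends to the file; the header is written on the first call only.
    df.to_csv(path, mode="a", header=not file_exists, index=False)


# Demo with made-up values in a temporary directory (not real quotes).
demo_path = os.path.join(tempfile.mkdtemp(), "DEMO.csv")
append_rows(pd.DataFrame({"Date": ["2023-07-27"], "High": [10.0]}), demo_path)
append_rows(pd.DataFrame({"Date": ["2023-07-28"], "High": [11.0]}), demo_path)
result = pd.read_csv(demo_path)
print(result)
```

The same call can then be used both for the initial historical download and for the daily one-row update.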
By the way, it is CSV (comma-separated values), not CVS.
Bottom line: use mode='a' when you want to add lines to an existing file, and mode='w' when you are writing the file for the first time or want to overwrite it. (mode='w' is the default, so you don't have to specify it.)
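One caveat with mode='a': if the script runs twice on the same day, the same row is appended twice. A possible alternative, sketched here under the assumption that each file has a Date column holding ISO-formatted strings, is to read the existing file, concatenate the new rows, drop duplicate dates and rewrite the file. The helper name upsert_rows, the DEMO.csv file and the sample values are hypothetical.

```python
import os
import tempfile

import pandas as pd


def upsert_rows(new_rows, path, key="Date"):
    """Merge new_rows into the CSV at path, keeping one row per key value."""
    if os.path.isfile(path):
        combined = pd.concat([pd.read_csv(path), new_rows], ignore_index=True)
    else:
        combined = new_rows.copy()
    # keep="last" lets a re-downloaded day replace the row stored earlier.
    combined = combined.drop_duplicates(subset=key, keep="last")
    combined = combined.sort_values(key).reset_index(drop=True)
    combined.to_csv(path, index=False)  # full rewrite, default mode="w"
    return combined


# Demo with made-up values: the second day is submitted twice.
demo_path = os.path.join(tempfile.mkdtemp(), "DEMO.csv")
upsert_rows(pd.DataFrame({"Date": ["2023-07-27"], "High": [10.0]}), demo_path)
upsert_rows(pd.DataFrame({"Date": ["2023-07-28"], "High": [11.0]}), demo_path)
final = upsert_rows(pd.DataFrame({"Date": ["2023-07-28"], "High": [12.0]}), demo_path)
print(final)
```

This rewrites the whole file on every update, so it trades the speed of a pure append for protection against duplicate rows.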