python, pandas, datetime, yfinance

Update only latest data from yfinance with pandas and datetime


I have an old CSV file named 'AAPL.csv' with this Yahoo Finance data:

Date Open High Low Close Volume
2023-09-25 174.19 176.97 174.14 176.08 46172700
2023-09-26 174.82 175.19 171.66 171.96 64588900
2023-09-27 172.61 173.03 169.05 170.42 66921800
2023-09-28 169.33 172.02 167.61 170.69 56294400
2023-09-29 172.02 173.07 170.33 171.21 51814200

I need to compare today's date with the date of the last row, download the data for the missing days, and save the result to the same file. I have this code, but it has problems with the date format and doesn't run correctly.

import os
import datetime
import pandas as pd
import dateutil.parser
from datetime import datetime

read_data = pd.read_csv('AAPL.csv')
read_last_date_df = str(read_data['Date'].values[-1])
last_date_df = dateutil.parser.isoparse(read_last_date_df)

today = datetime.today()
dif_day_dates = int(today.strftime('%d')) - int(last_date_df.strftime('%d'))

if dif_day_dates > 0:
   update_data = yf.download(symbol, start=datetime.today() - last_date_df, end=datetime.today(), interval=timeframes_codedata['daily'])
   read_data[len(read_data)] = update_data
   read_data.to_csv('AAPL.csv')

Now I get this error on the line if dif_day_dates > 0: TypeError: '>' not supported between instances of 'datetime.timedelta' and 'int'

Any suggestions or ideas for solving this problem so I can use this function in my app? Thanks a lot.


Solution

  • The comparison if dif_day_dates > 0: is not the problem. The problem is the start argument in this call:

    update_data = yf.download(symbol, start=datetime.today() - last_date_df,
    end=datetime.today(), interval=timeframes_codedata['daily'])
    

    Let's look at what start actually is:

    start = datetime.today() - last_date_df
    print(start)
    print(type(start))
    '''
    6 days, 15:04:20.767898
    <class 'datetime.timedelta'>
    '''
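
Subtracting two datetimes gives a timedelta, and the same subtraction is a safer way to decide whether an update is needed. The sketch below is only an illustration: days_behind is a hypothetical helper name, and the two timestamps are made-up values chosen to match the sample data. It also shows why subtracting day-of-month numbers, as in the question, gives the wrong result across a month boundary.

```python
from datetime import datetime

def days_behind(last_row: datetime, now: datetime) -> int:
    """Whole calendar days between the last saved row and now (hypothetical helper)."""
    return (now.date() - last_row.date()).days

last_row = datetime(2023, 9, 29)         # date of the last row in the sample CSV
now = datetime(2023, 10, 5, 15, 4, 20)   # assumed "today", for illustration only

gap = now - last_row
print(type(gap).__name__)          # timedelta
print(days_behind(last_row, now))  # 6
print(now.day - last_row.day)      # -24, day-of-month subtraction breaks across months
```

With this helper, the check becomes if days_behind(last_date_df, datetime.today()) > 0:.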
    

    According to the documentation, the start parameter takes a date, either a string in YYYY-MM-DD format or a datetime. A timedelta is not a valid value.

    def download(tickers, start=None, end=None, actions=False, threads=True, ignore_tz=None,
                 group_by='column', auto_adjust=False, back_adjust=False, repair=False, keepna=False,
                 progress=True, period="max", show_errors=None, interval="1d", prepost=False,
                 proxy=None, rounding=False, timeout=10, session=None):
        """Download yahoo tickers
        :Parameters:
            tickers : str, list
                List of tickers to download
            period : str
                Valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
                Either Use period parameter or use start and end
            interval : str
                Valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
                Intraday data cannot extend last 60 days
            start: str
                Download start date string (YYYY-MM-DD) or _datetime, inclusive.
                Default is 99 years ago
                E.g. for start="2020-01-01", the first data point will be on "2020-01-01"
            end: str
                Download end date string (YYYY-MM-DD) or _datetime, exclusive.
                Default is now
                E.g. for end="2023-01-01", the last data point will be on "2022-12-31"
    

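    To avoid this, build both dates as strings:

    from datetime import timedelta
    # start the day after the last saved row (start is inclusive)
    start = (last_date_df + timedelta(days=1)).strftime('%Y-%m-%d')
    # end is exclusive, so with this value today's row is not included
    end = datetime.now().strftime('%Y-%m-%d')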
    

    Now you can use these two values in the yf.download() function.
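
Putting the pieces together, the whole update could look roughly like the sketch below. It is an outline under several assumptions, not a definitive implementation: download_window, append_new_rows and update_csv are hypothetical helper names, download stands for yf.download (or any callable that accepts the same start, end and interval keywords), and the downloaded frame is assumed to have flat columns with the same names as the CSV. Unlike the snippet above, end is pushed one day past today, because the docstring describes end as exclusive.

```python
from datetime import date, timedelta

import pandas as pd

def download_window(last_date, today):
    """Return (start, end) as YYYY-MM-DD strings, or None if nothing is missing."""
    start = last_date + timedelta(days=1)
    if start > today:
        return None
    # `end` is exclusive, so use tomorrow to include today's row if it exists.
    end = today + timedelta(days=1)
    return start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d')

def append_new_rows(old, new):
    """Concatenate two date-indexed frames, keeping the newest copy of any duplicate date."""
    combined = pd.concat([old, new])
    combined = combined[~combined.index.duplicated(keep='last')]
    return combined.sort_index()

def update_csv(path, symbol, download):
    """Read the CSV, fetch only the missing days, and write the file back."""
    old = pd.read_csv(path, index_col='Date', parse_dates=True)
    window = download_window(old.index[-1].date(), date.today())
    if window is None:
        return old  # already up to date
    start, end = window
    new = download(symbol, start=start, end=end, interval='1d')
    if new.empty:
        return old  # e.g. only a weekend or holiday is missing
    # Assumption: the downloaded columns are flat and named like the CSV's.
    new = new.reindex(columns=old.columns)
    combined = append_new_rows(old, new)
    combined.index.name = 'Date'
    combined.to_csv(path)
    return combined
```

With yfinance installed, this would be called as update_csv('AAPL.csv', 'AAPL', yf.download). Depending on the yfinance version, the downloaded columns may need to be flattened or renamed first so that they match the file.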