Search code examples
pythonpandasdatetimereindex

Reindexing Pandas based on daterange


I am trying to reindex the dates in pandas. This is because there are dates which are missing, such as weekends or national hollidays.

To do this I am using the following code:

import pandas as pd
import yfinance as yf 
import datetime 

start = datetime.date(2015,1,1)
end = datetime.date.today()

df = yf.download('F', start, end, interval ='1d', progress = False)
df.index = df.index.strftime('%Y-%m-%d')
full_dates = pd.date_range(start, end)
df.reindex(full_dates)

This code is producing this dataframe:

    Open    High    Low     Close   Adj Close   Volume
2015-01-01  NaN     NaN     NaN     NaN     NaN     NaN
2015-01-02  NaN     NaN     NaN     NaN     NaN     NaN
2015-01-03  NaN     NaN     NaN     NaN     NaN     NaN
2015-01-04  NaN     NaN     NaN     NaN     NaN     NaN
2015-01-05  NaN     NaN     NaN     NaN     NaN     NaN
...     ...     ...     ...     ...     ...     ...
2023-01-13  NaN     NaN     NaN     NaN     NaN     NaN
2023-01-14  NaN     NaN     NaN     NaN     NaN     NaN
2023-01-15  NaN     NaN     NaN     NaN     NaN     NaN
2023-01-16  NaN     NaN     NaN     NaN     NaN     NaN
2023-01-17  NaN     NaN     NaN     NaN     NaN     NaN

Could you please advise why is it not reindexing the data and showing NaN values instead?

===Edit ===

Could it be a python version issue? I ran the same code in python 3.7 and 3.10

In python 3.7

enter image description here

In python 3.10

enter image description here

In python 3.10 - It is datetime as you can see from the image. enter image description here

Getting datetime after yf.download('F', start, end, interval ='1d', progress = False) without strftime

enter image description here


Solution

  • Remove converting DatetimeIndex to strings by df.index = df.index.strftime('%Y-%m-%d'), so can reindex by datetimes.

    df = yf.download('F', start, end, interval ='1d', progress = False)
    full_dates = pd.date_range(start, end)
    df = df.reindex(full_dates)
    
    print (df)
                 Open   High    Low  Close  Adj Close      Volume
    2015-01-01    NaN    NaN    NaN    NaN        NaN         NaN
    2015-01-02  15.59  15.65  15.18  15.36  10.830517  24777900.0
    2015-01-03    NaN    NaN    NaN    NaN        NaN         NaN
    2015-01-04    NaN    NaN    NaN    NaN        NaN         NaN
    2015-01-05  15.12  15.13  14.69  14.76  10.407450  44079700.0
              ...    ...    ...    ...        ...         ...
    2023-01-13  12.63  12.82  12.47  12.72  12.720000  96317800.0
    2023-01-14    NaN    NaN    NaN    NaN        NaN         NaN
    2023-01-15    NaN    NaN    NaN    NaN        NaN         NaN
    2023-01-16    NaN    NaN    NaN    NaN        NaN         NaN
    2023-01-17    NaN    NaN    NaN    NaN        NaN         NaN
    
    [2939 rows x 6 columns]
    
    print (df.index)
    DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
                   '2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08',
                   '2015-01-09', '2015-01-10',
                   ...
                   '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11',
                   '2023-01-12', '2023-01-13', '2023-01-14', '2023-01-15',
                   '2023-01-16', '2023-01-17'],
                  dtype='datetime64[ns]', length=2939, freq='D')
    

    EDIT: There is timezones difference, for remove it use DatetimeIndex.tz_convert:

    df = yf.download('F', start, end, interval ='1d', progress = False)
    
    df.index= df.index.tz_convert(None)
    
    full_dates = pd.date_range(start, end)
    df = df.reindex(full_dates)
    print (df)