Search code examples
pythonpandastime-seriesmissing-data

Adding missing dates to a time series from a csv file


I have a monthly time series of a variable 'GWL' but with several missing dates.

import pandas as pd

df = pd.read_csv(r'1218_29_0.csv')
df.head(5)

    date        GWL
0   15/01/2001  9.73
1   15/08/2001  10.55
2   15/11/2001  11.65
3   15/01/2002  9.72
4   15/04/2002  9.92

I have tried to follow other posts but none of them deal with a database in CSV format.

How can I add the missing dates (months) and fill their value by Nan?


Solution

  • I load using:

    df = pd.read_csv(io.StringIO('''date        GWL
    15/01/2001  9.73
    15/08/2001  10.55
    15/11/2001  11.65
    15/01/2002  9.72
    15/04/2002  9.92'''), sep='\s{2,}', engine='python', parse_dates=['date'])
    

    What you need to do in your code is just pass the parse_dates=['date'] parameter to your pd.read_csv. Don't pass the other stuff. I need to use io.StringIO because you won't provide your data in a constructor format.

    This yields:

            date    GWL
    0 2001-01-15   9.73
    1 2001-08-15  10.55
    2 2001-11-15  11.65
    3 2002-01-15   9.72
    4 2002-04-15   9.92
    

    Construct an Ides-centred monthly date range:

    months = df['date'] - pd.offsets.MonthBegin()
    d_range = pd.date_range(months.min(), months.max(), freq='M')
    d_range = d_range - pd.offsets.MonthBegin() + pd.offsets.Day(14)
    

    Reindex:

    >>> df.set_index('date').reindex(d_range)
                  GWL
    2001-01-15   9.73
    2001-02-15    NaN
    2001-03-15    NaN
    2001-04-15    NaN
    2001-05-15    NaN
    2001-06-15    NaN
    2001-07-15    NaN
    2001-08-15  10.55
    2001-09-15    NaN
    2001-10-15    NaN
    2001-11-15  11.65
    2001-12-15    NaN
    2002-01-15   9.72
    2002-02-15    NaN
    2002-03-15    NaN