Search code examples
pythonnumpyrandomtime-seriesgenerate

Generate daily time series date from monthly usage with python


I have about two years of monthly gas usage for a city and want to generate daily use concerning daily usage sum equal to monthly and keep time-series shape, but I don't know how to do that.

Here is my data Link [1]


Solution

  • The following code sample demonstrates date and data interpolation using pandas.

    The following steps are taken:

    • Using the provided dataset, read this into a DataFrame.
    • Calculate a cumulative sum of usage data.
    • Set the DataFrame's index as the date, to facilitate date resampling.
    • Resample for dates to a daily frequency.
    • Calculate the daily usage.

    Example code:

    # Read the CSV and convert dates to a datetime object.
    path = '~/Downloads/usage.csv'
    df = pd.read_csv(path, 
                     header=0, 
                     names=['date', 'gas_usage'], 
                     converters={'date': pd.to_datetime})
    # Calculate a cumulative sum to be interpolated.
    df['gas_usage_c'] = df['gas_usage'].cumsum()
    # Move the date to the index, for resampling.
    df.set_index('date', inplace=True)
    
    # Resample the data to a daily ('D') frequency.
    df2 = df.resample('D').interpolate('time')
    # Calculate the daily usage.
    df2['daily_usage'] = df2['gas_usage_c'].diff()
    

    Sample output of df2:

                   gas_usage   gas_usage_c   daily_usage
    date                                                
    2016-03-20  3.989903e+07  3.989903e+07           NaN
    2016-03-21  3.932781e+07  4.061487e+07  7.158445e+05
    2016-03-22  3.875659e+07  4.133072e+07  7.158445e+05
                     ...           ...           ...
    2018-02-18  4.899380e+07  7.967041e+08  1.598856e+06
    2018-02-19  4.847973e+07  7.983029e+08  1.598856e+06
    2018-02-20  4.796567e+07  7.999018e+08  1.598856e+06
    
    [703 rows x 3 columns]
    

    Visual confirmation

    I've included two simple graphs to illustrate the dataset alignment and interpolation.

    enter image description here

    enter image description here

    Plotting code:

    For completeness, the rough plotting code is included below.

    from plotly.offline import plot
    
    plot({'data': [{'x': df.index, 
                    'y': df['gas_usage'], 
                    'type': 'bar'}], 
          'layout': {'title': 'Original',
                     'template': 'plotly_dark'}})
    plot({'data': [{'x': df2.index, 
                    'y': df2['daily_usage'], 
                    'type': 'bar'}], 
          'layout': {'title': 'Interpolated',
                     'template': 'plotly_dark'}})