Search code examples
pythontime-seriesextrapolation

Extrapolate time series data based on Start and end values, using Python?


I have an excel sheet with values representing start and end_time of a time series data, as shown below. Times are in seconds.

+------------+---------+-------+
Start_Time   End_Time  Value
0            2      A
2            3      B
3            9      A
9            11     C

I want to extrapolate the values between start and end_time and display the values for each second.

+---------+------+ Time Value 0 A 1 A 2 A 3 B 4 A 5 A 6 A 7 A 8 A 9 A 10 C 11 c

Any help to implement it in Python will be appreciated. Thanks.


Solution

  • Setup

    You should find how to read your excel sheet with pandas easily, and options will depend on the file itself, so I won't cover this part. Below is the reproduction of your sample dataframe, used for the example.

    import pandas as pd
    
    df = pd.DataFrame({'Start_Time': [0, 2, 3, 9],
                       'End_Time': [2, 3, 9, 11],
                       'Value': ['A', 'B', 'A', 'C']})
    >>> df
    Out[]:
       End_Time  Start_Time Value
    0         2           0     A
    1         3           2     B
    2         9           3     A
    3        11           9     C
    

    Solution

    (pd.Series(range(df.End_Time.max() + 1), name='Value')  # Create a series on whole range
       .map(df.set_index('End_Time').Value)                 # Set values from "df"
       .bfill()                                             # Backward fill NaNs values
       .rename_axis('Time'))                                # Purely cosmetic axis rename
    Out[]:
    Time
    0     A
    1     A
    2     A
    3     B
    4     A
    5     A
    6     A
    7     A
    8     A
    9     A
    10    C
    11    C
    Name: Value, dtype: object
    

    Walkthrough

    Create the whole "Time" range

    s = pd.Series(range(df.End_Time.max() + 1))
    
    >>> s
    Out[]:
    0      0
    1      1
    2      2
    3      3
    4      4
    5      5
    6      6
    7      7
    8      8
    9      9
    10    10
    11    11
    dtype: int32
    

    Use "End_Time" as index for df

    >>> df.set_index('End_Time')
    Out[]:
              Start_Time Value
    End_Time
    2                  0     A
    3                  2     B
    9                  3     A
    11                 9     C
    

    Map df values to corresponding "End_Time" values from s

    s = s.map(df.set_index('End_Time').Value)
    
    >>> s
    Out[]:
    0     NaN
    1     NaN
    2       A
    3       B
    4     NaN
    5     NaN
    6     NaN
    7     NaN
    8     NaN
    9       A
    10    NaN
    11      C
    dtype: object
    

    Backward-fill the NaN values

    s = s.bfill()
    
    >>> s
    Out[]:
    0     A
    1     A
    2     A
    3     B
    4     A
    5     A
    6     A
    7     A
    8     A
    9     A
    10    C
    11    C
    dtype: object
    

    Then rename_axis('Time') only renames the series axis to match your desired output.

    Note that this works here because you use excluding Start_Time. If you were using including Start_Time (where Value really starts at Start_Time, which is more common) you should change End_Time to Start_Time and bfill() to ffill() (forward-fill).