Tags: python, pandas, dataframe, indexing, time-series

How to properly extend a pandas dataframe?


Given a df:

Date and Time,Open,High,Low,Close                 
07/13/2017 15:55,1.1436,1.1436,1.1436,1.1436
07/13/2017 15:56,1.1435,1.1435,1.1435,1.1435
07/13/2017 15:57,1.1434,1.1434,1.1434,1.1434
07/13/2017 15:58,1.1436,1.1436,1.1436,1.1436
07/13/2017 15:59,1.1437,1.1437,1.1437,1.1437

How can I extend the index and fill the new rows with the values from the last row? The resulting output would look something like this:

Extend df by 24h, at 1min frequency:

Date and Time,Open,High,Low,Close
07/13/2017 15:55,1.1436,1.1436,1.1436,1.1436   |
07/13/2017 15:56,1.1435,1.1435,1.1435,1.1435   |
07/13/2017 15:57,1.1434,1.1434,1.1434,1.1434   | --> Existing Data
07/13/2017 15:58,1.1436,1.1436,1.1436,1.1436   |
07/13/2017 15:59,1.1437,1.1437,1.1437,1.1437   |
07/13/2017 16:00,1.1437,1.1437,1.1437,1.1437
07/13/2017 16:01,1.1437,1.1437,1.1437,1.1437
07/13/2017 16:02,1.1437,1.1437,1.1437,1.1437
07/13/2017 16:03,1.1437,1.1437,1.1437,1.1437
...
...
...
07/14/2017 15:57,1.1437,1.1437,1.1437,1.1437
07/14/2017 15:58,1.1437,1.1437,1.1437,1.1437
07/14/2017 15:59,1.1437,1.1437,1.1437,1.1437

Solution

  • One way to do this (hopefully not the only way) is to build a date_range at one-minute frequency, starting at the first value of the Date and Time column and spanning the original rows plus 24 hours (1440 + len(df) periods), turn it into a dataframe, and then left-merge the original dataframe onto it. Forward-filling the merged result copies the last row's values into every new timestamp. The following is an illustration:

    import pandas as pd
    
    # Reconstruct the sample data from the question.
    d = {'Date and Time': [pd.Timestamp('2017-07-13 15:55:00'),
                           pd.Timestamp('2017-07-13 15:56:00'),
                           pd.Timestamp('2017-07-13 15:57:00'),
                           pd.Timestamp('2017-07-13 15:58:00'),
                           pd.Timestamp('2017-07-13 15:59:00')],
         'Open': [1.1436, 1.1435, 1.1434, 1.1436, 1.1437],
         'High': [1.1436, 1.1435, 1.1434, 1.1436, 1.1437],
         'Low': [1.1436, 1.1435, 1.1434, 1.1436, 1.1437],
         'Close': [1.1436, 1.1435, 1.1434, 1.1436, 1.1437]}
    
    df = pd.DataFrame(d)
    
    # Full minute-by-minute range: the original rows plus 24 hours (1440 minutes).
    df1 = (pd.date_range(start=df['Date and Time'][0],
                         periods=1440 + len(df),
                         freq='1min')
           .to_series()
           .to_frame('Date and Time'))
    
    # The left merge keeps every timestamp in df1; timestamps with no matching
    # row come back as NaN, and ffill() fills them from the last observed row.
    print(df1.merge(df, how='left').ffill())
    

    Should yield something like the following:

               Date and Time    Open    High     Low   Close
    0    2017-07-13 15:55:00  1.1436  1.1436  1.1436  1.1436
    1    2017-07-13 15:56:00  1.1435  1.1435  1.1435  1.1435
    2    2017-07-13 15:57:00  1.1434  1.1434  1.1434  1.1434
    3    2017-07-13 15:58:00  1.1436  1.1436  1.1436  1.1436
    4    2017-07-13 15:59:00  1.1437  1.1437  1.1437  1.1437
    5    2017-07-13 16:00:00  1.1437  1.1437  1.1437  1.1437
    6    2017-07-13 16:01:00  1.1437  1.1437  1.1437  1.1437
    7    2017-07-13 16:02:00  1.1437  1.1437  1.1437  1.1437
    8    2017-07-13 16:03:00  1.1437  1.1437  1.1437  1.1437
    9    2017-07-13 16:04:00  1.1437  1.1437  1.1437  1.1437
    10   2017-07-13 16:05:00  1.1437  1.1437  1.1437  1.1437
    11   2017-07-13 16:06:00  1.1437  1.1437  1.1437  1.1437
    12   2017-07-13 16:07:00  1.1437  1.1437  1.1437  1.1437
    13   2017-07-13 16:08:00  1.1437  1.1437  1.1437  1.1437
    14   2017-07-13 16:09:00  1.1437  1.1437  1.1437  1.1437
    15   2017-07-13 16:10:00  1.1437  1.1437  1.1437  1.1437
    16   2017-07-13 16:11:00  1.1437  1.1437  1.1437  1.1437
    17   2017-07-13 16:12:00  1.1437  1.1437  1.1437  1.1437
    18   2017-07-13 16:13:00  1.1437  1.1437  1.1437  1.1437
    19   2017-07-13 16:14:00  1.1437  1.1437  1.1437  1.1437
    20   2017-07-13 16:15:00  1.1437  1.1437  1.1437  1.1437
    21   2017-07-13 16:16:00  1.1437  1.1437  1.1437  1.1437
    22   2017-07-13 16:17:00  1.1437  1.1437  1.1437  1.1437
    23   2017-07-13 16:18:00  1.1437  1.1437  1.1437  1.1437
    24   2017-07-13 16:19:00  1.1437  1.1437  1.1437  1.1437
    25   2017-07-13 16:20:00  1.1437  1.1437  1.1437  1.1437
    26   2017-07-13 16:21:00  1.1437  1.1437  1.1437  1.1437
    27   2017-07-13 16:22:00  1.1437  1.1437  1.1437  1.1437
    28   2017-07-13 16:23:00  1.1437  1.1437  1.1437  1.1437
    29   2017-07-13 16:24:00  1.1437  1.1437  1.1437  1.1437
    ...                  ...     ...     ...     ...     ...
    1415 2017-07-14 15:30:00  1.1437  1.1437  1.1437  1.1437
    1416 2017-07-14 15:31:00  1.1437  1.1437  1.1437  1.1437
    1417 2017-07-14 15:32:00  1.1437  1.1437  1.1437  1.1437
    1418 2017-07-14 15:33:00  1.1437  1.1437  1.1437  1.1437
    1419 2017-07-14 15:34:00  1.1437  1.1437  1.1437  1.1437
    1420 2017-07-14 15:35:00  1.1437  1.1437  1.1437  1.1437
    1421 2017-07-14 15:36:00  1.1437  1.1437  1.1437  1.1437
    1422 2017-07-14 15:37:00  1.1437  1.1437  1.1437  1.1437
    1423 2017-07-14 15:38:00  1.1437  1.1437  1.1437  1.1437
    1424 2017-07-14 15:39:00  1.1437  1.1437  1.1437  1.1437
    1425 2017-07-14 15:40:00  1.1437  1.1437  1.1437  1.1437
    1426 2017-07-14 15:41:00  1.1437  1.1437  1.1437  1.1437
    1427 2017-07-14 15:42:00  1.1437  1.1437  1.1437  1.1437
    1428 2017-07-14 15:43:00  1.1437  1.1437  1.1437  1.1437
    1429 2017-07-14 15:44:00  1.1437  1.1437  1.1437  1.1437
    1430 2017-07-14 15:45:00  1.1437  1.1437  1.1437  1.1437
    1431 2017-07-14 15:46:00  1.1437  1.1437  1.1437  1.1437
    1432 2017-07-14 15:47:00  1.1437  1.1437  1.1437  1.1437
    1433 2017-07-14 15:48:00  1.1437  1.1437  1.1437  1.1437
    1434 2017-07-14 15:49:00  1.1437  1.1437  1.1437  1.1437
    1435 2017-07-14 15:50:00  1.1437  1.1437  1.1437  1.1437
    1436 2017-07-14 15:51:00  1.1437  1.1437  1.1437  1.1437
    1437 2017-07-14 15:52:00  1.1437  1.1437  1.1437  1.1437
    1438 2017-07-14 15:53:00  1.1437  1.1437  1.1437  1.1437
    1439 2017-07-14 15:54:00  1.1437  1.1437  1.1437  1.1437
    1440 2017-07-14 15:55:00  1.1437  1.1437  1.1437  1.1437
    1441 2017-07-14 15:56:00  1.1437  1.1437  1.1437  1.1437
    1442 2017-07-14 15:57:00  1.1437  1.1437  1.1437  1.1437
    1443 2017-07-14 15:58:00  1.1437  1.1437  1.1437  1.1437
    1444 2017-07-14 15:59:00  1.1437  1.1437  1.1437  1.1437
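
    As noted above, this is hopefully not the only way. A minimal alternative sketch (using the same standard pandas calls; the names ts, full_index, and extended are illustrative, not from the original answer) sets Date and Time as the index, reindexes against the full range, and forward-fills:

    ts = df.set_index('Date and Time')
    
    # Same range as above: the original rows plus 24 hours of minutes.
    # Naming the index lets reset_index() restore the original column later.
    full_index = pd.date_range(start=ts.index[0],
                               periods=1440 + len(ts),
                               freq='1min',
                               name='Date and Time')
    
    # reindex() introduces NaN rows for the new timestamps;
    # ffill() fills them from the last known row.
    extended = ts.reindex(full_index).ffill()
    print(extended)

    This variant leaves the timestamps in the index rather than in a column; extended.reset_index() restores the original layout.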
    

    I hope this helps.