Search code examples
pandasdateheadertimestamp

Convert Date headers followed by AM & PM time cells to whole Timestamp column


How to convert 'Time & Date' column to timestamp? As you can see there's a header cell for each date followed by AM & PM times. I would like to have a whole timestamp column.

Time & Date                 Country  ...  Consensus  Forecast
15                  4:00 PM      DE  ...        NaN       NaN
16    Tuesday April 02 2019     NaN  ...  Consensus  Forecast
17                  7:00 AM      EA  ...        NaN       NaN
18                  7:00 AM      ES  ...     -33.3K    -38.6K
19                  8:30 AM      GB  ...       49.8      49.1
20                  9:00 AM      CY  ...        NaN     8.90%
21                  9:40 AM      RO  ...      2.50%     2.50%
22                 10:00 AM      IE  ...        NaN     5.50%
23                  5:30 PM      DE  ...        NaN       NaN
24  Wednesday April 03 2019     NaN  ...  Consensus  Forecast
25                  7:15 AM      ES  ...         55      52.5
26                  7:45 AM      IT  ...       50.8      50.1
27                  7:50 AM      FR  ...       48.7      48.7
28                  7:55 AM      DE  ...       54.9      54.9
29                  8:00 AM      EA  ...       52.7      52.7
30                  8:30 AM      GB  ...       50.9      50.5
31                  9:00 AM      EA  ...      0.20%     0.40%
32                  9:00 AM      EA  ...      2.30%     1.80%
33                 11:25 AM      PL  ...      1.50%     1.50%
34   Thursday April 04 2019     NaN  ...  Consensus  Forecast
35                  4:30 AM      NL  ...        NaN     2.60%
36                  6:00 AM      DE  ...      0.30%     0.50%
37                  7:30 AM      DE  ...        NaN        54
38                 11:30 AM      EA  ...        NaN       NaN
39     Friday April 05 2019     NaN  ...  Consensus  Forecast
40                  6:00 AM      DE  ...      0.50%     0.70%
41                  6:45 AM      FR  ...     €-4.7B    €-4.7B
42                  7:30 AM      GB  ...     -2.40%    -2.20%
43                  7:30 AM      GB  ...      2.30%     1.50%
44                 11:30 AM      ES  ...        NaN      92.5

Solution

  • Your Time & Date column represents two different things, so it needs to be two different columns to start with. If there's a way to cut out that step I would love to see it, but I'm guessing you need to expand it into two columns before combining again before using pandas.to_datetime() with the format argument to get it into datetime.

    First, get the dates into a different column, then forward-fill the missing values.

    df['date'] = df['Time & Date'].apply(
        lambda x: np.nan if (('AM' in str(x))|('PM' in str(x))) else x
    ).ffill()
    

    Then you can rename Time & Date to time, slice out the date columns in that row from the dataframe, and concatenate date and time together.

    df.rename(columns={'Time & Date': 'time'}, inplace=True)
    
    df = df.loc[df.time.str.contains('AM|PM', regex=True)]
    
    df['datetime'] = df.date + ' ' + df.time
    

    From there all you have to do is find the right format for pd.to_datetime() and you're set!