python pandas interpolation resample

Resample and interpolate timeseries in Python


I have a CSV file with wind direction (wd) and wind speed (ws):

datetime  wd  ws
06.02.2023 00:55  297  3.2
06.02.2023 01:55  296  2.7
06.02.2023 02:55  299  3.0
06.02.2023 03:55  302  3.5

I would like to move each timestamp to the full hour, like so:

datetime  wd  ws
06.02.2023 01:00  297  3.2
06.02.2023 02:00  296  2.7
06.02.2023 03:00  299  3.0
06.02.2023 04:00  302  3.5

So far I have been trying with this script:

import pandas as pd
filename = r"data.csv"
df = pd.read_csv(filename, header=None, sep=";", skiprows=1, usecols=[0, 3, 4], names=["local_time", "wd", "ws"])

# specifying the date format
df['index_time']= pd.to_datetime(df['local_time'], format='%d.%m.%Y %H:%M')

# change date-time column to index
df.set_index('index_time', inplace=True)

# trying to resample
df_resampled = df.resample(rule='H')

print(df_resampled)

Output:

                           local_time     wd   ws
index_time                                       
2023-02-06 00:00:00               NaN    NaN  NaN
2023-02-06 01:00:00               NaN    NaN  NaN
2023-02-06 02:00:00               NaN    NaN  NaN
2023-02-06 03:00:00               NaN    NaN  NaN

How can I resample only the time but leave the data as it was?


Solution

  • Perhaps I have misunderstood, but it seems that you just want to round the time to the nearest hour. If so, the following code should work:

    import pandas as pd
    from datetime import datetime, timedelta
    
    data = {
        'date': ['06.02.2023', '06.02.2023', '06.02.2023', '06.02.2023'],
        'time': ['0:55', '1:55', '2:55', '3:55'],
        'wd': [297, 296, 299, 302],
        'ws': [3.2, 2.7, 3.0, 3.5]
    }
    
    df = pd.DataFrame(data)
    
    def round_up_to_nearest_hour(time_str):
        time_obj = datetime.strptime(time_str, '%H:%M')  # Parse the time string into a datetime object
        if time_obj.minute >= 30:  # If the minutes are 30 or more...
            time_obj += timedelta(hours=1)  # ...move on to the next hour
        time_obj = time_obj.replace(minute=0)  # Reset the minutes to zero
        return time_obj.strftime('%H:%M')  # Return the rounded time as a string (the date column is not changed)
    
    df['time'] = df['time'].apply(round_up_to_nearest_hour) #Apply this function to all values in the time column of df
    

    I ran this in Jupyter, with one value switched to fall below the 30-minute mark, and it appears to do what you want. Hope this helps!
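
The same rounding can probably also be done on the full timestamp with pandas' own `Series.dt.round`, which avoids splitting date and time. This is only a minimal sketch: it rebuilds the four rows from the question in memory rather than reading `data.csv`, and it assumes the column is called `local_time` as in the question's output. Note that pandas may treat an exact half hour such as 00:30 differently from the `>= 30` rule in the function above.

```python
import pandas as pd

# The four sample rows from the question, built in memory for illustration.
df = pd.DataFrame({
    "local_time": ["06.02.2023 00:55", "06.02.2023 01:55",
                   "06.02.2023 02:55", "06.02.2023 03:55"],
    "wd": [297, 296, 299, 302],
    "ws": [3.2, 2.7, 3.0, 3.5],
})

# Parse the day-first strings, then round every timestamp to the nearest hour.
df["local_time"] = pd.to_datetime(df["local_time"], format="%d.%m.%Y %H:%M")
df["local_time"] = df["local_time"].dt.round("h")

# Use the rounded timestamps as the index; wd and ws are left untouched.
df = df.set_index("local_time")
print(df)

# Rounding the full timestamp also carries over into the next day.
rolled = pd.Timestamp("2023-02-06 23:55").round("h")
print(rolled)
```

Because the date is part of the value being rounded, a reading at 23:55 lands on 00:00 of the following day, which rounding the time string alone would miss.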