Tags: python, numpy, time-series, data-cleaning

How to fix/reset decreasing timestamps while preserving gaps in time-series data for CNN training?


I'm currently attempting to preprocess data from a study in which video data was collected from bees.

I am having trouble with cases in which something went wrong during recording and the timestamp values suddenly decrease instead of increasing.

This can be seen in a small sample of the data (printed in full below), in which the values suddenly drop to 97.14 instead of continuing to an expected value along the lines of 3886.826.

I am unsure how to alter the timestamps after such a reset so that the values are always increasing while the general pattern stays the same. For example, there are gaps within the data that I would like to keep.

Since the usual difference between timestamps is 1/30, I tried computing the expected next value, time[i + 1] = time[i] + 1/30, overwriting element i + 1 with it, and then applying the same correction to the values that follow:

time_diff = 1 / 30.0

new_time = np.copy(time)

for i in reset_indices:
    new_time[i + 1] = new_time[i] + time_diff

    for j in range(i + 2, min(i + 1 + 100, len(new_time))):
        new_time[j] = new_time[j - 1] + time_diff

Here is the code I use to check whether changes have been applied to a segment

def view_segment(x, y):
    for i in range(x, min(y, len(time) - 1)):
        print(f"Index {i}: time = {time[i]}")

def view_new_segment(x, y):
    print("\nUpdated segment:")
    for i in range(x, min(y, len(new_time) - 1)):
        print(f"Index {i}: time = {new_time[i]}")

x = 27443  # index shortly before the reset
y = 27453  # index shortly after the reset (exclusive)

print("Initial segment:")
view_segment(x, y)
view_new_segment(x, y)

The output is as follows

Initial segment:
Index 27443: time = 3886.66
Index 27444: time = 3886.693
Index 27445: time = 3886.726
Index 27446: time = 3886.76
Index 27447: time = 3886.793
Index 27448: time = 97.14
Index 27449: time = 97.173
Index 27450: time = 97.207
Index 27451: time = 97.24
Index 27452: time = 97.273

Updated segment:
Index 27443: time = 3886.66
Index 27444: time = 3886.693
Index 27445: time = 3886.726
Index 27446: time = 3886.76
Index 27447: time = 3886.793
Index 27448: time = 97.14
Index 27449: time = 97.173
Index 27450: time = 97.20633333333333
Index 27451: time = 97.23966666666666
Index 27452: time = 97.273

The expected output would be along the lines of

Index 27448: time = 3886.826
Index 27449: time = 3886.859
Index 27450: time = 3886.892
Index 27451: time = 3886.925
Index 27452: time = 3886.958

How would I go about correcting these sudden decreases in a way that will preserve gaps within this data?


Solution

  • While I don't know what you mean by gaps and how you want them to be handled, here is an example of how to achieve the desired output you provided:

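    import numpy as np

    indices = [0, 1, 2, 3, 4]

    times = np.array([3886.66, 3886.693, 97.14, 97.207, 88.456])

    def correct_data(indices, times):
        # Note: times is modified in place and also returned.
        for i in indices[1:]:  # skip the first element, because it is never changed
            if times[i] < times[i - 1]:
                # The value the current timestamp should have had: previous + 1/30.
                temp_value = times[i - 1] + 1 / 30
                # Offset that moves the current timestamp onto temp_value.
                addition_value = temp_value - times[i]
                # Shift this and all later timestamps, so later gaps are kept.
                times[i:] = times[i:] + addition_value
        return times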
    

    Calling

        print(correct_data(indices=indices, times=times))
    

    returns

    [3886.66       3886.693      3886.72633333 3886.79333333 3886.82666667]
    

    For each time in your array, the function checks whether the current time is smaller than the previous one. If it is not, it moves on to the next time. Otherwise temp_value is calculated, which is just the previous time plus 1/30, and addition_value is temp_value minus the current time. This value is then added to the current time and to all following times. I am not 100% sure this was your idea, but this way you preserve the relative differences between the values that follow each timestamp error.

    To avoid a nested loop you can simply add this value to all elements in the array that are affected by the error.
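If the array is long and contains many resets, the same idea can probably be written without any Python loop. The sketch below is only one possible way to do it: the name fix_resets, the step parameter, and the sample data with a 2-second gap are made up for illustration. It assumes a reset is any point where the timestamp is strictly smaller than the previous one, exactly as in the loop above.

```python
import numpy as np

def fix_resets(times, step=1 / 30):
    """Return a corrected copy of times; the input array is left untouched."""
    times = np.asarray(times, dtype=float)
    diffs = np.diff(times)
    # Positions k where times[k + 1] < times[k], i.e. the clock jumped backwards.
    resets = np.flatnonzero(diffs < 0)
    # Offset needed at each reset so that the next sample lands one step
    # after the previous one (same value as addition_value in the loop).
    shifts = np.zeros_like(times)
    shifts[resets + 1] = step - diffs[resets]
    # Each sample receives the sum of all offsets at or before its position.
    return times + np.cumsum(shifts)

# Hypothetical sample: a reset at index 3 and a genuine 2-second gap
# between index 4 and index 5 that should survive the correction.
sample = np.array([10.0, 10.033, 10.066, 0.5, 0.533, 2.533, 2.566])
fixed = fix_resets(sample)
print(fixed)
print(np.diff(fixed))
```

Because every sample after a reset is shifted by the same amount, the differences between consecutive samples only change at the reset itself, so the 2-second gap in the sample should still be there after the correction. Whether 1/30 is the right step to insert at a reset depends on your recording, so treat that default as an assumption.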