I'm currently attempting to preprocess data from a study in which video data was collected from bees.
I am having trouble handling cases in which something went wrong with the recording and the timestamp values suddenly decrease instead of increasing.
This can be shown in this small sample
in which the values suddenly decrease to 97.14 instead of an expected output along the lines of 3886.826
I am unsure how to alter the timestamps after such a reset so that the values are always increasing while the general pattern remains the same. For example, there are gaps within the data that I would like to keep.
Since the usual difference between the timestamps is 1/30, I tried to calculate the expected value time[i + 1] = time[i] + 1/30, overwrite element i + 1 with this expected value, and then apply this difference to all values afterwards:
time_diff = 1 / 30.0
new_time = np.copy(time)
for i in reset_indices:
    new_time[i + 1] = new_time[i] + time_diff
    for j in range(i + 2, min(i + 1 + 100, len(new_time))):
        new_time[j] = new_time[j - 1] + time_diff
Here is the code I use to check whether changes have been applied to a segment
def view_segment(x, y):
    for i in range(x, min(y, len(time) - 1)):
        print(f"Index {i}: time = {time[i]}")

def view_new_segment(x, y):
    print("\nUpdated segment:")
    for i in range(x, min(y, len(new_time) - 1)):
        print(f"Index {i}: time = {new_time[i]}")

x = #index right before reset
y = #index after reset
print("Initial segment:")
view_segment(x, y)
view_new_segment(x, y)
The output is as follows
Initial segment:
Index 27443: time = 3886.66
Index 27444: time = 3886.693
Index 27445: time = 3886.726
Index 27446: time = 3886.76
Index 27447: time = 3886.793
Index 27448: time = 97.14
Index 27449: time = 97.173
Index 27450: time = 97.207
Index 27451: time = 97.24
Index 27452: time = 97.273
Updated segment:
Index 27443: time = 3886.66
Index 27444: time = 3886.693
Index 27445: time = 3886.726
Index 27446: time = 3886.76
Index 27447: time = 3886.793
Index 27448: time = 97.14
Index 27449: time = 97.173
Index 27450: time = 97.20633333333333
Index 27451: time = 97.23966666666666
Index 27452: time = 97.273
With an expected output being along the lines of
Index 27448: time = 3886.826
Index 27449: time = 3886.859
Index 27450: time = 3886.892
Index 27451: time = 3886.925
Index 27452: time = 3886.958
How would I go about correcting these sudden decreases in a way that will preserve gaps within this data?
While I don't know what you mean by gaps or how you want them to be handled, here is an example of how to achieve the desired output you provided:
import numpy as np

indices = [0, 1, 2, 3, 4]
times = np.array([3886.66, 3886.693, 97.14, 97.207, 88.456])

def correct_data(indices, times):
    # note: this modifies the times array in place
    for i in indices[1:]:  # skip the first element, because it will not be changed
        if times[i] < times[i - 1]:
            # the timestamp dropped: shift it and everything after it
            temp_value = times[i - 1] + 1 / 30
            addition_value = temp_value - times[i]
            times[i:] = times[i:] + addition_value
    return times

Calling
print(correct_data(indices=indices, times=times))
returns
[3886.66 3886.693 3886.72633333 3886.79333333 3886.82666667]
For each time in your array, the function checks whether the current time is smaller than the previous one. If it is not, it moves on to the next time. Otherwise temp_value is calculated, which is just the previous time plus 1/30, and addition_value is temp_value minus the current time. This value is then added to the current time and all following times. I am not 100% sure this is what you had in mind, but this way you preserve the relative differences between values that follow the same timestamp error.
To avoid a nested loop you can simply add this value to all elements in the array that are affected by the error.
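If you want to drop the Python loop entirely, the same idea can be sketched with np.diff and np.cumsum. This is only a minimal sketch: the function name correct_resets and the step parameter are my own, and it assumes that every negative difference is a reset and that the first timestamp after a reset should land exactly one step (1/30) after the previous one. Positive differences, including larger gaps, are left untouched.

```python
import numpy as np

def correct_resets(times, step=1 / 30):
    """Return a copy of times in which every sudden decrease is shifted upwards.

    Hypothetical helper: assumes each negative difference marks a reset and
    that the timestamp after a reset should be previous + step.
    """
    times = np.asarray(times, dtype=float)
    diffs = np.diff(times)
    # At each reset, the shift needed so that the new difference equals step.
    jumps = np.where(diffs < 0, step - diffs, 0.0)
    # Each shift also applies to every later element, hence the running sum.
    offsets = np.concatenate(([0.0], np.cumsum(jumps)))
    return times + offsets

times = np.array([3886.66, 3886.693, 97.14, 97.207, 88.456])
print(correct_resets(times))
```

Unlike correct_data above, this returns a new array and leaves the input unchanged, so you can compare the original and corrected timestamps side by side.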