I have a big dataset which contains entries in the form of:
user_id, measurement_date, value1, value2,..
The challenge is how to handle gaps in the data. The measurements were taken at irregular intervals, so there will always be small gaps as well as very large ones.
What is the best way to handle missing data here?
I am thinking of the following approaches:
My question now is: what is the best way to encode this?
At the moment the LSTM network gets its input in the form of unencoded input vectors:
vector1, vector2,..
The vectors contain the values.
But now, when I introduce new symbols like:
s1 := <=3 days no measurement taken
s2 := <=7 ..
I would one-hot encode them.
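To make the symbol idea concrete, here is a minimal sketch of how the gap between two consecutive measurement dates could be mapped to a symbol. The names GAP_BOUNDS and gap_symbol are made up for illustration, and only the s1 and s2 thresholds come from the definitions above; anything longer is left unassigned.

```python
from datetime import date

# Upper bounds in days per gap symbol. Only s1 and s2 are taken from the post;
# symbols for longer gaps are not specified there and are not invented here.
GAP_BOUNDS = [("s1", 3), ("s2", 7)]

def gap_symbol(prev_date, curr_date):
    """Return the symbol for the gap between two measurement dates, or None."""
    days = (curr_date - prev_date).days
    for name, bound in GAP_BOUNDS:
        if days <= bound:
            return name
    return None  # gap longer than any defined symbol

# Hypothetical dates for one user
print(gap_symbol(date(2020, 1, 1), date(2020, 1, 3)))  # 2 days
print(gap_symbol(date(2020, 1, 1), date(2020, 1, 7)))  # 6 days
```

Whether every pair of consecutive measurements should produce a symbol, or only pairs further apart than some minimum, is a separate decision that this sketch does not settle.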
Is it best to introduce a prefix that distinguishes between the two word types?
E.g.
1 vector -> 1, value1, value2
0 vector -> 0, 0, 1 (s1)
-> 0, 1, 0 (s2)
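The prefixed layout from the example could be sketched roughly as follows. The helper names encode_measurement and encode_gap are hypothetical, the one-hot codes are copied from the example, and the zero padding to a common width is my own assumption for the case where the number of values and the number of symbols differ.

```python
# One-hot codes as written in the example; only s1 and s2 are defined there.
GAP_CODES = {"s1": [0, 1], "s2": [1, 0]}

def encode_measurement(values, width):
    """Prefix 1 marks a real measurement; zero-pad to a fixed width (assumption)."""
    vec = [1] + list(values)
    return vec + [0] * (width - len(vec))

def encode_gap(symbol, width):
    """Prefix 0 marks a gap symbol; zero-pad to a fixed width (assumption)."""
    vec = [0] + GAP_CODES[symbol]
    return vec + [0] * (width - len(vec))

# 1 prefix slot + max(number of values, number of gap symbols)
width = 3

# Hypothetical sequence: two measurements with a gap of at most 3 days between them
sequence = [
    encode_measurement([0.5, 1.2], width),
    encode_gap("s1", width),
    encode_measurement([0.7, 1.1], width),
]
print(sequence)
```

Note that in this layout the slots after the prefix mean different things depending on the prefix: measured values in one case, a one-hot gap code in the other.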
Actually, it is not possible to encode it either way.