pandas statistics time-series statsmodels kaggle

How to solve duplicate date values in Time Series Analysis?

I have a dataframe in which the same Date value appears on multiple rows, and I want to use it for my Time Series Analysis. I suppose the values were taken at different times of the day and only the date was recorded.

So I am thinking of generating random times for the values. For example, the first 9/9/2016 value would be at 9pm, the second at 3pm, the third at 9am, and the fourth at 3am (since the data gets older further down the rows).

What is the best practice?

Solution

It is highly unlikely that good data would have values like this. You mentioned that this is from a Kaggle competition - I doubt they would leave this kind of thing ambiguous.

My guess is that you didn't read the dataset carefully. Maybe the rows share a date but belong to different variables? For example, maybe the values were measured on the same date, but in different areas?

You may want to check your other columns before jumping to conclusions.
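One way to run that check in pandas is sketched below. The column names `Date`, `Region`, and `Value` and the sample values are made up for illustration, so substitute the columns from your own dataset. The idea is to look for a column that, together with the date, uniquely identifies each row. If one exists, the data is probably several parallel series rather than one series with unknown times.

```python
import pandas as pd

# Hypothetical data: the column names and values are assumptions for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-09-09"] * 4 + ["2016-09-08"] * 2),
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Value": [10.0, 12.5, 10.0, 11.1, 10.4, 12.0],
})

# All rows whose Date occurs more than once.
dupes = df[df.duplicated(subset="Date", keep=False)]
print(dupes)

# Columns that, combined with Date, leave no duplicate rows.
candidate_keys = [
    col for col in df.columns.drop("Date")
    if not df.duplicated(subset=["Date", col]).any()
]
print(candidate_keys)

# If such a column exists, reshape to one series per category,
# which gives a unique date index.
wide = df.pivot(index="Date", columns="Region", values="Value")
print(wide)
```

If no column separates the duplicates, it is probably better to go back to the dataset description than to invent timestamps.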