Hi, I am trying to plot some time series data, but I have two problems.
First, some background: there are many stations, and each station has its own data file.
That is, the files are station1.csv, station2.csv, and so on. Each CSV file has the date, station name, sensor name, elevation, groundwater level, etc.
The original files have discontinuous time series, as in the sample below.
2014-10-24,JDsd1,S11,1.49,26.47,36.84,18.19,7682,1021.57
2014-10-25,JDsd1,S11,1.49,26.47,36.84,18.19,7995,1021.79
2014-10-26,JDsd1,S11,1.52,26.44,36.87,18.2,7985,1019.75
2014-10-27,JDsd1,S11,1.53,26.43,36.88,18.2,7979,1020.13
2014-10-28,JDsd1,S11,,,,,,
2014-11-13,JDsd1,S11,1.33,26.63,36.67,18.08,13160,1026.25
2014-11-14,JDsd1,S11,1.24,26.72,36.58,18.11,13013,1027.09
2014-11-15,JDsd1,S11,1.23,26.73,36.57,18.12,12912,1030.27
2014-11-16,JDsd1,S11,1.22,26.74,36.56,18.13,12853,1026.32
First, I need to make the date range continuous, but I am having trouble doing so.
When I use pd.date_range(start_date, end_date, freq='D') (with either explicit dates or the min and max of the date column), I get:
ValueError: Length of values (775) does not match length of index (769)
The length of values (775) is the number of days I need, and the length of index (769) is the current number of dates in the file.
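For reference, here is a minimal sketch of how I think the error arises, assuming the full date range is assigned directly to a frame that has fewer rows. The tiny frame and the column name `level` are made up for illustration and are not my real data. Reindexing onto the full range is one possible way to insert the missing days as NaN rows:

```python
import pandas as pd

# Hypothetical miniature of one station file: 2014-10-29 to 2014-11-12 are absent.
df = pd.DataFrame(
    {
        "Date": ["2014-10-27", "2014-10-28", "2014-11-13", "2014-11-14"],
        "level": [1.53, None, 1.33, 1.24],
    }
)
df["Date"] = pd.to_datetime(df["Date"])

# Every calendar day between the first and last observation.
full_range = pd.date_range(df["Date"].min(), df["Date"].max(), freq="D")

# Assigning the longer range as a column fails: the lengths do not match.
try:
    df["Date_full"] = full_range
    assign_error = None
except ValueError as exc:
    assign_error = str(exc)

# Reindexing on the full range keeps existing rows and adds NaN rows for missing days.
continuous = df.set_index("Date").reindex(full_range)
print(len(df), len(full_range), len(continuous))
```

The reindexed frame has one row per day, so the gaps appear as NaN instead of being skipped.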
Second, some stations show weird plots of the atmospheric data, as shown below.
I used the same code, and the files have the same structure. I cannot see any difference in the data. (I would upload the data, but it is too long.)
If you know a solution or have any hints, please let me know.
Update: I solved the first problem with the code below.
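import glob
import pandas as pd

n = 4
# path_dir (data directory) and gs (station table) are defined earlier
f_n = glob.glob('%s%s.csv' % (path_dir, gs['station'][n]))  # find the station's file
pp = pd.read_csv(f_n[0])  # read the file
pp = pp.set_index(pd.to_datetime(pp['Date']))  # replace the RangeIndex with a DatetimeIndex
pp = pp.resample('D').first()  # one row per calendar day; missing days become NaN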
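To double-check that the resampled index really is continuous, here is a small self-contained sketch using three rows from the sample above. The column names `c1` to `c6`, `Station` and `Sensor` are placeholders I made up, since the real header is not shown here:

```python
import io
import pandas as pd

# Three rows copied from the sample above; the header names are placeholders.
csv_text = """Date,Station,Sensor,c1,c2,c3,c4,c5,c6
2014-10-27,JDsd1,S11,1.53,26.43,36.88,18.2,7979,1020.13
2014-10-28,JDsd1,S11,,,,,,
2014-11-13,JDsd1,S11,1.33,26.63,36.67,18.08,13160,1026.25
"""

pp = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
daily = pp.resample("D").first()  # one row per calendar day

# Compare against an explicit daily range to confirm there are no gaps left.
expected = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
is_continuous = daily.index.equals(expected)
n_filled = len(daily) - len(pp)  # number of days that were inserted
print(is_continuous, n_filled)
```

The inserted days hold NaN in every column, so a line plot should show a break there rather than a straight line across the gap.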
So I still need a solution for the second problem.
If you know a solution or have a hint, please let me know.