Tags: python, plot, continuous

Python time series plot problem (discontinuous datetime, weird plots for some files)


Hi, I am trying to plot some time series data, but I have two problems.

Before describing them: there are many stations, and I use one data file per station.

That is, the files are station1.csv, station2.csv, and so on. Each CSV file has the date, station name, sensor name, elevation, groundwater level, etc.

  1. Discontinuous timeseries

The original file has a discontinuous time series, as shown below.

2014-10-24,JDsd1,S11,1.49,26.47,36.84,18.19,7682,1021.57
2014-10-25,JDsd1,S11,1.49,26.47,36.84,18.19,7995,1021.79
2014-10-26,JDsd1,S11,1.52,26.44,36.87,18.2,7985,1019.75
2014-10-27,JDsd1,S11,1.53,26.43,36.88,18.2,7979,1020.13
2014-10-28,JDsd1,S11,,,,,,
2014-11-13,JDsd1,S11,1.33,26.63,36.67,18.08,13160,1026.25
2014-11-14,JDsd1,S11,1.24,26.72,36.58,18.11,13013,1027.09
2014-11-15,JDsd1,S11,1.23,26.73,36.57,18.12,12912,1030.27
2014-11-16,JDsd1,S11,1.22,26.74,36.56,18.13,12853,1026.32

I need to make the date range continuous, but I am having trouble doing it.

When I use pd.date_range(start_date (or min), end_date (or max), freq='d'), I get ValueError: Length of values (775) does not match length of index (769).

The length of values (775) is the number of dates I need, and the length of index (769) is the current number of dates.
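A minimal sketch of one way around this error, assuming it comes from assigning the longer date range to a column of the shorter frame: build the full daily range and reindex the frame to it, so the missing days become NaN rows. The sample data and the column name `level` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample with the same kind of gap as the file above
# (only two columns kept; the column names are assumptions).
csv_text = """Date,level
2014-10-26,1.52
2014-10-27,1.53
2014-10-28,
2014-11-13,1.33
2014-11-14,1.24
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"]).set_index("Date")

# Build the complete daily range from the first to the last date.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")

# Reindex instead of assigning: dates missing from the file get NaN rows,
# so the lengths of the index and the values always agree.
df_full = df.reindex(full_range)

print(len(df), len(df_full))
```

The reindexed frame has one row per calendar day, and the gap between 2014-10-28 and 2014-11-13 shows up as NaN values, which matplotlib draws as a break in the line.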

  2. About plot shape

This is an atmospheric data plot from one station's data file.

[image]

However, some stations show weird plots of atmospheric data, as below.

[image]

I used the same code, and the data files have the same structure. I cannot see any difference in the data. (I would upload the data, but it would be too long.)
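Without the data the cause cannot be pinned down, but a few generic checks may help spot a difference between files. The sketch below is only an assumption about what might be wrong: it flags unsorted dates, duplicate dates, non-numeric values, and missing values, which are common reasons for odd-looking line plots. The helper name `describe_series_problems` and the column name `pressure` are hypothetical.

```python
import io
import pandas as pd
from pandas.api.types import is_numeric_dtype


def describe_series_problems(df, value_col):
    """Return simple flags that often explain odd-looking line plots."""
    return {
        "index_sorted": bool(df.index.is_monotonic_increasing),
        "duplicate_dates": int(df.index.duplicated().sum()),
        "numeric_values": bool(is_numeric_dtype(df[value_col])),
        "missing_values": int(df[value_col].isna().sum()),
    }


# Hypothetical sample: dates out of order and one date repeated.
csv_text = """Date,pressure
2014-10-24,1021.57
2014-10-26,1019.75
2014-10-25,1021.79
2014-10-26,1019.75
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
report = describe_series_problems(df, "pressure")
print(report)
```

Running the same check on a station that plots well and on one that plots badly, and comparing the two reports, may show where the files differ.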

If you know any solutions or hints, please let me know.


Solution

  • I solved the first problem.

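    import glob
    import pandas as pd

    # path_dir and gs are defined earlier in the poster's script (not shown)
    n = 4
    f_n = glob.glob('%s%s.csv' % (path_dir, gs['station'][n]))  # find the station's file
    pp = pd.read_csv(f_n[0])  # read the file
    pp = pp.set_index(pd.to_datetime(pp['Date']))  # replace the RangeIndex with a DatetimeIndex
    pp = pp.resample('D').first()  # make the time series continuous (one row per day)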
    

    I still need a solution for the second problem.

    If you know a solution or hint, please let me know.
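The resample step from the solution can be tried on a small self-contained frame. This is a sketch with made-up data and assumed column names (`station`, `level`), and it reads the dates straight into the index with `index_col` rather than calling `set_index` afterwards.

```python
import io
import pandas as pd

# Hypothetical three-row sample with a gap between October and November.
csv_text = """Date,station,level
2014-10-27,JDsd1,1.53
2014-10-28,JDsd1,
2014-11-13,JDsd1,1.33
"""

pp = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# One row per calendar day; days without data become NaN rows.
daily = pp.resample("D").first()

# Optional: carry the station name into the filled-in rows.
daily["station"] = daily["station"].ffill()

print(len(pp), len(daily))
```

The measurement columns stay NaN on the inserted days, so the gap remains visible in a plot instead of being joined by a straight line.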