python pandas datetime dataframe strptime

Format causing issues when converting to datetime (data_string[found.end():]))


I am loading lots of CSV files that I want to plot. Their column titles represent dates and times.

For example:

14/01/2015 14:27    14/01/2015 14:27
29.97299    30.05902
30.00391    30.09555

For some reason, different files get these dates and times loaded in different formats and I am running into trouble when converting them.

My current code:

for n, f in enumerate(files):
    df = pd.read_csv(filePath+f, delimiter=',',index_col=0)
    times = []
    print df.columns.values[1]
    for i, t in enumerate(df.columns.values):
        if t[2]=='/':
            time = datetime.strptime(t, '%d/%m/%Y %H:%M')
        elif t[4]=='-':
            time = datetime.strptime(t, '%Y-%m-%d %H:%M:%S')
        else:
            print "Is it a date? ", t
        times.append(time)
    timelists.append(times)
    fig = plt.figure()
    df.plot()
    plt.savefig(figdir+(n+1).__str__()+"_"+f+".png", bbox_inches='tight',dpi=300)
    print "Fig", n+1
    plt.close(n)

Produces this:

2015-01-14 10:50:19
Fig 1
2015-01-14 14:01:15
Fig 2
2015-01-14 14:13:08
Fig 3
2015-01-14 14:27:53
Fig 4
2015-01-14 14:40:00
Fig 5
15/01/2015 13:03
Traceback (most recent call last):

Followed by an error (with traceback):

  File "D:/data/scripts/myscript.py", line 29, in <module>
    time = datetime.strptime(t, '%d/%m/%Y %H:%M')

  File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\_strptime.py", line 335, in _strptime
    data_string[found.end():])

ValueError: unconverted data remains: .1

I don't understand why I am getting this error. The last date that is printed is in the format I specify, is it not? And what does the data_string[found.end():]) mean?


Solution

  • That error message means strptime is parsing a string that contains more information than the format specifies, such as seconds or microseconds. For example, I get the same error if I pass '14/01/2015 14:27:00.000' to strptime with the format %d/%m/%Y %H:%M. To make that example work, I need to use the format %d/%m/%Y %H:%M:%S.%f instead.

    I am not sure exactly what your files look like, but the pandas library has very good date/time parsing. I am not sure I have had exactly your problem before, but in my experience it usually guesses the format correctly with little or no fussing.

    EDIT: Actually, if you read the file snippet you posted with pandas, it will make the dates the column names and append .1 to the second column, because column names should be unique. That .1 appears to be exactly the part of the date/time string that is not captured by the format specified in the call to strptime. So maybe you are already using pandas at some point in your processing. Please note that pandas will not make the column names datetime objects by default; it will assume they are strings.
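
To illustrate the point about the .1 suffix, here is a rough sketch rather than a definitive fix. It assumes Python 3, and it assumes that a trailing .N on a header is always the suffix pandas added to a duplicated column name and never fractional seconds. The helper name parse_column_label and the FORMATS tuple are made up for this example; the two formats are the ones from the question.

```python
import re
from datetime import datetime

# The two layouts that appear in the question's column headers.
FORMATS = ("%d/%m/%Y %H:%M", "%Y-%m-%d %H:%M:%S")

# Assumption: a trailing ".<digits>" is the suffix added to a duplicated
# column name, not fractional seconds.
DUPLICATE_SUFFIX = re.compile(r"\.\d+$")


def parse_column_label(label):
    """Hypothetical helper: parse a column label, or raise ValueError."""
    # Try the label as-is first, then with any duplicate suffix removed.
    candidates = [label, DUPLICATE_SUFFIX.sub("", label)]
    for text in candidates:
        for fmt in FORMATS:
            try:
                return datetime.strptime(text, fmt)
            except ValueError:
                continue
    raise ValueError("not a recognised date: %r" % (label,))


# Reproduce the error from the question.
try:
    datetime.strptime("15/01/2015 13:03.1", "%d/%m/%Y %H:%M")
except ValueError as err:
    print(err)  # unconverted data remains: .1

# The same label parses once the suffix is stripped.
print(parse_column_label("15/01/2015 13:03.1"))
print(parse_column_label("2015-01-14 10:50:19"))
```

In the question's loop, something like times = [parse_column_label(t) for t in df.columns.values] could stand in for the if/elif chain. A label that is not a date then raises an error instead of silently appending whatever time was parsed last.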