Plotting Time vs Date in matplotlib


I have a .csv file with only two columns in it, date and time:

    04-02-15,11:15
    04-03-15,09:35
    04-04-15,09:10
    04-05-15,18:05
    04-06-15,10:30
    04-07-15,09:20

I need this data to be plotted (preferably in an area graph, haven't gotten that far yet) using matplotlib. I need the y-axis to be time, and the x-axis to be date. I'm having trouble wrapping my head around some of the usage for time/date, and was hoping someone could take a look at my code and offer some guidance:

    import numpy as np
    from pylab import *
    import matplotlib.pyplot as plt
    import datetime as DT

    data= np.loadtxt('daily_count.csv', delimiter=',',
             dtype={'names': ('date', 'time'),'formats': ('S10', 'S10')} )

    x = [DT.datetime.strptime(key,"%m-%d-%y") for (key, value) in data ]
    y = [DT.datetime.strptime(key,"%h:%m") for (key, value) in data]

    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.grid()


    fig.autofmt_xdate()
    fig.autofmt_ytime()
    plt.plot(x,y)
    plt.xlabel('Date')
    plt.ylabel('Time')
    plt.title('Peak Time')
    plt.show()

Each time I try to run it, I get this error:

    ValueError: time data '04-02-15' does not match format '%h:%m'

I've also got a suspicion about the ticks for the y-axis, which thus far don't seem to be established. I'm very open to suggestions for the rest of this code as well - thanks in advance, internet heroes!


Solution

  • The traceback points at the problem: the code is trying to parse your date string with your time format. This is a result of the way you unpacked the data in these lines:

    data= np.loadtxt('daily_count.csv', delimiter=',',
             dtype={'names': ('date', 'time'),'formats': ('S10', 'S10')} )
    
    x = [DT.datetime.strptime(key,"%m-%d-%y") for (key, value) in data ]
    y = [DT.datetime.strptime(key,"%h:%m") for (key, value) in data]
    

    There are multiple solutions, but the root of the problem is that when you call loadtxt with named fields and dtypes, it returns a structured array whose records behave like (date, time) tuples, i.e.,

    [('04-02-15', '11:15') ('04-03-15', '09:35') ('04-04-15', '09:10')
    ('04-05-15', '18:05') ('04-06-15', '10:30') ('04-07-15', '09:20')]
    

    So in both list comprehensions, key is always the date, and value (the time) is never used:

    >>> print([key for (key, value) in data])
    ['04-02-15', '04-03-15', '04-04-15', '04-05-15', '04-06-15', '04-07-15']
    

    So you were trying to turn '04-02-15' into the format '%h:%m', which of course will not work.

    You can split the parsed records into separate columns using the zip function. For example,

    dates, times = map(list, zip(*data))
    print(dates)
    print(times)
    ['04-02-15', '04-03-15', '04-04-15', '04-05-15', '04-06-15', '04-07-15']
    ['11:15', '09:35', '09:10', '18:05', '10:30', '09:20']
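
    To see what zip(*...) is doing in isolation, here is a minimal sketch that uses a plain list of tuples in place of the array returned by loadtxt. The rows list is typed in by hand from the sample data, purely for illustration:

```python
# Hand-typed stand-in for the records loadtxt returns (illustrative only).
rows = [("04-02-15", "11:15"), ("04-03-15", "09:35"), ("04-04-15", "09:10")]

# zip(*rows) pairs up the first items of every row, then the second items,
# which turns a list of (date, time) rows into one tuple per column.
dates, times = map(list, zip(*rows))

print(dates)  # ['04-02-15', '04-03-15', '04-04-15']
print(times)  # ['11:15', '09:35', '09:10']
```

    Unpacking into two names only works because every row has exactly two fields; a file with more columns would need one name per column.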
    

    Also, check the format strings you pass to strptime: "%h:%m" won't work, because %h is not the hour directive (that is %H) and %m means month (minutes are %M). You can find a nice summary in the docs, or here: http://strftime.org/.
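
    A quick way to check a format string is to try it on a single value before running it over the whole file. The snippet below is only a sketch with a hard-coded sample time; it shows "%H:%M" parsing cleanly and "%h:%m" being rejected with a ValueError:

```python
import datetime as dt

sample = "11:15"  # one value from the time column

# %H is the 24-hour clock hour and %M is the minute.
parsed = dt.datetime.strptime(sample, "%H:%M")
print(parsed.hour, parsed.minute)  # 11 15

# The lowercase version does not describe a time, so strptime refuses it.
try:
    dt.datetime.strptime(sample, "%h:%m")
    bad_format_rejected = False
except ValueError as err:
    bad_format_rejected = True
    print("rejected:", err)
```

    Note that a time-only string is parsed onto a default date of 1900-01-01, which is why the y values in the plot all share the same day.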

    Putting it all together:

    import numpy as np
    import matplotlib.pyplot as plt
    import datetime as DT
    
    # Read both columns as text; 'U10' gives str rather than bytes on Python 3
    data = np.loadtxt('daily_count.csv', delimiter=',',
             dtype={'names': ('date', 'time'), 'formats': ('U10', 'U10')})
    
    # Split the (date, time) records into two separate lists
    dates, times = map(list, zip(*data))
    print(dates, times)
    
    # %H:%M is hour:minute; the times land on the default date 1900-01-01
    x = [DT.datetime.strptime(date, "%m-%d-%y") for date in dates]
    y = [DT.datetime.strptime(time, "%H:%M") for time in times]
    
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.grid()
    
    plt.plot(x, y)
    plt.xlabel('Date')
    plt.ylabel('Time')
    plt.title('Peak Time')
    plt.show()
    

    This produces a line plot with the dates on the x-axis and the times on the y-axis.
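
    The question also mentions an area graph and the y-axis ticks. One possible approach, sketched here rather than taken from the answer above, is to convert each time to fractional hours, shade the area with fill_between, and label the ticks as HH:MM with a FuncFormatter. The helper names (to_hours, format_hours, plot_peak_times) and the output file name are made up for this example:

```python
import datetime as dt


def to_hours(time_str):
    """Convert an 'HH:MM' string to fractional hours, e.g. '11:15' -> 11.25."""
    t = dt.datetime.strptime(time_str, "%H:%M")
    return t.hour + t.minute / 60.0


def format_hours(hours, pos=None):
    """Tick formatter: turn fractional hours back into an 'HH:MM' label."""
    total_minutes = int(round(hours * 60))
    return "{:02d}:{:02d}".format(*divmod(total_minutes, 60))


def plot_peak_times(rows, out_path="peak_time.png"):
    """Save an area graph of time of day against date (hypothetical helper)."""
    # Imported here so the two helpers above work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file; no display required
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter, MultipleLocator

    dates = [dt.datetime.strptime(d, "%m-%d-%y") for d, _ in rows]
    hours = [to_hours(t) for _, t in rows]

    fig, ax = plt.subplots()
    ax.fill_between(dates, hours, alpha=0.4)  # shade from 0 up to each time
    ax.plot(dates, hours)
    ax.set_ylim(0, 24)
    ax.yaxis.set_major_locator(MultipleLocator(3))  # one tick every 3 hours
    ax.yaxis.set_major_formatter(FuncFormatter(format_hours))
    ax.grid()
    ax.set_xlabel("Date")
    ax.set_ylabel("Time")
    ax.set_title("Peak Time")
    fig.autofmt_xdate()  # slant the date labels so they do not overlap
    fig.savefig(out_path)
    plt.close(fig)
```

    Calling plot_peak_times([("04-02-15", "11:15"), ("04-03-15", "09:35")]) would write the figure to peak_time.png. Plotting plain numbers on the y-axis sidesteps the dummy 1900-01-01 date that strptime attaches to time-only strings, at the cost of formatting the tick labels by hand.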