python pandas matplotlib dataframe axes

Using dataframe date column in matplotlib


I have a DataFrame with a date column whose values are strings in ddMonyy format (e.g., 08JUN14). I can’t figure out how to convert it for use as the x-axis of a matplotlib plot. From experimenting, I understand that datetime.strptime needs a single string, since this works:

datetime.strptime("01Jul76", "%d%b%y")
datetime.datetime(1976, 7, 1, 0, 0)

What I don’t understand is how to convert the entire DataFrame column. I tried converting the whole column to a string, but that obviously isn’t correct (which makes sense after seeing the error message):

s = str(df.date)
d = datetime.strptime(s,"%d%b%y")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data "('01Jul76', '01Sep76', … '15Jan15', '19Mar15')" does not match format '%d%b%y'.

I have searched and seen references to this issue but I don’t seem to be getting anywhere. Any guidance is greatly appreciated.


Solution

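  • It looks like you're trying to convert strings into datetime objects. datetime.strptime parses a single string, so you can't pass it a pandas Series, and calling str() on the Series produces the text representation of the whole column (index, name, and dtype included), which doesn't match the format and raises an error: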

    In [2]:
    
    df = pd.DataFrame({'date':['01Jul76', '01Sep76', '15Jan15', '19Mar15']})
    df
    Out[2]:
          date
    0  01Jul76
    1  01Sep76
    2  15Jan15
    3  19Mar15
    In [4]:
    
    import datetime as dt
    dt.datetime.strptime(str(df['date']),"%d%b%y")
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-4-d1c7711603e3> in <module>()
          1 import datetime as dt
    ----> 2 dt.datetime.strptime(str(df['date']),"%d%b%y")
    
    C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\_strptime.py in _strptime_datetime(cls, data_string, format)
        498     """Return a class cls instance based on the input string and the
        499     format string."""
    --> 500     tt, fraction = _strptime(data_string, format)
        501     tzname, gmtoff = tt[-2:]
        502     args = tt[:6] + (fraction,)
    
    C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\_strptime.py in _strptime(data_string, format)
        335     if not found:
        336         raise ValueError("time data %r does not match format %r" %
    --> 337                          (data_string, format))
        338     if len(data_string) != found.end():
        339         raise ValueError("unconverted data remains: %s" %
    
    ValueError: time data '0    01Jul76\n1    01Sep76\n2    15Jan15\n3    19Mar15\nName: date, dtype: object' does not match format '%d%b%y'
    

    The easiest approach is to use pd.to_datetime, which converts the whole column at once, and pass your format string:

    In [7]:
    
    df['date'] = pd.to_datetime(df['date'], format='%d%b%y')
    df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 4 entries, 0 to 3
    Data columns (total 1 columns):
    date    4 non-null datetime64[ns]
    dtypes: datetime64[ns](1)
    memory usage: 64.0 bytes
    In [8]:
    
    df
    Out[8]:
            date
    0 1976-07-01
    1 1976-09-01
    2 2015-01-15
    3 2015-03-19
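
Since the original question was about using the column as a matplotlib x-axis, here is a minimal sketch of that last step. The 'value' column and its numbers are made up for illustration (the question doesn't show what is being plotted), and the Agg backend is chosen only so the snippet runs without a display. Once the column is datetime64, matplotlib should accept it directly as x data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Dates come from the example above; 'value' is hypothetical data to plot.
df = pd.DataFrame({
    "date": ["01Jul76", "01Sep76", "15Jan15", "19Mar15"],
    "value": [1.0, 2.5, 4.0, 3.2],
})

# Convert the whole column of strings to datetimes in one call.
df["date"] = pd.to_datetime(df["date"], format="%d%b%y")

fig, ax = plt.subplots()
ax.plot(df["date"], df["value"], marker="o")

# Optional: show tick labels in the same ddMonyy style as the source data.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d%b%y"))
fig.autofmt_xdate()  # rotate the date labels so they don't overlap

# In an interactive session, call plt.show() or fig.savefig(...) here.
```

If some entries in a real column don't match the format, passing errors='coerce' to pd.to_datetime turns them into NaT instead of raising, which may be easier to inspect before plotting.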