Search code examples
datedatetimetimepython-datetime

Reading and Graphing Dates from a CSV in Python


I have a CSV where the first column is a day and a time and the remaining columns are measurements taken at those times. The dates are in the form "MM/DD/YYYY hh:mm:ss" in a single column. Python reads this as a string. How to I convert this to a date and time? Python doesn't know that '09/14/2016 23:00:00' comes right before '09/15/2016 0:00:00'

In MatLab, I can use

time = datenum(filename{:,1})

datetick('x','mmm-dd HH:MM:SS')

but I'm not sure what to use in Python. Is there a similar command in this language? I have tried using datetime, but I haven't much luck.

Thanks!

Note: I'm using Pandas to read the CSV


Solution

  • You will have to parse the dates (timestamps) from string to datetime dtype to have them sorted appropriately. If you use pandas.read_csv to load your csv to a DataFrame, the easiest thing to do is use the parse_dates keyword. Ex:

    from  io import StringIO
    import pandas as pd
    
    s="""Timestamp,Value
    06/01/2020 17:05:00,9506.01
    06/01/2020 17:10:00,9513.44
    06/01/2020 17:15:00,9521.56"""
    
    df = pd.read_csv(StringIO(s), parse_dates=["Timestamp"])
    
    for c in df.columns:
        print(f"{c} - {df[c].dtype}")
    # Timestamp - datetime64[ns]
    # Value - float64
    

    Another option would be to convert from string to datetime after import using pandas.to_datetime:

    df = pd.read_csv(StringIO(s))
    df["Timestamp"].dtype
    # dtype('O') # after import we only have strings...
    df["Timestamp"] = pd.to_datetime(df["Timestamp"])
    
    for c in df.columns:
        print(f"{c} - {df[c].dtype}")
    # Timestamp - datetime64[ns]
    # Value - float64