Search code examples
pythonpandasread-csv

Pandas read_csv is not parsing date and time


I have a csv file which looks like this

Study ID,CG_Arrival_Date/Time,Arrival_Date,Arrival_Time
2,1/1/2011 0:03,1/1/2011,0:03:00
3,1/1/2011 0:53,1/1/2011,0:53:00

I wanted to parse arrival date and arrival time, so I worte this on Google Colab.

df = pd.read_csv('/content/SM_AI_Data.csv', index_col=['Study ID'],parse_dates=[['Arrival Date','Arrival Time']])
df.head()

The columns get combined fine, but its data type is still object.

How can I change the code so that it works?

Please see the details above.


Solution

  • I can't reproduce any problem if I specify the columns correctly. I use the indexes here because I'm too lazy to use the names :

    import pandas as pd
    from io import StringIO
    csv_text="""
    Study ID,Arrival_Date/Time,Arrival_Date,Arrival_Time 
    2,1/1/2011 0:03,1/1/2011,0:03:00
    3,1/1/2011 0:53,1/1/2011,0:53:00"""
    
    df = pd.read_csv(StringIO(csv_text), index_col=['Study ID'],parse_dates=[1,2])
    df.dtypes
    

    This produces

    Arrival_Date/Time    datetime64[ns]
    Arrival_Date         datetime64[ns]
    Arrival_Time                 object
    

    The contents are:

        Arrival_Date/Time   Arrival_Date    Arrival_Time
    Study ID            
    2   2011-01-01 00:03:00 2011-01-01  0:03:00
    3   2011-01-01 00:53:00 2011-01-01  0:53:00
    

    The Date and Time columns can be combined too :

    import pandas as pd
    from io import StringIO
    csv_text="""
    Study ID,Arrival_Date/Time,Arrival_Date,Arrival_Time 
    2,1/1/2011 0:03,1/1/2011,0:03:00
    3,1/1/2011 0:53,1/1/2011,0:53:00"""
    
    df = pd.read_csv(StringIO(csv_text), index_col=['Study ID'],parse_dates=[[2,3],1])
    df.dtypes
    

    The contents are

        Arrival_Date_Arrival_Time   Arrival_Date/Time
    Study ID        
    2   2011-01-01 00:03:00 2011-01-01 00:03:00
    3   2011-01-01 00:53:00 2011-01-01 00:53:00
    

    It's not possible to tell what the format is from the values. DD/MM/YYYY or MM/DD/YYYY? The correct format can be specified by setting the dayfirst parameter to True or False`