Search code examples
pythonpandaspython-datetimepython-object

Changing data types on Pandas DataFrame uploaded from CSV - mainly Object to Datetime


I am working on a data frame uploaded from CSV, I have tried changing the data typed on the CSV file and to save it but it doesn't let me save it for some reason, and therefore when I upload it to Pandas the date and time columns appear as object. I have tried a few ways to transform them to datetime but without a lot of success:

1) df['COLUMN'] = pd.to_datetime(df['COLUMN'].str.strip(), format='%m/%d/%Y') gives me the error:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

2) Defining dtypes at the beginning and then using it in the read_csv command - gave me an error as well since it does not accept datetime but only string/int.

Some of the columns I want to have a datetime format of date, such as: 2019/1/1, and some of time: 20:00:00

Do you know of an effective way of transforming those datatype object columns to either date or time?


Solution

  • Based on the discussion, I downloaded the data set from the link you provided and read it through pandas. I took one column and a part of it; which has the date and used the pandas data-time module as you did. By doing so I can use the script you mentioned.

    #import necessary library
    import numpy as np
    import pandas as pd
    
    #load the data into csv
    data = pd.read_csv("NYPD_Complaint_Data_Historic.csv")
    
    #take one column which contains the datatime as an example
    dte = data['CMPLNT_FR_DT']
    
    # =============================================================================
    # I will try to take a part of the data from dte which contains the 
    # date time and convert it to date time
    # =============================================================================
    
    from pandas import datetime
    test_data  = dte[0:10]
    df1 = pd.DataFrame(test_data)
    df1['new_col'] = pd.to_datetime(df1['CMPLNT_FR_DT'])
    df1['year'] = [i.year for i in df1['new_col']]
    df1['month'] = [i.month for i in df1['new_col']]
    df1['day'] = [i.day for i in df1['new_col']]
    
    
    #The way you used to convert the data also works
    
    
    df1['COLUMN'] =  pd.to_datetime(df1['CMPLNT_FR_DT'].str.strip(), format='%m/%d/%Y')
    

    It might be the way you get the data. You can see the output from this attached. As the result can be stored in dataframe it won't be a problem to save in any format. Please let me know if I understood correctly and it helped you. The month is not shown in the image, but you can get it.enter image description here