python pandas machine-learning python-datetime

Error: invalid unit abbreviation: / , while trying to convert date with a format of 10/2/2012 9:00:00 AM


I am using pandas to convert a column having date and time into seconds by using the following code:

df['date_time'] = pd.to_timedelta(df['date_time'])
df['date_time'] = df['date_time'].dt.total_seconds() 

The dataset is: [image not included]

If I use the following code:

df['date_time'] = pd.to_datetime(df['date_time'], errors='coerce')
df['date_time'] = df['date_time'].dt.total_seconds()
print(df.head())

Then I get the following error:

AttributeError: 'DatetimeProperties' object has no attribute 'total_seconds'

The same happens with dt.timestamp.

So my queries are:

  1. Is it necessary to convert the time to seconds for training the model? If yes, how? If not, why not?

  2. This one is about two other columns named weather_m and weather_d. weather_m has 38 distinct categories, of which only one is true at a time, and weather_d has 11, with the same constraint. I am not sure whether to one-hot encode them with pd.get_dummies (adding 49 new columns to the original dataset and dropping weather_m and weather_d) before training the model, or to use LabelEncoder instead.


Solution

    1. Converting a datetime or timestamp into a timedelta (duration) doesn't make sense, which is why pd.to_timedelta fails on your strings. A duration only makes sense as the difference between the given timestamp and some other reference date; you get that timedelta by subtracting one date from the other with -. Since your datetime column holds strings, you also need to convert it to a datetime first: df['date_time'] = pd.to_datetime(df['date_time'], format='%m/%d/%Y %I:%M:%S %p'). The format has to match values like 10/2/2012 9:00:00 AM, including the seconds and the AM/PM marker. Then you can try something like: ref_date = datetime.datetime(1970, 1, 1, 0, 0); df['secs_since_epoch'] = (df['date_time'] - ref_date).dt.total_seconds()

    2. If the categories are totally distinct from each other (and don't, for example, have an implicit ordering), then yes, you should use one-hot encoding and replace the original columns. Since the number of categories is small, that should be fine. It also depends on what exactly you are going to run on this data, though: some libraries may accept the original categorical column and do the conversion implicitly for you.
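
Putting both points together, here is a minimal sketch of how this could look. The sample rows and category values are made up for illustration (the real dataset was only posted as an image), and the format string assumes month-first dates with a 12-hour clock, as in 10/2/2012 9:00:00 AM; adjust it if your data is day-first.

```python
import pandas as pd

# Hypothetical sample rows; only the column names and the date format
# come from the question, the values are invented.
df = pd.DataFrame({
    "date_time": ["10/2/2012 9:00:00 AM", "10/2/2012 1:00:00 PM"],
    "weather_m": ["Clouds", "Clear"],
    "weather_d": ["scattered clouds", "sky is clear"],
})

# Parse the strings into real datetimes first.
# %I is the 12-hour clock hour and %p is the AM/PM marker.
df["date_time"] = pd.to_datetime(df["date_time"], format="%m/%d/%Y %I:%M:%S %p")

# Subtracting a reference date gives a timedelta column, and
# timedelta columns do have .dt.total_seconds().
ref_date = pd.Timestamp("1970-01-01")
df["secs_since_epoch"] = (df["date_time"] - ref_date).dt.total_seconds()

# One-hot encode both categorical columns; get_dummies drops the
# originals and adds one indicator column per category.
encoded = pd.get_dummies(df, columns=["weather_m", "weather_d"])

print(encoded.columns.tolist())
```

The new columns are named after the original column plus the category (e.g. weather_m_Clouds), so with the real data you would end up with 38 + 11 = 49 indicator columns in place of weather_m and weather_d.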