Search code examples
pandasdatetimepivot-tablecategorical-datahour

How to categorize a range of hours in Pandas?


In my project I am trying to create a new column to categorize records by range of hours, let me explain, I have a column in the dataframe called 'TowedTime' with time series data, I want another column to categorize by full hour without minutes, for example if the value in the 'TowedTime' column is 09:32:10 I want it to be categorized as 9 AM, if says 12:45:10 it should be categorized as 12 PM and so on with all the other values. I've read about the .cut and bins function but I can't get the result I want. DF 'TowedTime values'


        import numpy as np 
        import pandas as pd 
        import matplotlib.pyplot as plt
        import seaborn as sns
        import datetime
        
        df = pd.read_excel("Baltimore Towing Division.xlsx",sheet_name="TowingData")
        
        df['Month'] = pd.DatetimeIndex(df['TowedDate']).strftime("%b")
        df['Week day'] = pd.DatetimeIndex(df['TowedDate']).strftime("%a")
    
        monthOrder = ['Jan', 'Feb', 'Mar', 'Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
        dayOrder = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']
        
        pivotHours = pd.pivot_table(df, values='TowedDate',index='TowedTime',
                                columns='Week day',
                                fill_value=0,
                                aggfunc= 'count', 
                                margins = False, margins_name='Total').reindex(dayOrder,axis=1)
        
        print(pivotHours)


Solution

  • With the help of @Fabien C I was able to solve the problem.

    First, I had to check the data type of values in the 'TowedTime' column with dtypes function. I found that were a Object.

    I proceed to try convert 'TowedTime' to datetime:

    df['TowedTime'] = pd.to_datetime(df['TowedTime'],format='%H:%M:%S').dt.time
    

    Then to create a new column in the df, for only the hours:

    df['Hour'] = pd.to_datetime(df['TowedTime'],format='%H:%M:%S').dt.hour
    

    And the result was this: enter image description here

    You can notice in the image that 'TowedTime' column remains as an object, but the new 'Hour' column correctly returns the hour value.

    Originally, the dataset already had the date and time separated into different columns, I think they used some method to separate date and time in excel and this created the time ('TowedTime') to be an object, I could not convert it, Or at least that's what the dtypes function shows me.

    I tried all this Pandas methods for converting the Object to Datetime :

    df['TowedTime'] = pd.to_datetime(df['TowedTime'])
    
    df['TowedTime'] = pd.to_datetime(df['TowedTime'])
    
    df['TowedTime'] = df['TowedTime'].astype('datetime64[ns]')
    
    df['TowedTime'] =  pd.to_datetime(df['TowedTime'], format='%H:%M:%S')
    
    df['TowedTime'] = pd.to_datetime(df['TowedTime'], format='%H:%M:%S')