Apply the same function to each record of a column in a pandas DataFrame


I have a dataset with a date-time column in a specific format. I need to create new features from this column, which means adding new columns to the dataframe by extracting information from it. My sample input dataframe looks like this:

id   datetime           feature2
1    12/3/2020 0:56     1
2    11/25/2020 13:26   0

The expected output is:

id   date         hour   mints   feature2
1    12/3/2020    0      56      1
2    11/25/2020   13     26      0

I am not sure the pandas apply() method will work here, because new columns have to be added. What is the best way to do this?

Is there a way to apply a single function to the whole column so that it processes every record?


Solution

  • pandas Series .dt accessor

    • Your datetime data is in a pandas column (a Series), so use the .dt accessor. It only works on datetime-like values, so convert the strings with pd.to_datetime first.
    import pandas as pd

    df = pd.DataFrame({'id': [1, 2],
                       'datetime': ['12/3/2020 0:56', '11/25/2020 13:26'],
                       'feature2': [1, 0]})

    # parse the strings into datetime64 values (month comes first by default)
    df['datetime'] = pd.to_datetime(df['datetime'])

    # df now contains:
    #  id            datetime  feature2
    #   1 2020-12-03 00:56:00         1
    #   2 2020-11-25 13:26:00         0

    # create the new columns from the parsed column
    df['hour'] = df['datetime'].dt.hour
    df['min'] = df['datetime'].dt.minute
    df['date'] = df['datetime'].dt.date

    # final result:
    #  id            datetime  feature2  hour  min        date
    #   1 2020-12-03 00:56:00         1     0   56  2020-12-03
    #   2 2020-11-25 13:26:00         0    13   26  2020-11-25
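
The question also asks for a single function that handles the whole column and for the columns in a particular order. A minimal sketch of that is below, built on the same .dt accessor. The function name add_datetime_features is hypothetical, and the explicit format string is an assumption based on the two sample rows (month/day/year hour:minute); adjust it if the real data differs.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame, column: str = "datetime") -> pd.DataFrame:
    """Hypothetical helper: replace `column` with date, hour and mints columns."""
    # Assumed format, taken from the sample rows: month/day/year hour:minute.
    parsed = pd.to_datetime(df[column], format="%m/%d/%Y %H:%M")

    # Remember where the original column sat so the new ones take its place.
    pos = df.columns.get_loc(column)

    # drop() returns a new DataFrame, so the caller's df is left untouched.
    out = df.drop(columns=[column])
    out.insert(pos, "date", parsed.dt.date)
    out.insert(pos + 1, "hour", parsed.dt.hour)
    out.insert(pos + 2, "mints", parsed.dt.minute)
    return out


df = pd.DataFrame({'id': [1, 2],
                   'datetime': ['12/3/2020 0:56', '11/25/2020 13:26'],
                   'feature2': [1, 0]})

result = add_datetime_features(df)
print(result.to_string(index=False))
```

The date column holds Python date objects, so it prints in ISO form (2020-12-03) rather than as 12/3/2020. apply() with a row-wise function could produce the same columns, but the .dt accessor works on the whole column at once and is usually the simpler choice.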