python pandas date row

List all the dates in a column in a pandas dataframe


I am a beginner in Python and pandas.

The following is a part of my pandas dataframe.

[Image: screenshot of the dataframe]

I want to extract all the dates in the INTERVAL_DATETIME column.

For example: 16/11/2020, 31/05/2020, ..., 21/11/2023. I do not want the dates to be duplicated.

I could not find a way to do this.

I would be thankful if someone could help me.


Solution

  • Use:

    import pandas as pd

    df = pd.DataFrame( {'INTERVAL_DATETIME': ['6/11/2020 16:30', 
                                              '6/11/2020 17:00', 
                                              '31/05/2020 18:00',
                                              '31/05/2020 18:30',
                                              '21/11/2023 18:00']})
    
    print (df)
      INTERVAL_DATETIME
    0   6/11/2020 16:30
    1   6/11/2020 17:00
    2  31/05/2020 18:00
    3  31/05/2020 18:30
    4  21/11/2023 18:00
    
    L = df.INTERVAL_DATETIME.str.split().str[0].unique().tolist()
    print (L)
    ['6/11/2020', '31/05/2020', '21/11/2023']
    

    Explanation:

    First, split each string on whitespace with Series.str.split (no sep argument is needed, because whitespace is the default). The result is a Series of lists:

    print (df.INTERVAL_DATETIME.str.split())
    0     [6/11/2020, 16:30]
    1     [6/11/2020, 17:00]
    2    [31/05/2020, 18:00]
    3    [31/05/2020, 18:30]
    4    [21/11/2023, 18:00]
    Name: INTERVAL_DATETIME, dtype: object
    

    Then select the first element of each list by indexing with str[0]:

    print (df.INTERVAL_DATETIME.str.split().str[0])
    0     6/11/2020
    1     6/11/2020
    2    31/05/2020
    3    31/05/2020
    4    21/11/2023
    Name: INTERVAL_DATETIME, dtype: object
    

    Get the unique values as a NumPy array with Series.unique:

    print (df.INTERVAL_DATETIME.str.split().str[0].unique())
    ['6/11/2020' '31/05/2020' '21/11/2023']
    

    ... and finally convert the array to a list:

    print (df.INTERVAL_DATETIME.str.split().str[0].unique().tolist())
    ['6/11/2020', '31/05/2020', '21/11/2023']
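
The steps above can be wrapped in a small helper. This is only a sketch: the name unique_date_strings is made up for illustration, and it assumes every value is a string of the form "date time" separated by whitespace. Note that Series.unique keeps values in order of first appearance; it does not sort them.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({'INTERVAL_DATETIME': ['6/11/2020 16:30',
                                         '6/11/2020 17:00',
                                         '31/05/2020 18:00',
                                         '31/05/2020 18:30',
                                         '21/11/2023 18:00']})


def unique_date_strings(frame, column='INTERVAL_DATETIME'):
    """Hypothetical helper: distinct date parts, in order of first appearance."""
    # Split on whitespace and keep the first token (the date part).
    date_part = frame[column].str.split().str[0]
    # unique() drops repeats without sorting; tolist() gives a plain list.
    return date_part.unique().tolist()


dates = unique_date_strings(df)
print(dates)
```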
    

    Or use Series.str.extract to get the values before the first space:

    L = df.INTERVAL_DATETIME.str.extract(r'^(\S+)', expand=False).unique().tolist()
    print (L)
    ['6/11/2020', '31/05/2020', '21/11/2023']
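
A note on the regular expression, as a sketch with made-up values: a greedy pattern such as ^(.*)\s+ captures everything up to the last run of whitespace, while ^(\S+) captures only the first whitespace-free token. The two agree when each value contains a single space, as in the sample data, but differ if anything follows the time. The " UTC" suffix below is hypothetical and only there to show the difference.

```python
import pandas as pd

# Hypothetical values with more than one space in each string.
s = pd.Series(['6/11/2020 16:30 UTC', '31/05/2020 18:00 UTC'])

# Greedy: .* expands as far as possible, so the capture ends at the last space.
greedy = s.str.extract(r'^(.*)\s+', expand=False).tolist()

# First token only: \S+ stops at the first whitespace character.
first_token = s.str.extract(r'^(\S+)', expand=False).tolist()

print(greedy)
print(first_token)
```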
    

    If you need the dates as strings in YYYY-MM-DD format, use to_datetime with Series.dt.strftime:

    L = (pd.to_datetime(df.INTERVAL_DATETIME, dayfirst=True)
           .dt.strftime('%Y-%m-%d').unique().tolist())
    print (L)
    ['2020-11-06', '2020-05-31', '2023-11-21']
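
If the dates should come out in chronological order rather than in order of first appearance, they can be sorted after parsing. A minimal sketch, assuming every value follows the day/month/year hour:minute layout of the sample data (the explicit format string is that assumption written out, and the variable names are illustrative):

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({'INTERVAL_DATETIME': ['6/11/2020 16:30',
                                         '6/11/2020 17:00',
                                         '31/05/2020 18:00',
                                         '31/05/2020 18:30',
                                         '21/11/2023 18:00']})

# Parse with an explicit format instead of relying on dayfirst inference.
parsed = pd.to_datetime(df['INTERVAL_DATETIME'], format='%d/%m/%Y %H:%M')

# Drop the time part, remove repeated days, then sort chronologically.
days = parsed.dt.normalize().drop_duplicates().sort_values()

sorted_dates = days.dt.strftime('%Y-%m-%d').tolist()
print(sorted_dates)
```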
    

    For datetime.date objects, use Series.dt.date instead of strftime:

    L = pd.to_datetime(df.INTERVAL_DATETIME, dayfirst=True).dt.date.unique().tolist()
    print (L)
    [datetime.date(2020, 11, 6), datetime.date(2020, 5, 31), datetime.date(2023, 11, 21)]
    

    For completeness, Series.dt.normalize returns Timestamps with the time set to midnight (00:00:00):

    L = pd.to_datetime(df.INTERVAL_DATETIME, dayfirst=True).dt.normalize().unique().tolist()
    print (L)
    
    [Timestamp('2020-11-06 00:00:00'), 
     Timestamp('2020-05-31 00:00:00'), 
     Timestamp('2023-11-21 00:00:00')]
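
The screenshot from the question is not available, so the dtype of INTERVAL_DATETIME is unknown. If the column already holds datetime64 values rather than strings, the .str methods raise an AttributeError and the .dt accessor can be used directly. The sketch below handles both cases; the helper name unique_dates and the default format string are assumptions for illustration.

```python
import datetime

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def unique_dates(series, fmt='%d/%m/%Y %H:%M'):
    """Hypothetical helper: distinct calendar dates, in order of first appearance."""
    # Parse only if the column is not already a datetime64 column.
    if not is_datetime64_any_dtype(series):
        series = pd.to_datetime(series, format=fmt)
    # Zero out the time, drop repeated days, convert to datetime.date objects.
    return series.dt.normalize().drop_duplicates().dt.date.tolist()


as_text = pd.Series(['6/11/2020 16:30', '6/11/2020 17:00', '31/05/2020 18:00'])
as_datetime = pd.to_datetime(as_text, format='%d/%m/%Y %H:%M')

from_text = unique_dates(as_text)
from_datetime = unique_dates(as_datetime)
print(from_text)
```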