List all the dates in a column in a pandas dataframe

I am a beginner in Python and pandas.

The following is a part of my pandas dataframe.

I want to extract all the dates in the INTERVAL_DATETIME column.

For example: 16/11/2020, 31/05/2020, ..., 21/11/2023. I do not want the dates to be duplicated.

I could not find a way to do this.

I would be thankful if someone could help me.

Solution

Use:

import pandas as pd

df = pd.DataFrame({'INTERVAL_DATETIME': ['6/11/2020 16:30',
                                         '6/11/2020 17:00',
                                         '31/05/2020 18:00',
                                         '31/05/2020 18:30',
                                         '21/11/2023 18:00']})

print (df)
  INTERVAL_DATETIME
0   6/11/2020 16:30
1   6/11/2020 17:00
2  31/05/2020 18:00
3  31/05/2020 18:30
4  21/11/2023 18:00

L = df.INTERVAL_DATETIME.str.split().str[0].unique().tolist()
print (L)
['6/11/2020', '31/05/2020', '21/11/2023']

Explanation:

First use Series.str.split with no sep parameter, which splits on whitespace. The output is a Series of lists:

print (df.INTERVAL_DATETIME.str.split())
0     [6/11/2020, 16:30]
1     [6/11/2020, 17:00]
2    [31/05/2020, 18:00]
3    [31/05/2020, 18:30]
4    [21/11/2023, 18:00]
Name: INTERVAL_DATETIME, dtype: object

Then select the first element of each list by indexing with str[0]:

print (df.INTERVAL_DATETIME.str.split().str[0])
0     6/11/2020
1     6/11/2020
2    31/05/2020
3    31/05/2020
4    21/11/2023
Name: INTERVAL_DATETIME, dtype: object

Get the unique values as a NumPy array with Series.unique:

print (df.INTERVAL_DATETIME.str.split().str[0].unique())
['6/11/2020' '31/05/2020' '21/11/2023']

... and finally convert the array to a list:

print (df.INTERVAL_DATETIME.str.split().str[0].unique().tolist())
['6/11/2020', '31/05/2020', '21/11/2023']
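As a variation on the steps above, the same result can be reached with Series.drop_duplicates, which keeps the first occurrence of each value and returns a Series rather than a NumPy array. This is only a sketch on the sample data from this answer; the names dates and unique_dates are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'INTERVAL_DATETIME': ['6/11/2020 16:30',
                                         '6/11/2020 17:00',
                                         '31/05/2020 18:00',
                                         '31/05/2020 18:30',
                                         '21/11/2023 18:00']})

# Split on whitespace and keep the first piece (the date part).
dates = df['INTERVAL_DATETIME'].str.split().str[0]

# drop_duplicates keeps the first occurrence of each value, in row order.
unique_dates = dates.drop_duplicates().tolist()
print(unique_dates)
# ['6/11/2020', '31/05/2020', '21/11/2023']
```

Both approaches preserve the order in which the dates first appear in the column.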

Or use Series.str.extract to get the value before the first whitespace character:

L = df.INTERVAL_DATETIME.str.extract(r'^(\S+)\s', expand=False).unique().tolist()
print (L)
['6/11/2020', '31/05/2020', '21/11/2023']

If you need the dates as strings in YYYY-MM-DD format, use to_datetime with Series.dt.strftime:

L = (pd.to_datetime(df.INTERVAL_DATETIME, dayfirst=True)
       .dt.strftime('%Y-%m-%d').unique().tolist())
print (L)
['2020-11-06', '2020-05-31', '2023-11-21']
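If the goal is zero-padded DD/MM/YYYY strings, as in the question, one possible sketch is to parse with an explicit format and write the dates back with strftime. The format string '%d/%m/%Y %H:%M' is an assumption based on the sample rows shown here; adjust it if the real column differs (for example, if it includes seconds).

```python
import pandas as pd

df = pd.DataFrame({'INTERVAL_DATETIME': ['6/11/2020 16:30',
                                         '6/11/2020 17:00',
                                         '31/05/2020 18:00',
                                         '31/05/2020 18:30',
                                         '21/11/2023 18:00']})

# Assumed layout: day/month/year followed by hours:minutes.
parsed = pd.to_datetime(df['INTERVAL_DATETIME'], format='%d/%m/%Y %H:%M')

# Format each datetime as DD/MM/YYYY, then drop repeated values.
L = parsed.dt.strftime('%d/%m/%Y').unique().tolist()
print(L)
# ['06/11/2020', '31/05/2020', '21/11/2023']
```

Passing format explicitly avoids any guessing about day/month order, at the cost of raising an error if a row does not match the layout.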

For Python date objects, use Series.dt.date instead of strftime:

L = pd.to_datetime(df.INTERVAL_DATETIME, dayfirst=True).dt.date.unique().tolist()
print (L)
[datetime.date(2020, 11, 6), datetime.date(2020, 5, 31), datetime.date(2023, 11, 21)]

For completeness, Series.dt.normalize gives Timestamps with the time part set to midnight (00:00:00):

L = pd.to_datetime(df.INTERVAL_DATETIME, dayfirst=True).dt.normalize().unique().tolist()
print (L)

[Timestamp('2020-11-06 00:00:00'), 
 Timestamp('2020-05-31 00:00:00'), 
 Timestamp('2023-11-21 00:00:00')]
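The return type of Series.unique for datetime columns is not the same in every pandas version, and if it is a plain NumPy datetime64 array, calling tolist on it may not give Timestamp objects. A hedged alternative, sketched below on the same sample data, is to deduplicate with Series.drop_duplicates and call tolist on the resulting Series, which yields Timestamps.

```python
import pandas as pd

df = pd.DataFrame({'INTERVAL_DATETIME': ['6/11/2020 16:30',
                                         '6/11/2020 17:00',
                                         '31/05/2020 18:00',
                                         '31/05/2020 18:30',
                                         '21/11/2023 18:00']})

# Parse day-first strings, then reset every time to 00:00:00.
days = pd.to_datetime(df['INTERVAL_DATETIME'], dayfirst=True).dt.normalize()

# Deduplicate on the Series itself; Series.tolist() returns pandas Timestamps.
L = days.drop_duplicates().tolist()
print(L)
```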