I am trying to extract only dates from a column(Result)
of a dataframe
. Dates will only start from year 2000 and beyond but the format of date could be any including datetime.
What I want is just date.
Is there a simple way of doing it with some easy Regex codes ?
Example of dataset:
date_extract_df = pd.DataFrame({
'Result':[': XYZ',': 39 YRS/M',': Self',': HOME COLLECTION',': 10593974',
': 012408030006',': 03/08/2024',': 03/Aug/2024 11:50 AM',': 03/Aug/2024 03:24 PM',
' ','31.80','15'],
'Unit':['dfd','dfdfd','tytyt','03/08/2024','fgf','tyt','xcx','ere','sds','03/Aug/2024 03:24 PM',
'4545','5656']
})
Expected Result:
0
1
2
3
4
5
6 03/08/2024
7 03/Aug/2024
8 03/Aug/2024
9 03/Aug/2024
10
11
I am not good at Regex and have tried below code:
date_extract_df.Result.str.extract(r"^[0,1]?\d{1}\/(([0-2]?\d{1})|([3][0,1]{1}))\/(([1]{1}[9]{1}[9]{1}\d{1})|([2-9]{1}\d{3}))$")
Is there a way to figure out Rows containing dates in the column and then filter that row to extract date ?
I was trying this for a similar approach:
datetime.datetime.isoformat(date_check['Result'][9])
date_check['Result'].apply(lambda x: datetime.datetime.isoformat(x))
Minus the epoch time (line 6), this should do the task
pd.to_datetime(date_extract_df["Result"].str.replace(": ",""), errors="coerce")
To go deeper, you can use third-parties tools like dateparser
to cover more cases