I have a column in Pandas dataframe 'Comment Text' which contains the dates in this format(showing here, only first observation):
7/09/2018 11:59:37 AM;12:01:33 PM;00:01:56
Adding Dataframe sample:
df = pd.DataFrame({'Common Text':['7/09/2018 11:59:37 AM;12:01:33 PM;00:01:56', 'Adams Gill Christ 4 Oct 2017 02:52 PM', '4/08/2017 4:30:49 PM ;4:37:23 PM;00:06:34', '5/07/2018 10:14:03 AM ;10:21:35 AM;00:07:31', 'the call was made on 20 Jun 2017\nbut call not found on system', 'Call made on 7/03/2018 8:22:25 AM', 'Review is during 30 May to 1 March 2018']})
But when I did something like this:
import datefinder
FD = datefinder.find_dates(df['Comment Text'][0])
for dates in FD:
print(dates)
I got the following result:
2018-07-09 11:59:37
2019-06-20 12:01:33
2019-06-20 00:01:56
Which is not correct as I was expecting only 2018-07-09
to return as result.
If I understand you correctly and your data always has the two date structures you've shown as example. You can use regex.
# Make example data
df = pd.DataFrame({'Common Text':['7/09/2018 11:59:37 AM;12:01:33 PM;00:01:56',
'Adams Gill Christ 4 Oct 2017 02:52 PM',
'4/08/2017 4:30:49 PM ;4:37:23 PM;00:06:34',
'5/07/2018 10:14:03 AM ;10:21:35 AM;00:07:31',
'the call was made on 20 Jun 2017\nbut call not found on system']})
Common Text
0 7/09/2018 11:59:37 AM;12:01:33 PM;00:01:56
1 Adams Gill Christ 4 Oct 2017 02:52 PM
2 4/08/2017 4:30:49 PM ;4:37:23 PM;00:06:34
3 5/07/2018 10:14:03 AM ;10:21:35 AM;00:07:31
4 the call was made on 20 Jun 2017\nbut call not...
Use str.extract
.
s = df['Common Text'].str.extract('(.+?(?=\s\d{1,2}:\d{2}:\d{2}))|(\d{1,2}\s[A-Za-z]{3}\s\d{4})')
df['Date'] = s[0].fillna(s[1])
Common Text Date
0 7/09/2018 11:59:37 AM;12:01:33 PM;00:01:56 7/09/2018
1 Adams Gill Christ 4 Oct 2017 02:52 PM 4 Oct 2017
2 4/08/2017 4:30:49 PM ;4:37:23 PM;00:06:34 4/08/2017
3 5/07/2018 10:14:03 AM ;10:21:35 AM;00:07:31 5/07/2018
4 the call was made on 20 Jun 2017\nbut call not... 20 Jun 2017
Explanation:
(.+?(?=\s\d{1,2}:\d{2}:\d{2}))
: Extract everything before the pattern of time, which is 99:99:99
(\d{1,2}\s[A-Za-z]{3}\s\d{4})
: Extract the pattern: one or two numbers, space, 3 letters, space, 4 numbers