System: WIN10
IDE: MS Visual Studio COde
Language: Python version 3.7.3
Library: pandas version 1.0.1
Data source: supplied in the example below
Dataset: supplied in the example below
Ask:
I need to split the date and time string out of a column from a data frame that has rows of uneven delimiters i.e. some with three and some with four commas.
I am trying to figure out how to strip the date and time values: 'Nov 11 2013 12:00AM'
, and 'Apr 11 2013 12:00AM'
respectively off the back of these two records in one column into a new column given the second row in the example below has fewer commas.
Code:
df['sample field'].head(2)
4457-I need, this, date, Nov 11 2013 12:00AM ,
2359-I need this, date, Apr 11 2013 12:00AM ,
While the below method expands the data into different columns and staggers which column houses the date, this does not work. I need the date and time (or even just date) information in one column so that I can use the date values in further analysis (for example time-series).
Code:
df['sample field'].str.split(",", expand=True)
Data
df=pd.DataFrame({'Text':['4457-I need, this, date, Nov 11 2013 12:00AM ,','2359-I need this, date, Apr 11 2013 12:00AM ,']})
df
Use df.extract with a regex epression
df['Date']= df.Text.str.extract('([A-Za-z]+\s+\d+\s+\d+\s+\d+:[0-9A-Z]+(?=\s+\,+))')
df
#df.Date=pd.to_datetime(df.Date).dt.strftime('%b %d %Y %H:%M%p')
#df['date'] = pd.to_datetime(df['date'] ,format='%b %d %Y %H:%M%p')
df['Date']=pd.to_datetime(df['Date'])#This or even df['Date']=pd.to_datetime(df['Date'], format=('%b %d %Y %I:%M%p')) could work. Just remmeber because your time is 12AM use 12 clock hour system %I not %H and also hour 00.00 likely to be trncated, If have say11.00AM, the time will appear