Tags: python-3.x, pandas, timestamp, time-series, code-splitting

Splitting datetime value out of text string with uneven length


System: Windows 10

IDE: MS Visual Studio Code

Language: Python version 3.7.3

Library: pandas version 1.0.1

Data source: supplied in the example below

Dataset: supplied in the example below

Ask:

I need to extract the date and time string from a DataFrame column whose rows contain an uneven number of delimiters, i.e. some rows have three commas and some have four.

I am trying to figure out how to strip the date and time values ('Nov 11 2013 12:00AM' and 'Apr 11 2013 12:00AM', respectively) off the end of these two records and into a new column, given that the second row in the example below has fewer commas.

Code:

df['sample field'].head(2) 

4457-I need, this, date, Nov 11 2013 12:00AM ,
2359-I need this, date, Apr 11 2013 12:00AM ,  

The method below expands the data into separate columns, but the column that holds the date is staggered from row to row, so it does not work for me. I need the date and time (or even just the date) in a single column so that I can use the values in further analysis (for example, time series).

Code:

df['sample field'].str.split(",", expand=True)

Solution

  • Data

    import pandas as pd

    df = pd.DataFrame({'Text': ['4457-I need, this, date, Nov 11 2013 12:00AM ,', '2359-I need this, date, Apr 11 2013 12:00AM ,']})
    df
    

    Use Series.str.extract with a regular expression

    df['Date'] = df.Text.str.extract(r'([A-Za-z]+\s+\d+\s+\d+\s+\d+:[0-9A-Z]+(?=\s+\,+))')
    df
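
The pattern is easier to follow when it is spelled out piece by piece. The following is a minimal sketch, not the answer's exact expression: it assumes the time always ends in AM or PM and that the date is followed by optional spaces and a comma. The names `pattern` and `dates` are illustrative only.

```python
import re

import pandas as pd

df = pd.DataFrame({'Text': [
    '4457-I need, this, date, Nov 11 2013 12:00AM ,',
    '2359-I need this, date, Apr 11 2013 12:00AM ,',
]})

# With re.VERBOSE, whitespace and "#" comments inside the pattern are ignored.
pattern = r"""
    (                # the single capture group that str.extract returns
      [A-Za-z]+      # month name, such as Nov
      \s+\d+         # day of month
      \s+\d+         # year
      \s+\d+:\d+     # hour:minute
      [AP]M          # AM or PM marker (assumed to always be present)
    )
    (?=\s*,)         # lookahead: optional spaces, then a comma
"""

# expand=False returns a Series instead of a one-column DataFrame.
dates = df['Text'].str.extract(pattern, flags=re.VERBOSE, expand=False)
print(dates.tolist())
```

Because the words before the date ("need", "this", "date") are never followed by whitespace and a digit, the match can only start at the month name, regardless of how many commas precede it.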
    
    
    
    # Convert the extracted strings to datetime64 values.
    # An explicit format also works: pd.to_datetime(df['Date'], format='%b %d %Y %I:%M%p')
    # Because the times use AM/PM, use the 12-hour directive %I rather than %H.
    # Midnight (12:00AM) is 00:00, so the time is likely to be truncated in the display;
    # with a value such as 11:00AM the time would appear.
    df['Date'] = pd.to_datetime(df['Date'])
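
Since the question started from `str.split`, a regex-free alternative may also be worth considering. The sketch below is not the method used in this answer; it assumes the date is always the last non-empty comma-separated field, and `last_field` is a name chosen only for illustration. It splits once from the right, so the number of commas earlier in the row does not matter.

```python
import pandas as pd

df = pd.DataFrame({'Text': [
    '4457-I need, this, date, Nov 11 2013 12:00AM ,',
    '2359-I need this, date, Apr 11 2013 12:00AM ,',
]})

# Assumption: the date is always the last non-empty comma-separated field.
last_field = (
    df['Text']
    .str.rstrip(', ')        # drop the trailing " ," (strips commas and spaces)
    .str.rsplit(',', n=1)    # split once, starting from the right
    .str[-1]                 # keep the piece after the last remaining comma
    .str.strip()             # remove the leading space
)

# %I is the 12-hour clock directive and pairs with %p (AM/PM).
df['Date'] = pd.to_datetime(last_field, format='%b %d %Y %I:%M%p')
print(df['Date'].dt.month.tolist())
```

Once the column has a datetime dtype, the `.dt` accessor (for example `.dt.year` or `.dt.month`) is available for the kind of time-series analysis mentioned in the question.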
    
