
Splitting character string in R - Extracting the timestamp


Thank you in advance for any feedback.

I am attempting to clean some data in R where a timestamp and a text string are stored together in the same cell, and I am not getting the expected result. I know the regex still needs validation work; for now I am just testing out this particular function.

Expected:

"04/05/2018 17:14:35" " -(Additional comments) update"

Actual:

"04/05/2018 17:14:35 -(Additional comments) update"

What I tried:

string <- "04/05/2018 17:14:35 -(Additional comments) update"

pattern <- "[:digit:][:digit:][:punct:] 
            [:digit:][:digit:][:punct:]
            [:digit:][:digit:][:digit:][:digit:]
            [[:space:]]
            [:digit:][:digit:]
            [:punct:]
            [:digit:][:digit:]
            [:punct:]
            [:digit:][:digit:]"

strsplit(string, pattern)

I also tried this variation, with the same result:

pattern <- "[:digit:][:digit:]\\/  
            [:digit:][:digit:]\\/
            [:digit:][:digit:][:digit:][:digit:]             
            [[:space:]]
            [:digit:][:digit:]
             \\:
             [:digit:][:digit:]
             \\:
             [:digit:][:digit:]"

Solution

  • You can try:

    string <- "04/05/2018 17:14:35 -(Additional comments) update"
    
    
    gsub("(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2}).*","\\1", string)
    #[1] "04/05/2018 17:14:35"
    
    # Right-hand side: the text that follows the timestamp
    gsub("(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2})(.*)","\\2", string)
    #[1] " -(Additional comments) update"
    

    Regex explanation:

    • \\d{2} - 2 digits
    • \\d{4} - 4 digits
    • / - separator
    • : - separator
    • () - Group for selection
    • .* - Followed by anything
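
For comparison with the pattern in the question, here is a sketch of the same expression written with POSIX classes. In R a POSIX class is only recognised inside a bracket expression, so `[[:digit:]]` matches a digit, whereas a bare `[:digit:]` is an ordinary bracket set containing the characters `:`, `d`, `i`, `g` and `t`. Line breaks and indentation inside the quoted pattern are also matched literally, which is why the pattern is assembled with `paste0()` here. The names `posix_pattern`, `timestamp` and `rest` are only for illustration.

```r
string <- "04/05/2018 17:14:35 -(Additional comments) update"

# Same timestamp pattern as above, with [[:digit:]] in place of \\d.
# paste0() joins the pieces without adding whitespace to the regex.
posix_pattern <- paste0(
  "[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}",
  "[[:space:]]",
  "[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}"
)

# Extract the timestamp
timestamp <- regmatches(string, regexpr(posix_pattern, string))
timestamp
#[1] "04/05/2018 17:14:35"

# Remove the timestamp to get the remaining text
rest <- sub(posix_pattern, "", string)
rest
#[1] " -(Additional comments) update"
```

Note that `strsplit(string, posix_pattern)` on its own would not give the expected output either: `strsplit` discards whatever the pattern matches, so the timestamp would be replaced by an empty string in the result.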

    The OP seems keen on using strsplit. One option is to insert a placeholder between the two groups and then split on it:

    strsplit(gsub("(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2})(.*)",
           paste("\\1","####","\\2",sep=""), string), split = "####")
    # [[1]]
    # [1] "04/05/2018 17:14:35"            " -(Additional comments) update"
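
If the "####" placeholder feels fragile (it would misfire if that text ever appeared in the data), a zero-width lookbehind may be an alternative. This is only a sketch: it assumes `perl = TRUE` is acceptable and that each string contains a single timestamp, since a second timestamp later in the text would trigger another split. The name `parts` is only for illustration.

```r
string <- "04/05/2018 17:14:35 -(Additional comments) update"

# Split at the empty position right after the timestamp.
# (?<=...) is a PCRE lookbehind, so perl = TRUE is required;
# the lookbehind here has a fixed length of 19 characters.
parts <- strsplit(string,
                  "(?<=\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2})",
                  perl = TRUE)[[1]]
parts
#[1] "04/05/2018 17:14:35"            " -(Additional comments) update"
```

Because the lookbehind matches a position and not any characters, nothing is removed from the string and both pieces come back intact.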