Thank you in advance for any feedback.
I am attempting to clean some data in R where a timestamp and a text string are included together in the same cell, but I am not getting the expected result. I know the regex needs validation work; for now I am just testing out this particular function.
Expected:
"04/05/2018 17:14:35" " -(Additional comments) update"
Actual:
"04/05/2018 17:14:35 -(Additional comments) update"
What I tried:
string <- "04/05/2018 17:14:35 -(Additional comments) update"
pattern <- "[:digit:][:digit:][:punct:]
[:digit:][:digit:][:punct:]
[:digit:][:digit:][:digit:][:digit:]
[[:space:]]
[:digit:][:digit:]
[:punct:]
[:digit:][:digit:]
[:punct:]
[:digit:][:digit:]"
strsplit(string, pattern)
I also tried this variation, with the same result:
pattern <- "[:digit:][:digit:]\\/
[:digit:][:digit:]\\/
[:digit:][:digit:][:digit:][:digit:]
[[:space:]]
[:digit:][:digit:]
\\:
[:digit:][:digit:]
\\:
[:digit:][:digit:]"
You can try:
string <- "04/05/2018 17:14:35 -(Additional comments) update"
gsub("(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2}).*","\\1", string)
#[1] "04/05/2018 17:14:35"
#RHS part
gsub("(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2})(.*)","\\2", string)
#[1] " -(Additional comments) update"
Regex explanation:
\\d{2} - 2 digits
\\d{4} - 4 digits
/ - separator
: - separator
() - Group for selection
.* - Followed by anything

Seems OP is very keen on using strsplit. One option could be as:
strsplit(gsub("(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2})(.*)",
paste("\\1","####","\\2",sep=""), string), split = "####")
# [[1]]
# [1] "04/05/2018 17:14:35" " -(Additional comments) update"
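As for why the original attempt did not split anything: POSIX classes such as [:digit:] are only recognised inside a bracket expression, so they have to be written as [[:digit:]], and the line breaks inside the pattern string are matched as literal newlines. Note also that strsplit drops whatever the pattern matches, so even a corrected timestamp pattern would return an empty string in place of the timestamp. The following is a minimal sketch of an alternative that avoids the "####" placeholder by using base R's regexpr and regmatches; the variable names (timestamp, remainder, result) are illustrative only, and the ^ anchor assumes the timestamp is always at the start of the cell.

```r
string <- "04/05/2018 17:14:35 -(Additional comments) update"

# POSIX classes must sit inside a bracket expression: [[:digit:]], not [:digit:].
# The pattern is kept on a single line, because a line break inside the
# string would be treated as a literal newline to match.
# Assumption: the timestamp is always at the start of the string (hence ^).
pattern <- "^[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4} [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}"

m <- regexpr(pattern, string)        # position and length of the first match
timestamp <- regmatches(string, m)   # the matched timestamp itself

# Everything after the match; valid because the match starts at position 1.
remainder <- substring(string, attr(m, "match.length") + 1)

result <- c(timestamp, remainder)
result
# [1] "04/05/2018 17:14:35"            " -(Additional comments) update"
```

If the timestamp could appear elsewhere in the cell, the start of the remainder would need to be computed from the match position as well as its length.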