regex, r, substring, substr

Extract parts of a string in R


I have a string of the form

stamp = "section_d1_2010-07-01_08_00.txt"

and would like to be able to extract parts of it. I have managed this by applying str_extract (from the stringr package) repeatedly until I reach the part I want, e.g. to grab the month:

month = str_extract(stamp,"2010.+")
month = str_extract(month,"-..")
month = str_extract(month,"..$")

However, this is terribly inefficient and there has to be a better way. For this particular example I can use

month = substr(stamp,17,18)

However, I am looking for something more versatile (in case the number of digits changes).

I think I need a regular expression that grabs what comes AFTER certain delimiters (the _ or -, or the 3rd _, etc.). I have tried using sub as well, but had the same problem: I needed several calls to home in on what I actually wanted.

An example of how to get, say, the month (07 here) and the hour (08 here) would be appreciated.


Solution

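  • You can simply use strsplit with the regex [-_] to split the string into all of its parts, then pick the ones you want by position. The perl=TRUE option is not needed for this simple character class.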

    stamp <- "section_d1_2010-07-01_08_00.txt"
    strsplit(stamp, '[-_]')[[1]]
    # [1] "section" "d1"      "2010"    "07"      "01"      "08"      "00.txt" 
    

    See demo.

    https://regex101.com/r/cK4iV0/8
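
To get the month and the hour specifically, the split result can be indexed by position. A minimal sketch follows, assuming the fields always appear in the same order as in the example stamp. It also shows an alternative that is not part of the answer above: capturing the fields with a single pattern via base R's regexec and regmatches. The pattern and the variable names (parts, pattern, m, month2, hour2) are illustrative assumptions.

```r
stamp <- "section_d1_2010-07-01_08_00.txt"

# Approach 1: split on "-" or "_" and pick fields by position.
# Assumes the field order is fixed: section, id, year, month, day, hour, minute.
parts <- strsplit(stamp, "[-_]")[[1]]
month <- parts[4]
hour  <- parts[6]

# Approach 2: capture the fields with one regular expression.
# Assumed layout: <year>-<month>-<day>_<hour>_<minute>, each made of digits only.
# Using [0-9]+ rather than a fixed width tolerates a changing number of digits.
pattern <- "([0-9]+)-([0-9]+)-([0-9]+)_([0-9]+)_([0-9]+)"
m <- regmatches(stamp, regexec(pattern, stamp))[[1]]
# m[1] is the full match; m[2] to m[6] are the capture groups in order.
month2 <- m[3]
hour2  <- m[5]

print(c(month = month, hour = hour))
```

Indexing by position is the shortest route, but it breaks if an extra delimiter appears earlier in the name. The capture-group version anchors on the date and time layout itself, so it may be the more robust choice when the prefix varies.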