r date extract filenames

Extract date from the name of the CSV file


How can I extract 20151001 from the file name as a date (formatted like 2015-10-01) into a new vector, so that the result looks like this:

  File Name                        Date
  Residential_20151001_0000_1.csv  2015-10-01

Solution

  • We can use sub: match one or more characters that are not a _, followed by a _, then capture the numeric part ((\\d+)), followed by any characters up to the end of the string. In the replacement, we use the backreference (\\1) to keep only the captured digits. Once the string is extracted, we can convert it to Date class with as.Date, specifying the format.

    as.Date(sub('[^_]+_(\\d+).*', '\\1', df1[,1]), "%Y%m%d")
    #[1] "2015-10-01"
    

    A more compact option is str_extract (from stringr) with ymd (from lubridate). str_extract returns the first run of digits, which here is the date:

    library(stringr)
    library(lubridate)
    ymd(str_extract(df1[,1], '\\d+'))
    #[1] "2015-10-01"
    # (older lubridate releases returned a POSIXct here and printed "2015-10-01 UTC")
    

    Update

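    If we also need to extract the time (the 0000 part), we can capture the two pairs of digits after the second _ and join them with a colon: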

    t1 <- sub('^[^_]+_[^_]+_(\\d{2})(\\d{2})_.*', '\\1:\\2', df1[,1])
    t1
    #[1] "00:00"
    # strptime returns a POSIXlt value; since t1 holds no date, the current date is filled in
    strptime(t1, format='%H:%M')
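
To tie the date and time steps together, here is a minimal sketch (not part of the original answer) that parses both pieces into a single date-time, which avoids strptime attaching today's date to the time. The sample df1, its FileName column, the second file name, and the UTC time zone are all assumptions made for illustration; the original post does not show how df1 was built.

```r
# Hypothetical sample data; only the first file name comes from the question.
df1 <- data.frame(FileName = c("Residential_20151001_0000_1.csv",
                               "Residential_20151002_1330_2.csv"),
                  stringsAsFactors = FALSE)

# Capture the 8-digit date and the 4-digit time that follow the first
# two underscores, and join them with a space.
stamp <- sub('^[^_]+_(\\d{8})_(\\d{4})_.*$', '\\1 \\2', df1$FileName)

# Parse date and time in one step. UTC is an assumption, because the
# file names carry no time zone information.
df1$DateTime <- as.POSIXct(stamp, format = "%Y%m%d %H%M", tz = "UTC")

# Derive the plain Date from the date-time, using the same time zone.
df1$Date <- as.Date(df1$DateTime, tz = "UTC")

df1
```

If the file names were written in local time, replacing "UTC" with the relevant time zone name in both calls would likely be the safer choice.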