Tags: r, regex, date, gsub, stringr

Extracting date from file name and making it a variable


I have a list of files with names such as "MERRA2_300.tavg1_2d_flx_Nx.20050101.SUB.nc". I need to combine all of these files using a loop and add a variable to the combined dataset that records, for each observation, the date of the file it came from. All of the file names are identical except for the date itself (i.e., the next file is MERRA2_300.tavg1_2d_flx_Nx.20050102.SUB.nc).

I have written the following loop:

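library(RNetCDF)  # provides open.nc(), read.nc() and close.nc()

wi <- list.files(path = ".")

final_data <- data.frame(matrix(ncol = 7, nrow = 0))
colnames(final_data) <- c("PRECTOTCORR", "TLML", "lat", "lon", "time", "time_bnds", "date")

for (i in wi) {
  nc <- open.nc(i)
  dat <- read.nc(nc)
  close.nc(nc)  # release the file handle once the data are read

  date <- i  # currently the whole file name, not just the date

  dat$date <- date

  final_data <- rbind(final_data, dat)
}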
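One thing worth checking in the loop: list.files(path = ".") with no pattern returns every file in the directory, so a stray script or text file would be passed to open.nc(). A minimal sketch, assuming all the data files follow the naming scheme in the question, restricts the listing with a regular expression. The temporary directory and the file "notes.txt" exist only to make the example runnable.

```r
# Build a throwaway directory with example file names (illustration only)
tmp <- file.path(tempdir(), "merra_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c(
  "MERRA2_300.tavg1_2d_flx_Nx.20050101.SUB.nc",
  "MERRA2_300.tavg1_2d_flx_Nx.20050102.SUB.nc",
  "notes.txt"
))))

# Keep only names of the form MERRA2_300.tavg1_2d_flx_Nx.<8 digits>.SUB.nc
wi <- list.files(
  path = tmp,
  pattern = "^MERRA2_300\\.tavg1_2d_flx_Nx\\.[0-9]{8}\\.SUB\\.nc$"
)
print(wi)  # the two MERRA2 files; notes.txt is excluded
```

If path is not the working directory, adding full.names = TRUE gives open.nc() a usable path; the date should then be extracted from basename(i) rather than from i.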

The line date <- i is the crux of this question. I know that gsub, stringr, or regular expressions can extract the date for each observation, but I am confused by how the operation works and by its syntax.
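For reference, here is a sketch of two base-R ways to pull the 8-digit date out of names like these, assuming the date is always the only run of eight digits and always sits just before ".SUB.nc". The last lines illustrate why the dots in such patterns are usually escaped; the malformed name odd is hypothetical.

```r
files <- c("MERRA2_300.tavg1_2d_flx_Nx.20050101.SUB.nc",
           "MERRA2_300.tavg1_2d_flx_Nx.20050102.SUB.nc")

# Option 1: sub() with a capture group. The pattern matches the whole name and
# "\\1" in the replacement keeps only the text captured by ([0-9]{8}).
date_str <- sub("^.*\\.([0-9]{8})\\.SUB\\.nc$", "\\1", files)

# Option 2: regexpr() finds the first run of 8 digits; regmatches() extracts it.
date_str2 <- regmatches(files, regexpr("[0-9]{8}", files))

print(date_str)  # "20050101" "20050102"

# A bare "." matches ANY character, so a loose pattern also accepts a bad name
odd    <- "MERRA2_300Xtavg1_2d_flx_NxX20050101XSUBXnc"  # hypothetical bad name
loose  <- gsub("MERRA2_300.tavg1_2d_flx_Nx.|.SUB.nc", "", odd)         # "20050101"
strict <- gsub("MERRA2_300\\.tavg1_2d_flx_Nx\\.|\\.SUB\\.nc", "", odd)  # unchanged
```

The two options differ on names that do not match: sub() returns such a name unchanged, while regmatches() silently drops it, so the result can be shorter than the input. With stringr, stringr::str_extract(files, "[0-9]{8}") is the equivalent of option 2 and returns NA for non-matching names.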

Ideally, an answer would create the variable so that R recognizes it as time-series data, but that isn't strictly necessary. Even if the variable is just a string, I think I can convert it to time-series data on my own.
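Going from the string to a real date is one call to as.Date() with a format string describing the "YYYYMMDD" layout. A minimal sketch, assuming the strings have already been extracted from the file names:

```r
date_str <- c("20050101", "20050102")  # e.g. values extracted from the file names

# "%Y%m%d": 4-digit year, 2-digit month, 2-digit day, no separators
date <- as.Date(date_str, format = "%Y%m%d")

print(class(date))  # "Date"
print(date)         # printed in ISO form, 2005-01-01 and 2005-01-02
print(diff(date))   # consecutive files are one day apart
```

Once the column has class Date, sorting, subsetting by range, and date arithmetic behave as expected. If hourly resolution is needed later, as.POSIXct() accepts the same format argument along with a tz argument.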


Solution

  • Partially answering my own question: gsub works well for creating a new variable from part of the file name.

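    library(RNetCDF)  # provides open.nc(), read.nc() and close.nc()

    wi1 <- "MERRA2_300.tavg1_2d_flx_Nx.20050101.SUB.nc"
    nc1 <- open.nc(wi1)
    dat1 <- read.nc(nc1)
    close.nc(nc1)
    # Delete the fixed prefix and suffix; dots escaped so they match literally
    dat1$date <- gsub("MERRA2_300\\.tavg1_2d_flx_Nx\\.|\\.SUB\\.nc", "", wi1)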

    This creates a date variable with the value "20050101" (a character string, not yet a Date).

    I'm currently working this into the loop; perhaps something like:

    for (i in wi) {
      nc <- open.nc(i)
      dat <- read.nc(nc)
      close.nc(nc)

      # Keep only the 8-digit date between the prefix and ".SUB.nc"
      dat$date <- gsub("MERRA2_300\\.tavg1_2d_flx_Nx\\.|\\.SUB\\.nc", "", i)

      final_data <- rbind(final_data, dat)
    }
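A common alternative to growing final_data with rbind() on every iteration (which copies the accumulated data each time) is to build one data frame per file in a list and bind them once at the end. The sketch below assumes each file can be turned into a data frame; read.nc() returns a list of arrays, and flattening those is not shown. The helpers add_date_column(), combine_files() and fake_reader() are hypothetical, and fake_reader() only stands in for a real NetCDF reader so the example runs.

```r
# Hypothetical helper: tag a data frame with the date parsed from its file name
add_date_column <- function(df, file) {
  date_str <- sub("^.*\\.([0-9]{8})\\.SUB\\.nc$", "\\1", basename(file))
  df$date <- as.Date(date_str, format = "%Y%m%d")  # recycled to every row
  df
}

# Hypothetical helper: read and tag each file, then bind all pieces once
combine_files <- function(files, reader) {
  pieces <- lapply(files, function(f) add_date_column(reader(f), f))
  do.call(rbind, pieces)
}

# Stand-in reader for illustration only; a real one would open the file with
# RNetCDF::open.nc()/read.nc() and reshape the arrays into a data frame
fake_reader <- function(f) data.frame(PRECTOTCORR = c(0.1, 0.2), TLML = c(270, 271))

files <- c("MERRA2_300.tavg1_2d_flx_Nx.20050101.SUB.nc",
           "MERRA2_300.tavg1_2d_flx_Nx.20050102.SUB.nc")
final_data <- combine_files(files, fake_reader)
print(final_data)
```

Because the date is parsed inside add_date_column(), every row carries a proper Date, and the empty seven-column final_data template is no longer needed.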