Tags: r, string, file, data-wrangling, stringi

Identifying strings in file names to create variables (stringi, R)


I hope this finds you well.

I have a set of CSV files whose names follow a convention like this one: "SubB1V2timecourses_chanHbO_Cond2_202010281527".

I want to merge all of the files in the dataset and add variables taken from each file name: ID (B1V2), chromophore (HbO in this case; other files are labeled Hbb), and condition (Cond2 in this case, but it can be anything from Cond1 to Cond9).

Below is my current function. So far I can read in the ID, the time (which comes from a separate Excel file), and the data. However, I get NAs for Condition and Chromophore. Is there something I am missing in the string specification?

Any help is truly appreciated.

Take care and stay well, Caroline

multmerge <- function(mypath){
  require(stringi)
  require(readxl)
  filenames <- list.files(path=mypath, full.names=TRUE) #path=mypath
  datalist <- lapply(filenames, function(x){
    df <- read.csv(file=x,header= TRUE)
    ID <- unlist(stri_extract_all_regex(toupper(x), "B\\d+"))
    Condition <- unlist(stri_extract_all_regex(tolower(x), "Cond\\d+"))
    Chromophore <- ifelse(stri_detect_regex(toupper(x), "HbO"), "HbO",
                          ifelse(stri_detect_regex(toupper(x), "Hbb"), "Hbb", "NA"))
    #ifelse(stri_detect_regex(tolower(x), "nonsocial"),"NonSocial",
    #       ifelse(stri_detect_regex(tolower(x),"social-inverted"), "social_inverted",
    #              ifelse(stri_detect_regex(tolower(x),"social"), "social", "NA")))
    # time <- read_excel("time4hz.xlsx")
    df <- data.frame(ID, time, Condition, Chromophore, df)
    return(df)
  }) # end read-in function

  Reduce(function(x,y) {merge(x,y,all = TRUE)}, datalist)
}


Solution
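As for why the original function returns NAs: each pattern is matched against a copy of the file name whose case has already been changed. tolower(x) turns "Cond2" into "cond2", so "Cond\\d+" (capital C) can never match and stri_extract_all_regex() returns NA. Likewise toupper(x) turns "HbO" into "HBO", so neither "HbO" nor "Hbb" matches and the nested ifelse() falls through to the string "NA". (Note also that "B\\d+" yields "B1" rather than "B1V2".) The fix is to match against the unmodified name; with stringi, that means dropping the toupper()/tolower() wrappers or passing opts_regex = stri_opts_regex(case_insensitive = TRUE). Below is a small base R illustration on one hypothetical path; the path and the "Hb[Ob]" and "B\\d+V\\d+" patterns are assumptions based on the example names in the question.

```r
# A hypothetical path following the naming convention from the question
x <- "/path/to/SubB1V2timecourses_chanHbO_Cond2_202010281527.csv"

# Why the original patterns fail: the case is changed before matching
grepl("Cond\\d+", tolower(x))  # FALSE, because tolower() produced "cond2"
grepl("HbO", toupper(x))       # FALSE, because toupper() produced "HBO"

# Matching against the unmodified name works
condition   <- regmatches(x, regexpr("Cond\\d+", x))
chromophore <- regmatches(x, regexpr("Hb[Ob]", x))   # "HbO" or "Hbb"
id          <- regmatches(x, regexpr("B\\d+V\\d+", x))
c(id, chromophore, condition)  # "B1V2" "HbO" "Cond2"
```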

  • Maybe you want something like strcapture() from base R (utils)? For example, if you have a vector of file names like this

    filenames <- c(
      "/path/to/SubB1V2timecourses_chanHbO_Cond2_202010281527", 
      "/path/to/SubB4V9timecourses_chanHbb_Cond7_202010011527"
    )
    

    Then

    strcapture(
      "Sub([^_]+)timecourses_chan([^_]+)_([^_]+)_\\d+", 
      basename(filenames), 
      data.frame(ID = character(), chromophore = character(), condition = character())
    )
    

    returns

        ID chromophore condition
    1 B1V2         HbO     Cond2
    2 B4V9         Hbb     Cond7
    

    Combine this with your multmerge:

    multmerge <- function(mypath){
      filenames <- list.files(path = mypath, full.names = TRUE)
      # One row of metadata (ID, chromophore, condition) per file, parsed from its base name
      metadata <- strcapture(
        "Sub([^_]+)timecourses_chan([^_]+)_([^_]+)_\\d+", 
        basename(filenames), 
        data.frame(ID = character(), chromophore = character(), condition = character())
      )
      # Read each file and prepend its metadata row, recycled to the file's row count
      datalist <- lapply(seq_along(filenames), function(i, nms, info) {
        df <- read.csv(file = nms[[i]], header = TRUE)
        data.frame(info[i, ], df)
      }, filenames, metadata)
      # Full outer join of all files on their shared column names
      Reduce(function(x,y) {merge(x, y, all = TRUE)}, datalist)
    }
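To check the approach end to end without real data, here is a self-contained sketch that writes two tiny CSV files named after the examples above into a temporary folder and combines them. A few things are assumptions, not part of the answer: the column name "signal" and its values are made up; list.files() is given pattern = "\\.csv$" so that other files in the folder (for example an Excel time file) are not read as CSV; and the files are stacked with do.call(rbind, ...) instead of Reduce(merge, ...), which assumes every file has the same columns. That keeps rows in file order, whereas merge() re-sorts by the shared columns by default.

```r
# Throwaway folder with two small CSVs named like the files in the question.
# The "signal" column and its values are hypothetical.
demo_dir <- file.path(tempdir(), "multmerge_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(signal = c(0.1, 0.2)),
          file.path(demo_dir, "SubB1V2timecourses_chanHbO_Cond2_202010281527.csv"),
          row.names = FALSE)
write.csv(data.frame(signal = c(0.3, 0.4, 0.5)),
          file.path(demo_dir, "SubB4V9timecourses_chanHbb_Cond7_202010011527.csv"),
          row.names = FALSE)

# Only pick up .csv files (assumption: other file types may share the folder)
filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# One row of metadata per file, parsed from the base name
metadata <- strcapture(
  "Sub([^_]+)timecourses_chan([^_]+)_([^_]+)_\\d+",
  basename(filenames),
  data.frame(ID = character(), chromophore = character(), condition = character())
)

# Repeat each file's metadata row once per data row, then stack the files
datalist <- lapply(seq_along(filenames), function(i) {
  df <- read.csv(filenames[[i]])
  data.frame(metadata[rep(i, nrow(df)), , drop = FALSE], df, row.names = NULL)
})
combined <- do.call(rbind, datalist)
combined
```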