
Read multiple ENVI files and combine them into one CSV


I'm fairly new to R but am trying to get this done. I have dozens of ENVI spectral datasets stored in a directory. Each dataset is split into two files. They all follow the same naming convention, i.e.:

  • ID_YYYYMMDD_350-2500nm.esl
  • ID_YYYYMMDD_350-2500nm.hdr

The task is to read the dataset, add two columns (ID and date from filename), and store the results in a *.csv-file. I got this to work for a single file (hardcoded).

library(caTools)

setwd("D:/some/path/software_scripts")

### filename without extension
name <- "011a_20100509_350-2500nm"

### split filename in area-id and date
flaeche <- substr(name, 1, 4)
date <- as.Date(substr(name, 6, 13), "%Y%m%d")

### get values from ENVI-file in a matrix
spectrum <- read.ENVI(paste(name,".esl", sep = ""), headerfile=paste(name,".hdr", sep=""))

### add columns
spectrum <- cbind(Flaeche=flaeche,Datum=as.character(date),spectrum)


### CSV-Dataset with all values (write.csv always uses "," as separator)
write.csv(spectrum, file = paste0(name, ".csv"))

I want to combine all available files into one *.csv file. I know that I have to use list.files, but I have no idea how to call read.ENVI for each file and append the resulting matrices to a single CSV.


Update:

library(caTools)

setwd("D:/some/path/mean")

files <- list.files() # change or leave totally empty if setwd() put you in the right spot

all_names <- sub("^([^.]*).*", "\\1", files) # strip off extensions

name <- unique(all_names) # get rid of duplicates from .esl and .hdr

# wrap your existing code in a function
mungeENVI <- function(name) {

  # split filename in area-id and date
  flaeche <- substr(name, 1, 4)
  date <- as.Date(substr(name, 6, 13), "%Y%m%d")

  # get values from ENVI-file in a matrix
  spectrum <- read.ENVI(paste(name,".esl", sep = ""), headerfile=paste(name,".hdr", sep=""))

  # add columns
  spectrum <- cbind(Flaeche=flaeche,Datum=as.character(date),spectrum)
  return(spectrum)
}

# use lapply to 'loop' over each name
list_of_ENVIs <- lapply(name, mungeENVI) # returns a list

# use do.call(rbind, x) to stack the list into one big matrix
final_df <- do.call(rbind, list_of_ENVIs)

# now write output
write.csv(final_df, "all_results.csv")

You can find a sample dataset here: Sample dataset


Solution

  • I work with a lot of lab data where I can rely on the output files having a consistent format (same column order, column names, header format, etc.), so this assumes that your ENVI files are similar. If they are not, I'm happy to help with that too; I'd just need to see a dummy file or two.

    Anyways here's the idea:

    library(caTools)
    library(lubridate)
    library(magrittr)
    
    setwd("~/Binfo/TST/Stack/") # adjust as needed
    
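    # list only the ENVI files, so a previously written all_results.csv is not picked up
    files <- list.files("data/", pattern = "\\.(esl|hdr)$", full.names = TRUE) # adjust as needed
    all_names <- sub("\\.(esl|hdr)$", "", files) # strip off extensions
    names1 <- unique(all_names) # get rid of duplicates from .esl and .hdr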
    
    # wrap your existing code in a function
    mungeENVI <- function(name) {
        # split filename in area-id and date
        f <- gsub(".*\\/(\\d{3}\\D)_.*", "\\1", name)
        d <- gsub(".*_(\\d+)_.*", "\\1", name) %>% ymd()
        # get values from ENVI-file in a matrix
        spectrum <- read.ENVI(paste(name,".esl", sep = ""), headerfile=paste(name,".hdr", sep=""))
        # add columns
        spectrum <- cbind(Flaeche=f,Datum= as.character(d),spectrum)
        return(spectrum)
    }
    # use lapply to 'loop' over each name
    list_of_ENVIs <- lapply(names1, mungeENVI) # returns a list
    
    # use do.call(rbind, x) to stack the list into one big matrix
    final_df <- do.call(rbind, list_of_ENVIs)
    # now write output
    write.csv(final_df, "data/all_results.csv")
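
    The lapply() plus do.call(rbind, ...) pattern above can be tried without any ENVI files. This is only a minimal sketch: fake_munge() and demo_names are hypothetical stand-ins I made up for illustration, and the fake 2 x 3 "spectrum" is not real data.

    ```r
    # Hypothetical stand-in for mungeENVI(): builds a small fake matrix instead
    # of calling read.ENVI(), so the sketch runs without any ENVI files.
    fake_munge <- function(name) {
      values <- matrix(seq_len(6), nrow = 2)  # pretend spectrum: 2 rows x 3 columns
      # the single ID and date strings are recycled to every row of the matrix
      cbind(Flaeche = substr(name, 1, 4), Datum = substr(name, 6, 13), values)
    }

    # Made-up base names that follow the ID_YYYYMMDD_range convention
    demo_names <- c("011a_20100509_350-2500nm", "012b_20100510_350-2500nm")

    pieces   <- lapply(demo_names, fake_munge)  # list with one matrix per name
    combined <- do.call(rbind, pieces)          # stack all matrices row-wise

    dim(combined)           # rows of both pieces, 2 label columns + 3 value columns
    is.character(combined)  # cbind() with strings turns the whole matrix into character
    ```

    Note that because cbind() mixes strings with numbers, the combined result is a character matrix rather than a data.frame. That is fine for write.csv(), but the values need converting back to numeric if you keep working with them in R.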
    

    Let me know if you have any problems and we can go from there. Cheers.

    I edited my answer a bit. I think the problem you were hitting is in list.files(): it should have had the argument full.names = TRUE, so that the returned names include the directory. I also adjusted your parsing method to be a little more defensive by using regex capture groups instead of fixed character positions. I tested the code with your two example datasets (four files, really), and it builds a large matrix (66743 elements). I also used lubridate, which I think is a better way to work with dates and times.
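
    To see what the capture groups pull out, the parsing step can be run on its own with base R. This is a hedged sketch, not the exact code from the function: the path is illustrative, I use as.Date() in place of lubridate's ymd() to keep it dependency-free, and the \\d{8} pattern assumes the date is always eight digits.

    ```r
    # Illustrative path; the file name follows the question's ID_YYYYMMDD_range convention
    path <- "data/011a_20100509_350-2500nm"

    # Capture three digits plus one non-digit right after the last "/"
    area_id <- sub(".*/(\\d{3}\\D)_.*", "\\1", path)

    # Capture the eight-digit block between two underscores
    date_str <- sub(".*_(\\d{8})_.*", "\\1", path)
    date <- as.Date(date_str, format = "%Y%m%d")

    # sub() returns its input unchanged when the pattern does not match,
    # so check the pattern explicitly before trusting the result
    if (!grepl("/\\d{3}\\D_\\d{8}_", path)) {
      stop("unexpected file name: ", path)
    }

    area_id
    format(date)
    ```

    The explicit grepl() check is the defensive part: without it, a file name that does not fit the convention would silently pass its full path through as the area ID.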