Search code examples
rtidyrreadr

R read multiple csv's and add ID key with readr/tidy


We get groundwater depth data from data loggers as a .txt file. The format of this data not changeable. It has consistent column names except that the first column name contains the well id. For each well, the data is structured:

"well 02,Time,Current(feet),Serial Number  
1,2018-02-11 11:18:44,-5.00,020013603  
2,2018-02-11 17:18:44,7.30  
3,2018-02-11 23:18:44,5.40  
4,2018-02-12 05:18:44,0.80  
5,2018-02-12 11:18:44,12.60... "

"Well 17,Time,Current(feet),Serial Number  
1,2018-02-11 00:32:01,-5.00,000025390  
2,2018-02-11 06:32:01,5.45  
3,2018-02-11 12:32:01,5.40  
4,2018-02-11 18:32:01,5.40..."

I am trying to create a script that merges csv .txt files from multiple into one long data frame. I only need column 2 and 3 from these files. I also need to add an id column noting which well the observation came from. In the end, I would like the to have a 3xn df with colnames = well_id, obs_date, observation

Note that the file name also contains the well id.

So far...

txt_list <- list.files(pattern = ".txt", full.names = TRUE)  
txt_read_csv <- map(txt_list,read_csv, col_names = TRUE, col_types = "iTdc")

...gets me a list of tibbles. But I cannot figure out how to select and mutate tibbles within a list to reduce the columns and assign an ID. I tried map_df, but since the first column name is different in each .txt the df explodes into a wide table with multiple columns for each different colname.

I can do this with a loop, but I'd really like to keep it tidy.


Solution

  • Try creating a custom function which wraps readr::read_csv, which can read the data, add the well id as a new column, and select and rename the columns you need. Then you can use map_dfr to apply this function to each of your files, and finally join them (by row).

    library(readr)
    library(dplyr)
    
    read_wells_csv <- function(filepath){
    
      df <- read_csv(filepath, col_names = TRUE, col_types = "iTdc")
    
      well_id <- colnames(df)[1]
    
      df %>% 
        mutate(well_id = well_id) %>%
        rename(obs_date = Time, observation = `Current(feet)`) %>% 
        select(well_id, obs_date, observation) 
    }
    
    txt_list <- list.files(pattern = ".txt", full.names = TRUE)  
    txt_read_csv <- map_dfr(txt_list, filepath)