Search code examples
regexrstringrmetacharacters

Regex and file processing


This question relates to R but really isn't language specific per se. I have a bunch of csv files with this general format "sitename_03082015.csv". The files have 5 columns and various rows

Host    MaximumIn   MaximumOut  AverageIn   AverageOut
device1 30.63 Kbps  0 bps       24.60 Kbps  0 bps
device2 1.13 Mbps   24.89 Kbps  21.76 Kbps  461 bps
device5 698.44 Kbps 37.71 Kbps  17.49 Kbps  3.37 Kbps

I ultimately want to read in all the files and merge which I can do but during the merge I want to read the site name and date and add it to each related line so the output looks like this

Host      MaximumIn     MaximumOut  AverageIn   AverageOut  Site Name   Date
device1   30.63 Kbps    0 bps       24.60 Kbps  0 bps       SiteA       3/7/15
device12  1.13 Mbps     24.89 Kbps  21.76 Kbps  461 bps     SiteA       3/8/15
device1   698.44 Kbps   37.71 Kbps  17.49 Kbps  3.37 Kbps   SiteB       3/7/15
device2   39.08 Kbps    1.14 Mbps   10.88 Kbps  27.06 Kbps  SiteB       3/8/15
device3   123.43 Kbps   176.86 Kbps 8.62 Kbps   3.78 Kbps   SiteB       3/9/15

With my R code I can do the following:

#Get list of file names
filenames<- list.files(pattern = ".csv$")

#This extracts everything up to the underscore to get site name
siteName <- str_extract(string = filenames, "[^_]*")

# Extract date from file names use
date <- str_extract(string = filenames, "\\d{8}" )

With the below R code I can merge all the files but that will be without the added columns of site name and date that I want.

myDF<-do.call("rbind", lapply(filenames, read.table, header=TRUE, sep=","))

I just can't get my head around how to do the extracts for site and date adding and populating the columns to create my ideal dataframe which is the second table above.

The solution that best worked for me was posted below :)


Solution

  • The way that immediately comes to my mind is to do cbind while reading information with additional infor and do rbind afterwards. Something similar to this:

     myDF<-do.call("rbind", 
              lapply(filenames, 
                     function(x) cbind(read.table(x, header=TRUE, sep=","), 
                                                  "Site Name" = str_extract(string = x, "[^_]*"),
                                                  "Date" = as.Date(str_extract(string = x, "\\d{8}"), "%m%d%Y"))))