Is there a way I can simplify the code below using vectors?


I am using R. I need to create a new column in a data frame that is the sum of the three variables. The sum should only take place if there are numeric values for each of the three variables. In other words, if there are any NAs or blanks the sum should not take place.

I have written the code below which works, but would like to simplify it. I am interested in using vectors to avoid repetition in my code.


data.x <- data.frame('time' = c(1:11),
                     'x' = c(5,3,"",'ND',2,'ND',7,8,'ND',1," "))
data.x[data.x == ''] <- NA
data.x[data.x == ' '] <- NA
data.x[data.x == 'ND'] <- NA
data.x.na.omit <- na.omit(data.x)

data.y <- data.frame('time' = c(1:8),
                     'y' = c(5,2,3,1,2,NA,NA,8))
data.y[data.y == ''] <- NA
data.y[data.y == ' '] <- NA
data.y[data.y == 'ND'] <- NA
data.y.na.omit <- na.omit(data.y)


data.z <- data.frame('time' = c(1:5),
                     'z' = c(1:5))
data.z[data.z == ''] <- NA
data.z[data.z == ' '] <- NA
data.z[data.z == 'ND'] <- NA
data.z.na.omit <- na.omit(data.z)

data.x.y <- merge.data.frame(data.x.na.omit, data.y.na.omit, by.x = "time", by.y = "time")
data.x.y.z <- merge.data.frame(data.x.y, data.z.na.omit, by.x = "time", by.y = "time" )

data.x.y.z$x <- as.numeric(data.x.y.z$x)
data.x.y.z$y <- as.numeric(data.x.y.z$y)
data.x.y.z$z <- as.numeric(data.x.y.z$z)

data.x.y.z$result <- data.x.y.z$x + data.x.y.z$y + data.x.y.z$z


Solution

  • I don't see particularly good ways to use vectors to avoid repetition. I would suggest the following, though:

    1. Removing NA rows once, after computing the result column, so you don't have to call na.omit separately for each of x, y and z.
    2. Setting stringsAsFactors to FALSE (the default since R 4.0.0) so that a single line like data.x$x <- as.numeric(data.x$x) coerces non-numeric strings such as "" and 'ND' to NA (with a warning), and you don't have to replace them separately.
    3. Bringing in the data as a single data frame (by padding columns y and z with NA at the bottom), rather than creating data.x, data.y and data.z and then merging them.

    For example, code with these suggestions might look like this:

    # Create merged data
    data <- data.frame('time' = c(1:11),
                       'x' = c(5,3,"",'ND',2,'ND',7,8,'ND',1," "),
                       'y' = c(5,2,3,1,2,NA,NA,8, rep(NA, 3)),
                       'z' = c(1:5, rep(NA, 6)),
                       stringsAsFactors = FALSE)
    
    # Convert x, y and z to numeric; strings such as "" and "ND" become NA (with a warning)
    for (col in c("x", "y", "z"))
      data[[col]] <- as.numeric(data[[col]])
    
    # Add x, y and z together
    data$result <- data$x + data$y + data$z
    
    # Remove NAs at the end
    data <- na.omit(data)
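
    If the goal is to avoid naming x, y and z more than once, both the conversion and the sum can be driven by a single character vector of column names. The following is a minimal sketch rather than a definitive version: `cols` and `complete` are names introduced here for illustration, and it relies on base R's lapply(), complete.cases() and rowSums(). The warnings from as.numeric() are suppressed on the assumption that turning "ND" and blanks into NA is intended.

    ```r
    # Same example data as above
    data <- data.frame(time = 1:11,
                       x = c(5, 3, "", "ND", 2, "ND", 7, 8, "ND", 1, " "),
                       y = c(5, 2, 3, 1, 2, NA, NA, 8, rep(NA, 3)),
                       z = c(1:5, rep(NA, 6)),
                       stringsAsFactors = FALSE)

    # Columns to convert and sum (illustrative name)
    cols <- c("x", "y", "z")

    # Convert all listed columns in one step; strings like "ND" become NA
    data[cols] <- suppressWarnings(lapply(data[cols], as.numeric))

    # Keep only the rows where every listed column has a value
    complete <- data[complete.cases(data[cols]), ]

    # Row-wise sum over the listed columns
    complete$result <- rowSums(complete[cols])
    ```

    With this layout, adding a fourth variable only means adding its name to `cols`.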
    

    If your data sources are such that you can't bring them in as a single data frame and have to merge them instead, you could replace the "Create merged data" section with something like this:

    # Create separate data
    data.x <- data.frame('time' = c(1:11),
                         'x' = c(5,3,"",'ND',2,'ND',7,8,'ND',1," "),
                         stringsAsFactors = FALSE)
    data.y <- data.frame('time' = c(1:8),
                         'y' = c(5,2,3,1,2,NA,NA,8),
                         stringsAsFactors = FALSE)
    data.z <- data.frame('time' = c(1:5),
                         'z' = c(1:5),
                         stringsAsFactors = FALSE)
    
    # Merge data (merge() joins on the shared 'time' column and keeps only matching rows)
    data.xy <- merge(data.x, data.y)
    data <- merge(data.xy, data.z)
    
    # Now continue main code suggestion from the 'Convert x, y and z to numeric' section
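
    When the data does have to arrive as separate data frames, the repeated merge calls can also be collapsed by putting the frames in a list and folding merge() over it with Reduce(). This is a minimal sketch, assuming every frame shares the `time` key; `frames` and `cols` are illustrative names, not part of the original code.

    ```r
    # Same separate inputs as above
    data.x <- data.frame(time = 1:11,
                         x = c(5, 3, "", "ND", 2, "ND", 7, 8, "ND", 1, " "),
                         stringsAsFactors = FALSE)
    data.y <- data.frame(time = 1:8, y = c(5, 2, 3, 1, 2, NA, NA, 8))
    data.z <- data.frame(time = 1:5, z = 1:5)

    # Collect the inputs in a list (illustrative name) and merge them pairwise on "time"
    frames <- list(data.x, data.y, data.z)
    data <- Reduce(function(a, b) merge(a, b, by = "time"), frames)

    # Convert, drop incomplete rows and sum, using one vector of column names
    cols <- c("x", "y", "z")
    data[cols] <- suppressWarnings(lapply(data[cols], as.numeric))
    data <- na.omit(data)
    data$result <- rowSums(data[cols])
    ```

    A further source would then need one more entry in `frames` and one more name in `cols`, with no extra merge line.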