I am using R. I need to create a new column in a data frame that is the sum of the three variables. The sum should only take place if there are numeric values for each of the three variables. In other words, if there are any NAs or blanks the sum should not take place.
I have written the code below which works, but would like to simplify it. I am interested in using vectors to avoid repetition in my code.
data.x <- data.frame('time' = c(1:11),
'x' = c(5,3,"",'ND',2,'ND',7,8,'ND',1," "))
data.x[data.x == ''] <- 'NA'
data.x[data.x == ' '] <- 'NA'
data.x[data.x == 'ND'] <- 'NA'
data.x.na.omit <- na.omit(data.x)
data.y <- data.frame('time' = c(1:8),
'y' = c(5,2,3,1,2,NA,NA,8))
data.y[data.y == ''] <- 'NA'
data.y[data.y == ' '] <- 'NA'
data.y[data.y == 'ND'] <- 'NA'
data.y.na.omit <- na.omit(data.y)
data.z <- data.frame('time' = c(1:5),
'z' = c(1:5))
data.z[data.z == ''] <- 'NA'
data.z[data.z == ' '] <- 'NA'
data.z[data.z == 'ND'] <- 'NA'
data.z.na.omit <- na.omit(data.z)
data.x.y <- merge.data.frame(data.x.na.omit, data.y.na.omit, by.x = "time", by.y = "time")
data.x.y.z <- merge.data.frame(data.x.y, data.z.na.omit, by.x = "time", by.y = "time" )
data.x.y.z$x <- as.numeric(data.x.y.z$x)
data.x.y.z$y <- as.numeric(data.x.y.z$y)
data.x.y.z$z <- as.numeric(data.x.y.z$z)
data.x.y.z$result <- data.x.y.z$x + data.x.y.z$y + data.x.y.z$z
I don't see a particularly good way to use vectors to avoid the repetition. I would suggest the following, though:
- Remove the NA rows by evaluating the result column once at the end, so you don't have to do this for each of x, y and z.
- Set stringsAsFactors to FALSE, so that a single line like data.x$x <- as.numeric(data.x$x) will automatically coerce the non-numeric strings to NA, and you don't have to replace them separately.
- If you can, build a single data frame from the start (adding NA to the bottom of columns y and z), rather than creating data.x, data.y and data.z then merging.
For example, code with these suggestions might look like this:
# Create merged data
data <- data.frame('time' = c(1:11),
                   'x' = c(5,3,"",'ND',2,'ND',7,8,'ND',1," "),
                   'y' = c(5,2,3,1,2,NA,NA,8, rep(NA, 3)),
                   'z' = c(1:5, rep(NA, 6)),
                   stringsAsFactors = FALSE)
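# Convert x, y and z to numeric; blanks and strings such as 'ND' become NA
# (R warns "NAs introduced by coercion" for the non-blank ones)
for (col in c("x", "y", "z"))
  data[[col]] <- as.numeric(data[[col]])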
# Add x, y and z together
data$result <- data$x + data$y + data$z
# Remove rows containing NA (a missing x, y or z also makes result NA)
data <- na.omit(data)
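To see why the separate replacement of '', ' ' and 'ND' is not needed, here is a minimal sketch of what as.numeric does to a character vector like the x column. The name x_raw is just for illustration, and suppressWarnings is only there to silence the coercion warning.

```r
# Hypothetical sample mirroring the kinds of values in the x column
x_raw <- c("5", "3", "", "ND", " ")

# Blank strings and non-numeric strings all come back as NA
x_num <- suppressWarnings(as.numeric(x_raw))
print(x_num)
```

Because any arithmetic involving NA gives NA, a row with a missing x, y or z ends up with an NA result, which is what lets a single na.omit at the end do all the filtering.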
If your data sources are such that you can't bring them in as a single dataframe, but you have to merge them, then you could replace the "Create merged data" section with something like this:
# Create separate data
data.x <- data.frame('time' = c(1:11),
                     'x' = c(5,3,"",'ND',2,'ND',7,8,'ND',1," "),
                     stringsAsFactors = FALSE)
data.y <- data.frame('time' = c(1:8),
                     'y' = c(5,2,3,1,2,NA,NA,8),
                     stringsAsFactors = FALSE)
data.z <- data.frame('time' = c(1:5),
                     'z' = c(1:5),
                     stringsAsFactors = FALSE)
# Merge data
data.xy <- merge(data.x, data.y)
data <- merge(data.xy, data.z)
# Now continue main code suggestion from the 'Convert x, y and z to numeric' section
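If the number of sources grows, one possible way to cut the repetition further is to keep the data frames in a list and fold merge over it with Reduce, then convert and sum the value columns by name. This is only a sketch under the assumption that every source has a time column to join on; the names merged and value_cols are mine, not part of the code above.

```r
# Separate sources, as in the question
data.x <- data.frame(time = 1:11,
                     x = c(5, 3, "", "ND", 2, "ND", 7, 8, "ND", 1, " "),
                     stringsAsFactors = FALSE)
data.y <- data.frame(time = 1:8, y = c(5, 2, 3, 1, 2, NA, NA, 8))
data.z <- data.frame(time = 1:5, z = 1:5)

# Merge all sources on 'time' in one step instead of one merge call per pair
merged <- Reduce(function(a, b) merge(a, b, by = "time"),
                 list(data.x, data.y, data.z))

# Convert the value columns to numeric; blanks and 'ND' become NA
value_cols <- c("x", "y", "z")
merged[value_cols] <- lapply(merged[value_cols],
                             function(v) suppressWarnings(as.numeric(v)))

# rowSums keeps NA for any row with a missing value (na.rm defaults to FALSE)
merged$result <- rowSums(merged[value_cols])

# Drop the incomplete rows once, at the end
merged <- na.omit(merged)
print(merged)
```

With the sample data only the rows for time 1, 2 and 5 have numeric values in all three columns, so those are the rows that remain.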