I want to know if there is a way to handle NA values when I sum some columns of a data frame.
This is a simulated example of the data I am working with:
id<-rep(1:4,each=8)
v1<-c(1,2,5,4,58,6,4,9)
v2<-c(78,85,56,47,12,3,65,98)
v3<-c(101,NA,452,NA,NA,45,7,56)
data<-data.frame(id,v1,v2,v3)
data
id v1 v2 v3
1 1 1 78 101
2 1 2 85 NA
3 2 5 56 452
4 2 4 47 NA
5 3 58 12 NA
6 3 6 3 45
7 4 4 65 7
8 4 9 98 56
I want to apply this formula using v1, v2, and v3:
data$cat<-v1*0.05+v2*0.05+v3*0.05
This is the result I get from that sum; every row where v3 is NA ends up as NA:
data
id v1 v2 v3 cat
1 1 1 78 101 9.00
2 1 2 85 NA NA
3 2 5 56 452 25.65
4 2 4 47 NA NA
5 3 58 12 NA NA
6 3 6 3 45 2.70
7 4 4 65 7 3.80
8 4 9 98 56 8.15
v1, v2, and v3 are numeric vectors.
You can try rowSums with na.rm = TRUE (as @akrun said in the comment), like below:
data$cat <- rowSums(data[-1] * c(0.05, 0.05, 0.05)[col(data[-1])], na.rm = TRUE)
which gives
> data
id v1 v2 v3 cat
1 1 1 78 101 9.00
2 1 2 85 NA 4.35
3 1 5 56 452 25.65
4 1 4 47 NA 2.55
5 1 58 12 NA 3.50
6 1 6 3 45 2.70
7 1 4 65 7 3.80
8 1 9 98 56 8.15
9 2 1 78 101 9.00
10 2 2 85 NA 4.35
11 2 5 56 452 25.65
12 2 4 47 NA 2.55
13 2 58 12 NA 3.50
14 2 6 3 45 2.70
15 2 4 65 7 3.80
16 2 9 98 56 8.15
17 3 1 78 101 9.00
18 3 2 85 NA 4.35
19 3 5 56 452 25.65
20 3 4 47 NA 2.55
21 3 58 12 NA 3.50
22 3 6 3 45 2.70
23 3 4 65 7 3.80
24 3 9 98 56 8.15
25 4 1 78 101 9.00
26 4 2 85 NA 4.35
27 4 5 56 452 25.65
28 4 4 47 NA 2.55
29 4 58 12 NA 3.50
30 4 6 3 45 2.70
31 4 4 65 7 3.80
32 4 9 98 56 8.15
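If the positional data[-1] indexing feels fragile, here is a minimal sketch of the same idea with the columns and weights spelled out by name. The names vars, w, weighted and all_missing are my own, not from the question, and sweep() is just one way to apply per-column weights. The last step is optional: rowSums with na.rm = TRUE returns 0 for a row whose inputs are all NA, so the sketch resets such rows to NA (the example data has none).

```r
# Same simulated data as in the question
id <- rep(1:4, each = 8)
v1 <- c(1, 2, 5, 4, 58, 6, 4, 9)
v2 <- c(78, 85, 56, 47, 12, 3, 65, 98)
v3 <- c(101, NA, 452, NA, NA, 45, 7, 56)
data <- data.frame(id, v1, v2, v3)

# Hypothetical helper names (assumptions, not from the original post)
vars <- c("v1", "v2", "v3")
w <- c(v1 = 0.05, v2 = 0.05, v3 = 0.05)

# Multiply each column by its weight, then sum across rows, skipping NAs
weighted <- sweep(as.matrix(data[vars]), 2, w[vars], "*")
data$cat <- rowSums(weighted, na.rm = TRUE)

# Optional: keep NA for rows where every input is missing,
# since rowSums(..., na.rm = TRUE) would otherwise report 0 for them
all_missing <- rowSums(!is.na(data[vars])) == 0
data$cat[all_missing] <- NA

head(data, 8)
```

Selecting the columns by name also means the result does not change if other columns (such as an earlier cat) are already present in data.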