I want to get sums of a variable depending on the other variable's na
or non-na
values in R
. A working example code is below:
library(dplyr)
df <- data.frame(A = c(1,2,3,NA,4),
B = c(NA,2,3,NA,5),
C = c(3,4,NA,NA,NA),
REF = c(10,20,30,40,50))
df.na <- df %>% mutate_at(vars(-REF),is.na)
sums <- matrix(0,2,3)
row.names(sums) <- c("NON-NA","NA")
colnames(sums) <- c("A","B","C")
for(i in 1:3){
sums[,i] <- df.na %>% group_by_at(i) %>% summarise(sum=sum(REF)) %>% select(sum) %>% unlist()
}
> sums
A B C
NON-NA 110 100 30
NA 40 50 120
For example, since 4th term in the A
column is NA
, the corresponding column values are 30
and 10+20+3+50 = 150-30 = 120
in sums
object.
My question is how do I get this output without a for loop?
Here is a solution using the pivot_
functions from tidyr
. The approach pivots to a longer form so that you can group by column name and whether the column value is NA.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = c("A", "B", "C")) %>%
mutate(isna = is.na(value)) %>%
group_by(name, isna) %>%
summarize(value = sum(REF)) %>%
pivot_wider()
isna A B C
<lgl> <dbl> <dbl> <dbl>
1 FALSE 110 100 30
2 TRUE 40 50 120