
Grab some columns and calculate the proportion in R


id=1:5
age1=c(67,39,97,55,37)
age2=c(300,122,333,70,333)
age3=c(1,3,6,1,3)
age4=c(56,33,34,77,99)
gender=c("f","m","f","f","m")
data=data.frame(id, age1, age2, age3, age4, gender)

length(data$age1[data$age1 > 50])/length(data$age1)
length(data$age2[data$age2 > 50])/length(data$age2)
length(data$age3[data$age3 > 50])/length(data$age3)
length(data$age4[data$age4 > 50])/length(data$age4)

First, I want to select the age columns (age1, age2, age3, age4), for example with the %in% operator, i.e. the columns whose names contain "age".

Then I want to calculate the proportion of values greater than 50 in each of those columns, but my code seems inefficient. This is a reproducible example; my real data has 30 different age columns.


Solution

  • A base R solution, using grep() to select the columns whose names contain "age":

    colMeans(data[grep("age", names(data))] > 50)
    
    # age1 age2 age3 age4
    #  0.6  1.0  0.0  0.6
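
    The question asks about %in%. Note that %in% compares whole names rather than searching for a substring, so it only works when you can spell out the column names. A minimal sketch, assuming the columns follow the pattern age1, age2, ... (built here with paste0(); for 30 columns, 1:30 would replace 1:4):

    ```r
    # Data from the question
    id <- 1:5
    age1 <- c(67, 39, 97, 55, 37)
    age2 <- c(300, 122, 333, 70, 333)
    age3 <- c(1, 3, 6, 1, 3)
    age4 <- c(56, 33, 34, 77, 99)
    gender <- c("f", "m", "f", "f", "m")
    data <- data.frame(id, age1, age2, age3, age4, gender)

    # %in% matches exact names, so build the full names first
    # (assumption: the columns are named "age" followed by a number)
    age_cols <- paste0("age", 1:4)
    keep <- names(data) %in% age_cols

    # data[keep] > 50 is a logical matrix; colMeans() gives the share of TRUE per column
    props <- colMeans(data[keep] > 50)
    print(props)
    ```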
    

    You can also use summarise() with across() from dplyr; contains("age") selects the same columns.

    library(dplyr)
    
    data %>%
      summarise(across(contains("age"), ~ mean(.x > 50)))
    
    #   age1 age2 age3 age4
    # 1  0.6    1    0  0.6
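
    If the real data may contain missing values, or other columns whose names merely include "age" (grep("age", ...) would also match a name such as "page"), a stricter base R sketch keeps only names that start with "age" and drops NAs. The NA in age4 below is added purely for illustration and is not part of the question's data:

    ```r
    data <- data.frame(
      id = 1:5,
      age1 = c(67, 39, 97, 55, 37),
      age2 = c(300, 122, 333, 70, 333),
      age3 = c(1, 3, 6, 1, 3),
      age4 = c(56, 33, NA, 77, 99),  # NA added for illustration only
      gender = c("f", "m", "f", "f", "m")
    )

    # startsWith() matches only names beginning with "age", unlike grep("age", ...)
    age_cols <- names(data)[startsWith(names(data), "age")]

    # na.rm = TRUE drops NA comparisons, so they count in neither numerator nor denominator
    props <- colMeans(data[age_cols] > 50, na.rm = TRUE)
    print(props)
    ```

    In the dplyr version, the same ideas would be written as across(starts_with("age"), ~ mean(.x > 50, na.rm = TRUE)).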
    

    Hint: the mean() of a logical vector is the proportion of TRUE values, because TRUE is treated as 1 and FALSE as 0.
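
    To see why this works, here is the hint applied to the age1 column from the question: comparing with > 50 gives a logical vector, and its mean equals the number of TRUE values divided by the length.

    ```r
    x <- c(67, 39, 97, 55, 37)   # the age1 column from the question
    above <- x > 50              # TRUE FALSE TRUE TRUE FALSE

    # TRUE counts as 1 and FALSE as 0, so the mean is the share of TRUE values
    prop <- mean(above)
    manual <- sum(above) / length(above)
    print(prop)                  # 0.6
    ```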