
Counting the occurrences of a string in a data frame row


I have a data frame (named df) with 144 columns (trial numbers) recording trial success (Yes/No) for each participant (the rows). A subset looks like this:

V1      V2      V3      V4      V5  
Yes     No      Yes     Yes     No
Yes     No      No      No      No
Yes     Yes     Yes     Yes     No

I want to count the Yes and No outcomes per participant across all 144 trials. However, I also want to select specific trials (V1, V4, V5, V110, V112, etc.) and count the outcomes for those trials only. If I write the code as:

Yes <- rowSums(df == "Yes") # Count the "Yes" per row
cbind(Yes, No = ncol(df) - Yes) # Subtract these from the number of columns and combine
#       Yes   No
# [1,]    3    2
# [2,]    1    4
# [3,]    4    1

This gives me the counts of Yes and No per participant, but across all trials. How can I specify certain columns (trials) and count per participant for those only?


Solution

  • You can subset df using [ while comparing. Here columns 1, 4, and 5 are selected.

    rowSums(df[, c(1, 4, 5)] == "Yes") # For columns 1, 4 and 5
    #[1] 2 1 2
    

    To calculate the percentage of Yes (asked in the comments), rowMeans could be used:

    100 * rowMeans(df == "Yes")
    #[1] 60 20 80
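
The same idea can be extended to select trials by name and to get both the Yes and No counts, plus the percentage, for just those trials. The following is a minimal sketch rather than part of the original answer: the data frame rebuilds the five-column sample from the question, and the names trials, sub, yes, counts and pct_yes are made up for illustration. It assumes every cell is either "Yes" or "No" with no missing values; with NA present, rowSums and rowMeans return NA unless na.rm = TRUE is passed, and the No count would be safer as rowSums(sub == "No", na.rm = TRUE).

```r
# Sample data from the question (the real df has 144 columns).
df <- data.frame(
  V1 = c("Yes", "Yes", "Yes"),
  V2 = c("No",  "No",  "Yes"),
  V3 = c("Yes", "No",  "Yes"),
  V4 = c("Yes", "No",  "Yes"),
  V5 = c("No",  "No",  "No")
)

# Hypothetical selection of trials, given by column name.
trials <- c("V1", "V4", "V5")

# drop = FALSE keeps a data frame even if only one trial is selected.
sub <- df[, trials, drop = FALSE]

# Yes count per participant; No is the remainder (assumes only Yes/No values).
yes <- rowSums(sub == "Yes")
counts <- cbind(Yes = yes, No = ncol(sub) - yes)
# Yes column: 2 1 2, No column: 1 2 1

# Percentage of Yes within the selected trials only.
pct_yes <- 100 * rowMeans(sub == "Yes")
```

Selecting by name (for example c("V1", "V4", "V5", "V110", "V112") on the full data) avoids depending on column positions, which can shift if columns are added or reordered.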