Search code examples
rcountsum

Count character rows meeting multiple criteria in R


I am trying to count the number of rows which meet multiple criteria (criteria in different columns) in R on a large dataset and am having some trouble getting the results I want.

Example dataframe:

LU <- c("A", "A", "P", "P", "A", "G", "A", "G") 
Type <- c("one", "one", "one", "three", "two", "three", "one", "two") 
Style <- c("C-acq", "C-acq", "C-acq", "N-acq", "C-acq", "Ox", "C-acq", "N-acq")
df <- data.frame(LU, Type, Style)

To count the number of apples, that are num 1, and location A-01, I've tried this:

length(intersect(which(df1['LU']=="A", which(df['Type']=="one"))))

but can't get it to work/get an error saying the "selecting method for function 'which': undefined columns selected". The sum function works on this mock data but when I try to apply it to my own data it seems to just be adding up the number of rows which fall into any of the criteria I listed (instead of adding up the number of rows based on the criteria I applied) or says NA (which I know isn't correct).

sum function I've used (works on this mini data but not on my data):

sum(df1$LU == "A" & df1$Type == "one" & df1$Style == "C-acq")

Is there another way to count the number of rows meeting multiple criteria? (not sure how to get the count function to do this). I think a problem with my data is that rows may have NA values which may be interfering with the sum function.

I know there are a lot of questions about stuff like this (other threads listed below) so am sorry if this a redundant question but cannot find an answer that works for my data. Thanks for the help!

Count with table output

Rows meeting multiple conditions

Countif in R


Solution

  • Here we need to extract as a vector with [[ or add a , (with tibble/data.table it may not work though) or use $. Also, the first which is not closed

    length(intersect(which(df[['LU']]=="A"), which(df[['Type']]=="one")))
    #[1] 3
    

    In addition, the syntax shows df and df1 as object names whereas the example object is 'df'


    Or we can also use subset with nrow

    nrow(subset(df, LU == "A" & Type == "one"))
    #[1] 3
    

    If there are NA elements, either add & !is.na(LU) & !is.na(Type) or use %in% as %in% returns FALSE for NA elements whereas == returns NA

    nrow(subset(df, LU %in% "A" & Type %in% "one"))