I am trying to count the number of rows which meet multiple criteria (criteria in different columns) in R on a large dataset and am having some trouble getting the results I want.
Example dataframe:
LU <- c("A", "A", "P", "P", "A", "G", "A", "G")
Type <- c("one", "one", "one", "three", "two", "three", "one", "two")
Style <- c("C-acq", "C-acq", "C-acq", "N-acq", "C-acq", "Ox", "C-acq", "N-acq")
df <- data.frame(LU, Type, Style)
To count the number of apples, that are num 1, and location A-01, I've tried this:
length(intersect(which(df1['LU']=="A", which(df['Type']=="one"))))
but can't get it to work/get an error saying the "selecting method for function 'which': undefined columns selected". The sum function works on this mock data but when I try to apply it to my own data it seems to just be adding up the number of rows which fall into any of the criteria I listed (instead of adding up the number of rows based on the criteria I applied) or says NA (which I know isn't correct).
sum function I've used (works on this mini data but not on my data):
sum(df1$LU == "A" & df1$Type == "one" & df1$Style == "C-acq")
Is there another way to count the number of rows meeting multiple criteria? (not sure how to get the count function to do this). I think a problem with my data is that rows may have NA values which may be interfering with the sum function.
I know there are a lot of questions about stuff like this (other threads listed below) so am sorry if this a redundant question but cannot find an answer that works for my data. Thanks for the help!
Count with table output
Here we need to extract as a vector
with [[
or add a ,
(with tibble/data.table
it may not work though) or use $
. Also, the first which
is not closed
length(intersect(which(df[['LU']]=="A"), which(df[['Type']]=="one")))
#[1] 3
In addition, the syntax shows df
and df1
as object names whereas the example object is 'df'
Or we can also use subset
with nrow
nrow(subset(df, LU == "A" & Type == "one"))
#[1] 3
If there are NA
elements, either add & !is.na(LU) & !is.na(Type)
or use %in%
as %in%
returns FALSE for NA elements whereas ==
returns NA
nrow(subset(df, LU %in% "A" & Type %in% "one"))