Summary: I am trying to count the number of times a value in one column appears across several other columns. I can do this when I specify the value manually, but I would like to take the value from another column instead, and I have not been able to figure that part out.
Detailed: I am able to count the number of times a certain value appears across multiple columns. However, to do this, I have to specify the value manually. Instead I would like to count the number of times a certain value in one column appears across other columns.
I have gone through several threads (link1, link2) but they are not exactly what I am looking for as I would like to do this in base R.
Here is a dummy script where the goal is to add a new column called 'n_v1' which contains the number of times the value in 'v1' appears across columns 'c1' to 'c7'.
# Create dataframe
c1 = c('A','B','C')
c2 = c('B','B','A')
c3 = c('C','C','A')
c4 = c('B','C','C')
c5 = c('B','A','B')
c6 = c('C','B','A')
c7 = c('C','B','B')
v1 = c('A','C','B')
df = data.frame(c1,c2,c3,c4,c5,c6,c7,v1)
# Count number of times A, B, and C appear across columns c1 to c7
df$n_A = apply(df[,1:7], 1, function(x) length(which(x=='A')))
df$n_B = apply(df[,1:7], 1, function(x) length(which(x=='B')))
df$n_C = apply(df[,1:7], 1, function(x) length(which(x=='C')))
# Attempt to count number of times the value in v1 appears across columns c1 to c7
df$n_v1 = apply(df[,1:7], 1, function(x) length(which(x==v1)))
# I received the following warning messages and was unable to get the desired output.
# Warning messages:
# 1: In x == v1 :
# longer object length is not a multiple of shorter object length
# 2: In x == v1 :
# longer object length is not a multiple of shorter object length
# 3: In x == v1 :
# longer object length is not a multiple of shorter object length
You can use rowSums with ==.
rowSums(df[, 1:7] == df$v1)
#[1] 1 2 2
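To see why this works, it can help to look at the intermediate logical matrix: comparing the data frame with a vector of length nrow(df) recycles the vector down each column, so every cell is compared with the v1 value of its own row. Below is a minimal sketch that rebuilds the question's data so it runs on its own. The name hits is only illustrative, and na.rm = TRUE is an addition that matters only if the data could contain NA.

```r
# Rebuild the example data from the question
df <- data.frame(
  c1 = c('A','B','C'), c2 = c('B','B','A'), c3 = c('C','C','A'),
  c4 = c('B','C','C'), c5 = c('B','A','B'), c6 = c('C','B','A'),
  c7 = c('C','B','B'), v1 = c('A','C','B')
)

# TRUE where a cell equals the v1 value of its own row
hits <- df[, 1:7] == df$v1

# Row-wise count of matches, stored as the requested column
df$n_v1 <- rowSums(hits, na.rm = TRUE)
df$n_v1
```

With the example data, the counts for the three rows are 1, 2 and 2.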
The warning comes from the fact that v1 is being recycled in == inside apply: its length (3) differs from that of x, which is a single row (length 7). For example, in the first row, it compares:
> df[1, 1:7]
  c1 c2 c3 c4 c5 c6 c7
1  A  B  C  B  B  C  C
with
> rep(v1, length.out = 7)
[1] "A" "C" "B" "A" "C" "B" "A"
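The first row happens to give the right count by coincidence, so the effect is easier to see on the second row. The sketch below spells out the recycling by hand (the names x and recycled are illustrative) and compares the resulting count with the intended one.

```r
# Second row of columns c1 to c7, and the full v1 vector
x  <- c('B','B','C','C','A','B','B')
v1 <- c('A','C','B')

# v1 stretched to length 7, which is what == does implicitly (with a warning)
recycled <- rep(v1, length.out = length(x))

# Count produced by the original apply call for this row
n_recycled <- sum(x == recycled)

# Intended count: the row compared with its own v1 value only
n_intended <- sum(x == v1[2])

c(recycled = n_recycled, intended = n_intended)
```

Here the recycled comparison finds 1 match, while the intended comparison against 'C' finds 2.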
To use an apply* function, you need one that iterates over two vectors, not one. You can use mapply, but it iterates over the columns of a data frame, so you need to transpose the data first:
mapply(\(x, y) length(which(x == y)), as.data.frame(t(df[, 1:7])), v1)
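Another base R sketch along the same lines, assuming v1 sits in the same data frame as in the question: hand each row to apply together with its own v1 value, so that nothing has to be recycled. The column selection and the name n_v1 are illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(
  c1 = c('A','B','C'), c2 = c('B','B','A'), c3 = c('C','C','A'),
  c4 = c('B','C','C'), c5 = c('B','A','B'), c6 = c('C','B','A'),
  c7 = c('C','B','B'), v1 = c('A','C','B')
)

# Each row arrives as a character vector of length 8:
# elements 1 to 7 are c1..c7, element 8 is that row's v1
cols <- c(paste0('c', 1:7), 'v1')
n_v1 <- apply(df[, cols], 1, function(row) sum(row[1:7] == row[8]))
n_v1
```

This also gives 1, 2 and 2 for the example data. Note that apply converts the data frame to a matrix first, which is harmless here because every column is character.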
But I highly recommend the first approach (rowSums)!