I frequently find myself using a for loop to perform row-wise operations involving multiple data frames, along the lines of the following example:
library(rlang)  # provides is_empty() used in the loop below

# Sample data
set.seed(123)
df1 <- data.frame(a = sample(1:100, size = 20), b = sample(letters, size = 20))
df2 <- data.frame(c = sample(1:100, size = 20), d = sample(letters, size = 20))

# Sample loop operation
for (i in seq_len(nrow(df2))) {
  number.2 <- df2$c[i]
  letter.1 <- df1$b[df1$a == number.2]
  df2$x[i] <- ifelse(!is_empty(letter.1), paste0(letter.1), "No match")
}
The code does what I want, but I suspect there's a more elegant way to go about it using apply.
Is there a way to do this using the apply family of functions?
What you've got here is a long way to code a common operation. This is a one-liner using match(), or you can generalize it to a wider variety of cases and columns using merge or one of dplyr's join functions.
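Since the question asks about the apply family specifically, here is a sketch of the same lookup written with vapply(), checked against the match() one-liner. The helper name first_match() is hypothetical, and both versions return NA (not "No match") for values of c that have no partner in a. stringsAsFactors = FALSE is added as an assumption so the letters stay character vectors in R versions before 4.0.

```r
set.seed(123)
df1 <- data.frame(a = sample(1:100, size = 20), b = sample(letters, size = 20),
                  stringsAsFactors = FALSE)
df2 <- data.frame(c = sample(1:100, size = 20), d = sample(letters, size = 20),
                  stringsAsFactors = FALSE)

# Hypothetical helper: first b in df1 whose a equals `number`, or NA if none
first_match <- function(number) {
  hits <- df1$b[df1$a == number]
  if (length(hits) > 0) hits[1] else NA_character_
}

# apply-family version: one function call per element of df2$c,
# each required to return exactly one character string
via_vapply <- vapply(df2$c, first_match, character(1))

# Vectorised version: a single match() call performs every lookup at once
via_match <- df1$b[match(df2$c, df1$a)]

identical(via_vapply, via_match)
# [1] TRUE
```

vapply() works, but it still calls an R function once per row; match() does the whole lookup in a single vectorised call, which is why it is the usual idiom here.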
match works well when you are matching on a single column. merge or join works well when you are matching on multiple columns. If you have a more complicated condition, like >= instead of ==, then you need a "non-equi join", which is supported in dplyr (version 1.1.0 and later, via join_by()) or data.table.
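A base R sketch of the other two cases, assuming the same sample data. merge() with by.x/by.y handles an equality join (pass vectors of column names to match on several columns). The non-equi part uses a cross join plus a filter purely for illustration, since it builds every pair of rows; the condition a >= c is an arbitrary example. In dplyr 1.1.0 or later the equivalent would be along the lines of left_join(df2, df1, by = join_by(c == a)), or join_by(c <= a) for the inequality.

```r
set.seed(123)
df1 <- data.frame(a = sample(1:100, size = 20), b = sample(letters, size = 20),
                  stringsAsFactors = FALSE)
df2 <- data.frame(c = sample(1:100, size = 20), d = sample(letters, size = 20),
                  stringsAsFactors = FALSE)

# Equality (left) join: all.x = TRUE keeps every row of df2; rows without a
# partner get NA in b. merge() sorts by the key, so row order differs from df2.
joined <- merge(df2, df1, by.x = "c", by.y = "a", all.x = TRUE)

# Non-equi sketch: by = NULL gives the Cartesian product (every df2 row paired
# with every df1 row), which is then filtered on the inequality.
crossed <- merge(df2, df1, by = NULL)
non_equi <- crossed[crossed$a >= crossed$c, ]
```

The cross join grows as nrow(df1) * nrow(df2), which is exactly the cost that the dedicated non-equi joins in dplyr and data.table are designed to avoid.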
df2$result = df1[match(df2$c, df1$a), "b"]
df2
# c d x result
# 1 89 v No match <NA>
# 2 34 z No match <NA>
# 3 93 g No match <NA>
# 4 69 p p p
# 5 72 q x x
# 6 76 r No match <NA>
# 7 63 y No match <NA>
# 8 13 b No match <NA>
# 9 82 d No match <NA>
# 10 91 m No match <NA>
# 11 25 e w w
# 12 38 f No match <NA>
# 13 21 c No match <NA>
# 14 79 i q q
# 15 41 u No match <NA>
# 16 47 o No match <NA>
# 17 60 t No match <NA>
# 18 16 j No match <NA>
# 19 6 x No match <NA>
# 20 96 n No match <NA>
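The result column has NA where the loop wrote "No match". If you want the loop's exact column, one more line fills in the label. This sketch reruns the question's loop (with base R length() swapped in for is_empty(), an assumption made to avoid the package dependency) and checks that the two columns agree.

```r
set.seed(123)
df1 <- data.frame(a = sample(1:100, size = 20), b = sample(letters, size = 20),
                  stringsAsFactors = FALSE)
df2 <- data.frame(c = sample(1:100, size = 20), d = sample(letters, size = 20),
                  stringsAsFactors = FALSE)

# The question's loop, using length() > 0 instead of is_empty()
df2$x <- NA_character_
for (i in seq_len(nrow(df2))) {
  letter.1 <- df1$b[df1$a == df2$c[i]]
  df2$x[i] <- if (length(letter.1) > 0) letter.1[1] else "No match"
}

# The match() one-liner, then replace the NAs with the loop's label
df2$result <- df1[match(df2$c, df1$a), "b"]
df2$result[is.na(df2$result)] <- "No match"

identical(df2$x, df2$result)
# [1] TRUE
```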