The data is as follows:
library(fuzzyjoin)
nr <- c(1,2)
col2 <- c("b","a")
dat <- cbind.data.frame(
  nr, col2
)
thelist <- list(
  aa = c(1, 2, 3),
  bb = c(1, 2, 3)
)
I would like to do the following:
stringdist_left_join(dat, thelist, by = "col2", method = "lcs", max_dist = 1)
But this (unsurprisingly) gives an error:
Error in `group_by_prepare()`:
! Must group by variables found in `.data`.
* Column `col` is not found.
Run `rlang::last_error()` to see where the error occurred.
What would be the best way to do this?
Desired output:
nr col2 thelist list_col
1 b bb c(1,2,3)
2 a aa c(1,2,3)
This is a bit of a hack. Not sure if there is a more elegant solution.
Build a tibble with one row per list element: the element names go in a column named "col2" (the join key) and the vectors themselves go in a list-column named "value". Then use stringdist_left_join() to merge it with dat. In the resulting out data.frame you can drop or rename the columns you don't need.
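To see why max_dist = 1 pairs "b" with "bb" and "a" with "aa", here is a small base R sketch of the "lcs" distance. It assumes that stringdist's "lcs" distance is the edit distance allowing only insertions and deletions (each costing 1), which utils::adist() reproduces when a substitution is priced at 2 (one deletion plus one insertion). lcs_dist is a hypothetical helper, not part of fuzzyjoin or stringdist.

```r
# Hypothetical helper: insert/delete-only edit distance between strings.
# Assumption: pricing a substitution at 2 makes utils::adist() match the
# "lcs" distance used by stringdist (unpaired characters in x and y).
lcs_dist <- function(x, y) {
  utils::adist(x, y, costs = c(insertions = 1, deletions = 1, substitutions = 2))
}

# Rows are the keys in dat$col2, columns are the names of thelist
d <- lcs_dist(c("b", "a"), c("aa", "bb"))
dimnames(d) <- list(c("b", "a"), c("aa", "bb"))
d
```

"b" is one insertion away from "bb" but three edits away from "aa", so with max_dist = 1 only "bb" is kept for that row (and likewise only "aa" for "a").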
library(fuzzyjoin)
library(tibble)  # provides tibble()
dat <- data.frame(
  nr = c(1, 2), col2 = c("b", "a")
)
thelist <- list(
  aa = c(1, 2, 3),
  bb = c(1, 2, 3, 4)
)
# create a tibble with one row per list element (names in col2, vectors in a list-column)
a <- tibble(col2 = names(thelist), value = thelist)
a
#> # A tibble: 2 x 2
#>   col2  value
#>   <chr> <named list>
#> 1 aa    <dbl [3]>
#> 2 bb    <dbl [4]>
# fuzzy-join dat to the lookup table on col2 (lcs distance of at most 1)
out <- stringdist_left_join(dat, a, by = "col2", method = "lcs", max_dist = 1)
out
#>   nr col2.x col2.y      value
#> 1  1      b     bb 1, 2, 3, 4
#> 2  2      a     aa    1, 2, 3
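If you would rather avoid the intermediate tibble and get the column names from the desired output directly, here is a base R sketch. It does not use fuzzyjoin: it swaps in utils::adist() with a substitution cost of 2 as an assumed stand-in for stringdist's "lcs" method, then keeps every list element within max_dist of each key. Rows of dat with no match get NA, as in a left join. The names idx, row_i and elem_j are illustrative only.

```r
dat <- data.frame(nr = c(1, 2), col2 = c("b", "a"))
thelist <- list(aa = c(1, 2, 3), bb = c(1, 2, 3, 4))
max_dist <- 1

# Distance from each key in dat$col2 to each name in thelist
# (assumption: insert/delete-only edit distance approximates method = "lcs")
d <- utils::adist(dat$col2, names(thelist),
                  costs = c(insertions = 1, deletions = 1, substitutions = 2))

# For each row of dat, the indices of list elements within max_dist
idx <- lapply(seq_len(nrow(dat)), function(i) which(d[i, ] <= max_dist))
# Left join: a row without any match keeps a single NA
idx <- lapply(idx, function(j) if (length(j) == 0) NA_integer_ else j)

# Expand to one output row per (row of dat, matched element) pair
row_i <- rep(seq_len(nrow(dat)), lengths(idx))
elem_j <- unlist(idx)

out <- dat[row_i, , drop = FALSE]
out$thelist <- names(thelist)[elem_j]
out$list_col <- unname(thelist[elem_j])  # list-column holding the matched vectors
rownames(out) <- NULL
out
```

Building the index vectors first and assigning the list-column once avoids rbind()-ing data frames that contain list-columns.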