I have two vectors of given names in R, stored as single-column data frames:
A <- data.frame(c("Nick", "Maria", "Liam", "Oliver", "Sophia", "james", "Lucas; Luc"))
B <- data.frame(c("Liam", "Luc", "Evelyn; Eva", "James", "Harper", "Amelia"))
I want to compare the two vectors and create a vector C with the names from B that are not in A. The comparison should ignore capitalization, i.e. recognise that James and james are the same. When an entry contains two names (a given name and a preferred name), e.g. Lucas; Luc, either name should count as a match (so Luc in B matches Lucas; Luc in A).
The expected result is:
C <- data.frame(c("Evelyn; Eva", "Harper","Amelia"))
Can someone help me?
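One way to make this matching rule concrete is to turn every entry into a set of lowercase name tokens and treat two entries as the same person when their token sets overlap. The helper `name_tokens()` below is a hypothetical name used only for illustration; it is a sketch under that assumption, not part of the original question.

```r
# Hypothetical helper (not from the question): lowercase an entry, split it
# on ";" and trim whitespace, giving the set of names it can be matched by
name_tokens <- function(x) {
  lapply(strsplit(tolower(x), ";"), trimws)
}

tokens <- name_tokens(c("Lucas; Luc", "james", "Evelyn; Eva"))
tokens[[1]]  # "lucas" "luc"
tokens[[2]]  # "james"

# "Luc" from B counts as present in A because it shares a token with "Lucas; Luc"
any(name_tokens("Luc")[[1]] %in% tokens[[1]])    # TRUE
# "James" matches "james" once both are lowercased
any(name_tokens("James")[[1]] %in% tokens[[2]])  # TRUE
```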
Probably the ugliest code I have written, but it works. It uses stringr and dplyr and works on plain character vectors rather than data frames:
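library(dplyr)
library(stringr)

# Normalise capitalisation so that "james" and "James" compare equal
A <- str_to_title(c("Nick", "Maria", "Liam", "Oliver", "Sophia", "james", "Lucas; Luc"))
B <- str_to_title(c("Liam", "Luc", "Evelyn; Eva", "James", "Harper", "Amelia"))
# Split each entry into a given name and an optional preferred name (after ";")
nested <- tibble(given = str_extract(c(A, B), "^[^;]+"),
                 preferred = str_extract(c(A, B), ";\\s*([^;]+)") %>% str_extract("[a-zA-Z]+"),
                 list = c(rep("A", length(A)), rep("B", length(B)))) %>%
  nest_by(list)
A <- nested$data[[1]]
B <- nested$data[[2]]
# TRUE where B's given name matches a given or preferred name in A
in_a <- B$given %in% A$given | B$given %in% A$preferred
# Keep the names of B not found in A and rebuild "Given; Preferred" entries;
# C is a character vector, wrap it in data.frame(C) if a data frame is needed
C <- B %>%
  filter(!in_a) %>%
  mutate(c = ifelse(is.na(preferred), given, str_c(given, preferred, sep = "; "))) %>%
  pull(c)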
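For comparison, here is a base-R sketch without the tidyverse. It assumes the column is named `name` (the question's `data.frame(c(...))` calls produce an auto-generated column name, so `A[[1]]` and `B[[1]]` would be used there instead). Unlike the code above, it also checks B's preferred names against A, keeps the original capitalisation in the output, and returns a data frame as in the expected result.

```r
# Inputs as in the question; the column name "name" is an assumption
A <- data.frame(name = c("Nick", "Maria", "Liam", "Oliver", "Sophia", "james", "Lucas; Luc"))
B <- data.frame(name = c("Liam", "Luc", "Evelyn; Eva", "James", "Harper", "Amelia"))

# All names in A (given and preferred), lowercased and trimmed
tokens_a <- unique(trimws(unlist(strsplit(tolower(A$name), ";"))))
# One token vector per entry of B
tokens_b <- lapply(strsplit(tolower(B$name), ";"), trimws)

# An entry of B is "in A" if any of its names appears anywhere in A
in_a <- vapply(tokens_b, function(tok) any(tok %in% tokens_a), logical(1))

# Keep the original spelling of the entries of B that were not matched
C <- data.frame(name = B$name[!in_a])
C
```

The case-insensitive comparison is done on lowercased copies only, so the entries in C keep the spelling they had in B.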