I have two character vectors, where one (bowlers in the below example) contains substrings of the other (full_names). I'd like to replace each element in bowlers with the matching full name. However, not every entry of full_names will appear in bowlers, and some of the entries in bowlers are short enough that they are full strings, rather than just substrings. There can also be multiple instances of the same element in bowlers.
An inefficient way to do this would be to build a vector of matches by hand, but I want to be able to apply this to multiple data sets.
Example data:
bowlers <- c("Dilon Heylige", "Siddarth Mata", "Dilon Heylige", "Muhammad Sadi", "Adnesh Tondal", "Muhammad Sadi", "Timil Patel", "Siddarth Mata", "Timil Patel", "Marty Kain", "Muhammad Sadi", "Marty Kain", "Siddarth Mata", "Marty Kain", "Dilon Heylige", "Timil Patel", "Adnesh Tondal", "Muhammad Sadi", "Adnesh Tondal", "Dilon Heylige", "Neeraj Goel", "Sheryar Khan", "Neeraj Goel", "Sheryar Khan", "Hammad Azam", "Sheryar Khan", "Hammad Azam", "Vatsal Vaghel", "Hammad Azam", "Vatsal Vaghel", "Mohit Nataraj", "Zia Muhammad ", "Sheryar Khan", "Sami Aslam", "Neeraj Goel", "Zia Muhammad ", "Neeraj Goel", "Zia Muhammad ", "Vatsal Vaghel", "Zia Muhammad ")
full_names <- c("Karan Chandel", "Sami Aslam", "Neeraj Goel", "Zia Muhammad Shahzad", "Shivam Mishra", "Hammad Azam", "Mohit Nataraj", "Aditya Srinivas", "Sheryar Khan", "Vatsal Vaghela", "Saideep Ganesh", "Dilon Heyliger", "Siddarth Matani", "Muhammad Sadiq", "Adnesh Tondale", "Timil Patel", "Marty Kain", "Mrunal Patel", "Sri Krishna Anantha Raju", "Abhinay Reddy", "Ravi Timbawala", "Devam Shrivastava")
The closest thing I can get is using grepl(paste(full_names, collapse = "|"), bowlers), which provides a vector of TRUE and FALSE values.
Use grep(), iterating over bowlers with sapply():
sapply(bowlers, \(x) grep(x, full_names, value = TRUE))
         Dilon Heylige          Siddarth Mata          Dilon Heylige
      "Dilon Heyliger"      "Siddarth Matani"       "Dilon Heyliger"
         Muhammad Sadi          Adnesh Tondal          Muhammad Sadi
      "Muhammad Sadiq"       "Adnesh Tondale"       "Muhammad Sadiq"
           Timil Patel          Siddarth Mata            Timil Patel
         "Timil Patel"      "Siddarth Matani"          "Timil Patel"
            Marty Kain          Muhammad Sadi             Marty Kain
          "Marty Kain"       "Muhammad Sadiq"           "Marty Kain"
         Siddarth Mata             Marty Kain          Dilon Heylige
     "Siddarth Matani"           "Marty Kain"       "Dilon Heyliger"
           Timil Patel          Adnesh Tondal          Muhammad Sadi
         "Timil Patel"       "Adnesh Tondale"       "Muhammad Sadiq"
         Adnesh Tondal          Dilon Heylige            Neeraj Goel
      "Adnesh Tondale"       "Dilon Heyliger"          "Neeraj Goel"
          Sheryar Khan            Neeraj Goel           Sheryar Khan
        "Sheryar Khan"          "Neeraj Goel"         "Sheryar Khan"
           Hammad Azam           Sheryar Khan            Hammad Azam
         "Hammad Azam"         "Sheryar Khan"          "Hammad Azam"
         Vatsal Vaghel            Hammad Azam          Vatsal Vaghel
      "Vatsal Vaghela"          "Hammad Azam"       "Vatsal Vaghela"
         Mohit Nataraj          Zia Muhammad            Sheryar Khan
       "Mohit Nataraj" "Zia Muhammad Shahzad"         "Sheryar Khan"
            Sami Aslam            Neeraj Goel          Zia Muhammad 
          "Sami Aslam"          "Neeraj Goel" "Zia Muhammad Shahzad"
           Neeraj Goel          Zia Muhammad           Vatsal Vaghel
         "Neeraj Goel" "Zia Muhammad Shahzad"       "Vatsal Vaghela"
         Zia Muhammad 
"Zia Muhammad Shahzad"
(You can remove the names using unname(); I left them to demonstrate the solution.)
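Two caveats with the one-liner: grep() treats each bowler as a regular expression by default, and sapply() returns a list instead of a character vector as soon as any bowler matches zero or several full names. Below is a minimal sketch of a more defensive variant, assuming you want to keep the original entry whenever there is no unique match. The helper expand_names() and the two demo vectors are hypothetical names introduced only for illustration; they are not part of the question's data.

```r
# Hypothetical helper (not from the answer above): expand each partial
# name to its full name, keeping the input when the match is missing
# or ambiguous.
expand_names <- function(partial, full) {
  # Look up each distinct partial name only once.
  keys <- unique(partial)
  lookup <- vapply(keys, function(x) {
    # fixed = TRUE treats the name as a literal string, not a regex;
    # trimws() drops stray whitespace such as in "Zia Muhammad ".
    hits <- grep(trimws(x), full, fixed = TRUE, value = TRUE)
    # Assumption: only replace when exactly one full name matches.
    if (length(hits) == 1L) hits else x
  }, character(1), USE.NAMES = FALSE)
  # Map the results back onto the original order, duplicates included.
  lookup[match(partial, keys)]
}

# Small demo vectors (made up for this sketch).
bowlers_demo <- c("Dilon Heylige", "Timil Patel", "Zia Muhammad ",
                  "Dilon Heylige", "Unknown Player")
full_demo <- c("Dilon Heyliger", "Timil Patel", "Mrunal Patel",
               "Zia Muhammad Shahzad")

expand_names(bowlers_demo, full_demo)
```

Because vapply() is told to expect character(1), the result is always a plain character vector of the same length as the input, so it can be assigned straight back, e.g. bowlers <- expand_names(bowlers, full_names). The sketch uses function(x) rather than \(x) so it also runs on R versions before 4.1.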