Tags: r, regex, string, vector, functional-programming

Matching two string vectors disregarding the order of the element names


Below, I want to match() two string vectors, my first vector and my second1 vector, as in:

( desired_output = match(first, second1) )
#> [1] 4 5 1 6 2 3

But the two-part string elements of my second vector could be in reverse order, for example "acog.perf" instead of "perf.acog", as shown in my second2 vector.

Question: Can we have match(first, second2) produce the same output as match(first, second1) does?

NOTE: The two-part string elements may be connected by any separator matched by "[^[:alnum:]]+", such as "." (e.g. "perf.acog") or "_" (e.g. "perf_acog"). So, a general/functional answer is appreciated.
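To make the separator definition concrete, here is a small sketch of how "[^[:alnum:]]+" splits elements joined by different separators. The sample vector x is illustrative only; "perf--acog" is an assumed extra case that does not appear in the data below.

```r
# Illustrative input: the same two parts joined by different separators
x <- c("perf.acog", "perf_acog", "perf--acog")

# Split on any run of non-alphanumeric characters
parts <- strsplit(x, "[^[:alnum:]]+")

# Every element yields the same two parts
parts[[1]]
#> [1] "perf" "acog"
```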

first = c("asom.acog", "conf.acog", "perf.acog", "conf.asom", "perf.asom", 
          "perf.conf")

second1 = c("perf.acog", "perf.asom", "perf.conf", "asom.acog", "conf.acog", 
           "conf.asom")

second2 = c("acog.perf", "asom.perf", "conf.perf", "acog.asom", "acog.conf", 
            "asom.conf")

Solution

  • Split each string with strsplit() on a non-word character (\\W), reverse the parts with rev(), and paste them back together.

    EDIT: The separator is now also stripped from "first", so the match works when both vectors use any non-word-character separator.

    match(sub("\\W", "", first), lapply(strsplit(second2, "\\W"), 
        \(x) paste(rev(x), collapse="")))
    [1] 4 5 1 6 2 3
    

    Test:

    identical(
      match(first, second1), 
      match(sub("\\W", "", first), lapply(strsplit(second2, "\\W"), 
        \(x) paste(rev(x), collapse=""))))
    [1] TRUE
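
One caveat with the pattern above: in R's default regex engine, \\W treats "_" as a word character, so "perf_acog" would not be split, and sub() removes only the first separator character. Below is a minimal sketch of a more general variant, assuming the separator is any run matching "[^[:alnum:]]+" as stated in the question. The helper names norm_key and match_unordered are hypothetical and not part of the original answer. It sorts the parts instead of reversing them, so either order maps to the same key.

```r
first   <- c("asom.acog", "conf.acog", "perf.acog", "conf.asom", "perf.asom",
             "perf.conf")
second1 <- c("perf.acog", "perf.asom", "perf.conf", "asom.acog", "conf.acog",
             "conf.asom")
second2 <- c("acog.perf", "asom.perf", "conf.perf", "acog.asom", "acog.conf",
             "asom.conf")

# Hypothetical helper: split on the separator, sort the parts, and rejoin
# them with a fixed "|" so that "perf.acog" and "acog_perf" share one key.
norm_key <- function(x, sep = "[^[:alnum:]]+") {
  vapply(strsplit(x, sep),
         function(p) paste(sort(p), collapse = "|"),
         character(1))
}

# Hypothetical wrapper around match() that compares the normalized keys.
match_unordered <- function(x, table, sep = "[^[:alnum:]]+") {
  match(norm_key(x, sep), norm_key(table, sep))
}

match_unordered(first, second2)
#> [1] 4 5 1 6 2 3

# Same result when second2 uses "_" as its separator.
second3 <- gsub(".", "_", second2, fixed = TRUE)
identical(match_unordered(first, second3), match(first, second1))
#> [1] TRUE
```

Joining the sorted parts with "|" rather than "" keeps keys unambiguous, since "|" cannot occur inside a part that was split on non-alphanumeric characters. Sorting also means any permutation of the parts counts as a match, which is what the question asks for with two-part elements.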