
How to perform multiple string pattern replacement without overwriting previous replacements?


I'd like to take algebraic chess notation and convert the file letters (a, b, c, d, e, f, g, h) to the NATO phonetic alphabet (alpha, bravo, charlie, delta, echo, foxtrot, golf, hotel), without overwriting previous replacements. I'm working in R.

notation <- "1.d4 Nf6 2.c4 e6 3.g3 d5 4.Bg2 Be7 5.Nf3 0-0 6.0-0 dxc4 7.Qc2 a6 8.Qxc4 b5 9.Qc2 Bb7 10.Bd2 Ra7 "

Desired outcome: "1.delta 4 Nfoxtrot 6 2.charlie 4 echo 6 3.golf 3 delta 5" and so on. I do not care about spacing right now.

If I use a naive string replacement method, the replacements will conflict with each other.

Using gsub:

notation <- gsub("a", "alpha", notation)
notation <- gsub("b", "bravo", notation)
notation <- gsub("c", "charlie", notation)
notation <- gsub("d", "delta", notation)
notation <- gsub("e", "echo", notation)
notation <- gsub("f", "foxtrot", notation)
notation <- gsub("g", "golf", notation)
notation <- gsub("h", "hotel", notation)

Yields "1.dechotelolta4 Nfoxtrot6 2.chotelarliechotelo4 echotelo6 3.golf3 dechotelolta5 4.Bgolf2 Bechotelo7 5.Nfoxtrot3 0-0 6.0-0 dechoteloltaxchotelarliechotelo4 7.Qchotelarliechotelo2 alphotela6 8.Qxchotelarliechotelo4 bravo5 9.Qchotelarliechotelo2 Bbravo7 10.Bdechotelolta2 Ralphotela7 "

'd' converts to 'delta', which is good. However, 'delta' contains the letter 'e', so the later gsub("e", ...) call turns it into 'decholta'. That in turn contains an 'h', so the final gsub("h", ...) call turns it into 'dechotelolta'.

I also tried a function from the stringi package, but it returns something similarly undesirable, because with vectorise_all = FALSE the patterns are still applied one after another.

library(stringi)

stri_replace_all_fixed(notation,
                       c("a", "b", "c", "d", "e", "f", "g", "h"),
                       c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"),
                       vectorise_all = FALSE)

I looked around their documentation and several SO questions, but wasn't able to find what I need.

This Python question is close, but limited to single character replacement.

So I am looking for a function/method that replaces multiple patterns without the replacement texts overwriting or altering each other.

My best guess right now is to build a new string by reading notation one character at a time, and appending copies of a single character or substitutions of a-h letters to the new string. But that feels very un-R-like. Does anyone have any suggestions or know of a library function with the desired outcome?
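
A minimal sketch of that character-by-character idea in base R, shown on a shortened notation string. The names files, chars, is_file and result are illustrative only. Because every character is looked up against the original string exactly once, no replacement can be rewritten by a later one.

```r
# Lookup table: names are the file letters, values are the NATO words.
files <- c(a = "alpha", b = "bravo", c = "charlie", d = "delta",
           e = "echo", f = "foxtrot", g = "golf", h = "hotel")

notation <- "1.d4 Nf6 2.c4 e6"

# Split the string into single characters.
chars <- strsplit(notation, "")[[1]]

# Replace only the characters that are file letters; leave the rest as-is.
is_file <- chars %in% names(files)
chars[is_file] <- files[chars[is_file]]

# Glue the pieces back together.
result <- paste(chars, collapse = "")
result
# [1] "1.delta4 Nfoxtrot6 2.charlie4 echo6"
```

The work is vectorised over the character vector, so no explicit loop is needed.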


Solution

    nato <- c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel", "india", "juliett", "kilo", "lima", "mike", "november", "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform", "victor", "whiskey", "x-ray", "yankee", "zulu")
    tr <- setNames(nato, letters)
    
    stringr::str_replace_all(notation, "[a-z]", ~ tr[.x])
    # [1] "1.delta4 Nfoxtrot6 2.charlie4 echo6 3.golf3 delta5 4.Bgolf2 Becho7 5.Nfoxtrot3 0-0 6.0-0 deltax-raycharlie4 7.Qcharlie2 alpha6 8.Qx-raycharlie4 bravo5 9.Qcharlie2 Bbravo7 10.Bdelta2 Ralpha7"
    

    [a-z] matches only lowercase letters, so the uppercase piece letters (N, B, Q, R) are left alone. The third argument of str_replace_all is the replacement for each match. A less commonly used feature is that it can be a function (from ?str_replace_all):

    Alternatively, supply a function, which will be called once for each match (from right to left) and its return value will be used to replace the match.


    Alternatively, the mgsub package performs simultaneous substitution and is very succinct:

    library(mgsub)
    
    mgsub(notation, letters, nato)
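
Note that both calls above translate every lowercase letter, so the capture marker x becomes x-ray (see "deltax-raycharlie4" in the output). If only the file letters a-h should change, as the question asks, one option is to restrict the pattern. The following is a base R sketch of the same single-pass idea, not the answer's own method; the names files, converted and m are illustrative, and the notation string is shortened.

```r
# Lookup table limited to the file letters a-h.
files <- c(a = "alpha", b = "bravo", c = "charlie", d = "delta",
           e = "echo", f = "foxtrot", g = "golf", h = "hotel")

converted <- "1.d4 Nf6 6.0-0 dxc4 7.Qc2 a6"

# Locate every a-h match once, in the original string.
m <- gregexpr("[a-h]", converted)

# Swap each match for its NATO word; nothing is rescanned afterwards,
# so the replacements cannot alter each other.
regmatches(converted, m) <- lapply(
  regmatches(converted, m),
  function(hit) unname(files[hit])
)
converted
# [1] "1.delta4 Nfoxtrot6 6.0-0 deltaxcharlie4 7.Qcharlie2 alpha6"
```

With stringr, the equivalent change would be to use "[a-h]" instead of "[a-z]" as the pattern.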