Search code examples
excelexcel-formulareplacediacriticsexcel-365

Using office 365, how can I replace diacritics with 'normal' letters with an efficient formula?


I want to turn letters with diacritics into plain letters, so -

áàȧäǎāãåąⱥấầắằǡǻǟẫẵảȁȃẩẳạḁậặăâ =a

ḃƀɓḅḇ =b

ćċĉčçȼḉƈɔ =c

ḋďḑđɗḍḓḏðɖ =d

I'm using office 365, but need to achieve this without addins or VBA, using a formula. A lambda function is fine.

There's a solution here but

  1. It's slow
  2. It's hard to read
  3. It needs a single string as the input - it won't work with (and produce) an array of words without the diacritics in
  4. If a diacritic happens not to be listed, it won't be 'caught'

Re 2. & 4. the formula just lists some diacritics to be replaced - looking at Excel's character codes, there's no obvious pattern that can be used. So that might be the only way to do it.

I need a solution which doesn't use addins or VBA. Designed for Excel 2019, the solution below doesn't take advantage of LAMDA functions, which I can use.

=CONCAT(IFERROR(IF(EXACT(UPPER(MID(B1,ROW(A$1:INDEX(A:A,LEN(B1))),1)),MID(B1,ROW(A$1:INDEX(A:A,LEN(B1))),1)),UPPER(MID(CONCATENATE("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbcccccccccddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeffggggggggghhhhhhhhhiiiiiiiiiiiiiiiiiiiijjjkkkkkkkklllllllllllll","mmmnnnnnnnnnnnnooooooooooooooooooooooooooooooooooooooooopppprrrrrrrrrrrrsssssssssttttttttttuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuvvwwwwwwxxyyyyyyyyyyyyzzzzzzz"),SEARCH(MID(B1,ROW(A$1:INDEX(A:A,LEN(B1))),1),CONCATENATE("áàȧäǎāãåąⱥấầắằǡǻǟẫẵảȁȃẩẳạḁậặăâḃƀɓḅḇćċĉčçȼḉƈɔḋďḑđɗḍḓḏðɖéèėêëěĕēẽęȩɇếềḗḕễḝẻȅȇểẹḙḛệéèêëḟƒǵġĝǧğḡģǥɠḣĥḧȟḩħḥḫⱨıíìïǐĭīĩįɨḯỉȉȋịḭíìïîȷĵǰḱǩķƙᶄḳḵⱪĺŀľɫⱡļƚłḷḽḻḹꝈ","ḿṁṃńǹṅňñņɲṇṋṉñŋóòȯôöǒŏōõǫőốồɵøṓṑṍȫỗṏǿȭǭỏȍȏơổọớờỡộởợóòôöõṕṗᵽƥʀŕṙřŗɍɽȑȓṛṟṝśṡŝšṥṧṣṩşṫťƭṭʈṱṯⱦţŧúùûüǔŭūũůųűʉǘǜṹṻủȕȗưụṳứừṷṵữửựúùûüṽṿẃẁẇŵẅẉẋẍýỳẏŷÿȳỹɏỷƴỵýźżẑžƶẓẕ")),1)),MID(CONCATENATE("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbcccccccccddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeffggggggggghhhhhhhhhiiiiiiiiiiiiiiiiiiiijjjkkkkkkkklllllllllllll","mmmnnnnnnnnnnnnooooooooooooooooooooooooooooooooooooooooopppprrrrrrrrrrrrsssssssssttttttttttuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuvvwwwwwwxxyyyyyyyyyyyyzzzzzzz"),FIND(MID(B1,ROW(A$1:INDEX(A:A,LEN(B1))),1),CONCATENATE("áàȧäǎāãåąⱥấầắằǡǻǟẫẵảȁȃẩẳạḁậặăâḃƀɓḅḇćċĉčçȼḉƈɔḋďḑđɗḍḓḏðɖéèėêëěĕēẽęȩɇếềḗḕễḝẻȅȇểẹḙḛệéèêëḟƒǵġĝǧğḡģǥɠḣĥḧȟḩħḥḫⱨıíìïǐĭīĩįɨḯỉȉȋịḭíìïîȷĵǰḱǩķƙᶄḳḵⱪĺŀľɫⱡļƚłḷḽḻḹꝈ","ḿṁṃńǹṅňñņɲṇṋṉñŋóòȯôöǒŏōõǫőốồɵøṓṑṍȫỗṏǿȭǭỏȍȏơổọớờỡộởợóòôöõṕṗᵽƥʀŕṙřŗɍɽȑȓṛṟṝśṡŝšṥṧṣṩşṫťƭṭʈṱṯⱦţŧúùûüǔŭūũůųűʉǘǜṹṻủȕȗưụṳứừṷṵữửựúùûüṽṿẃẁẇŵẅẉẋẍýỳẏŷÿȳỹɏỷƴỵýźżẑžƶẓẕ")),1)),MID(B1,ROW(A$1:INDEX(A:A,LEN(B1))),1)))

Solution

  • Here is a recursive LAMBDA function that works for your problem. You can define it in the Name Manager for example, or use Advanced Formula Environment Add-ins (formula 1):

    CLEANCHARS = LAMBDA(txt,old,new, IF(old <>"", 
     CLEANCHARS(SUBSTITUTE(txt,LEFT(old),new),RIGHT(old,LEN(old)-1),new),txt))
    

    Now use it for your specific case, for example:

    =CLEANCHARS(A1,"áàȧäǎāãåąⱥấầắằǡǻǟẫẵảȁȃẩẳạḁậặăâ","a")
    

    or you can use REDUCE if you don't want to define a recursive function (formula 2):

    =LET(x,A1, CLEANCHARS, LAMBDA(txt,old,new,
      REDUCE(txt,SEQUENCE(LEN(old)),LAMBDA(ac,i,SUBSTITUTE(ac, MID(old,i,1),new)))),
     CLEANCHARS(x, "áàȧäǎāãåąⱥấầắằǡǻǟẫẵảȁȃẩẳạḁậặăâ", "a"))
    

    Note: Recursive functions are affected by current operand stack limit (check this link for more info). For CLEANCHARS would be 256 recursive replacements. If you consider you can reach it, then use formula 2:

    ...Furthermore, do note that the current operand stack limit in Excel is 1,024. This should be borne in mind together with calculation times, as the current recursion limit is set as 1,024 divided by (number of lambda parameters + 1).

    Here is the output (for recurse approach): output

    Now you can extend it for the general case, by doing all the clean up for an entire array. It spills out the entire array in just one call, no need to drag the formula down:

    =LET(lk, A2:B3, BYROW(D2:D3, LAMBDA(x,REDUCE(x, SEQUENCE(ROWS(lk)),LAMBDA(ac,i, 
      CLEANCHARS(ac, INDEX(lk,i,1), INDEX(lk,i,2)))))))
    

    Note: In a similar way it can be addapted for using the non-recursive approach (formula 2).

    Here is the output: general case