Search code examples
regexgoogle-sheetsgoogle-sheets-formulaarray-formulastextjoin

Find value cells contains diacritics and filter in one column


I import personal names into my spreadsheet and each import I get different names. To match past save names and find out if any names already exist in this new import, They helped me create a way to remove diacritics from words and save in column E. Example → (Remove all diacritics without script with a single formula for use in all cells without making calls to Google API)

The names are registered only with the alphabet of the common letters of a Brazilian keyboard.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

But when I import this data, Swedish names for example come with various accents and strange letters. I would like to create a formula in Column B that would find all the names in Column A that had any letters other than the common alphabet on my keyboard. If it exists, it appears, if not, it does not appear.

Like this:

Fabio        João
Lucas        Caçandrà
João         Sølskjaer
Caçandrà     Alemão
Sølskjaer
Alemão

It would be even better if only the different letter appeared, like this:

Fabio        ã
Lucas        çà
João         ø
Caçandra     ã
Sølskjaer
Alemão

And if there is more than one different letter in each name, not appearing the two diacriticts in one cell, always appearing in a cell below, like this:

Fabio        ã
Lucas        ç
João         à
Caçandra     ø
Sølskjaer    ã
Alemão

I believe the use would be through a FILTER function, but as I need the reverse (words containing different values from the common alphabet, I can't assemble).

This way, I can create a script that whenever I find a different name on import, it will register it in a Column J so I can see and register this new diacrict in my list, like this:

for (var i = 1; i < 100; i++) {

  var cel = "B"+ i //

  if (sheet.getRange(cel).getValue() != "")  {   

    for (var u = 1; u < 1000; u++) {
      var cel2 = "J"+ u //

      if (sheet.getRange(cel2).getValue() == "")  { 

        sheet.getRange("J"+u).setValue(sheet.getRange(cel).getValue());
        break;
      }
    }
  }
};

Solution

  • =ARRAYFORMULA(IFERROR(REGEXEXTRACT(A1:A, "[^(A-Za-z)]")))
    

    0


    =ARRAYFORMULA(TRANSPOSE(REGEXEXTRACT(
     TEXTJOIN(, 1, IFERROR(REGEXEXTRACT(A1:A, "[^(A-Za-z)]+"))), REPT("(.)", LEN(
     TEXTJOIN(, 1, IFERROR(REGEXEXTRACT(A1:A, "[^(A-Za-z)]+"))))))))
    

    0