I'm trying to find all mentions of a Google Sheets' cell value on some website page. The cell value can be written in any language. The website page can also be in any language.
Now I have this formula:
=COUNTA(IFERROR(IMPORTXML(A2;"//*[contains(translate(text(),'ABCDEFGHJIKLMNOPQRSTUVWXYZАБВГДЕЁЖЗИКЛМНОПРСТУФХЦЧШЩЭЮЯ', 'abcdefghjiklmnopqrstuvwxyzабвгдеёжзиклмнопрстуфхцчшщэюя'),'"&LOWER(B2)&"')]")))
But it works incorrectly. It's not possible to find all words in Latin and Cyrillic, characters from Portuguese, German and other languages are not taken into account. How to make the formula universal? Or how can I write the appropriate script for Google Sheets?
My formula works incorrectly. I want to find any text in any language on website page with Google Sheets' formula or script.
We could solve this more elegantly if Google Docs would support Xpath 2.0 and we could use the fn:lower-case()
. In Xpath 1.0 we are stuck with translate()
and need to provide the upper-case and lower-case translation ourselves. However, it is difficult to cover all possible (Unicode) characters in all languages this way.1 I have added the diacritics from the Latin alphabet to your formula, which should cover most Western languages.
=COUNTA(IFERROR(IMPORTXML(A2;"//*[contains(translate(text(),
'ABCDEFGHJIKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞŸŽŠŒАБВГДЕЁЖЗИКЛМНОПРСТУФХЦЧШЩЭЮЯ',
'abcdefghjiklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿžšœабвгдеёжзиклмнопрстуфхцчшщэюя'),'"&LOWER(B2)&"')]")))
1: Furthermore, such a transformation does not work perfectly for all languages, e.g., the German 'ß' (lowercase) may be written as 'SS' when in uppercase which makes it difficult to find all matches.