Tags: javascript, angular, regex, search, highlight

Regexp search result highlight with accents


Can someone help me improve my search, please? I am trying to highlight several words when a user types one or more terms in the input. I am using this function:

checkHighlightList(originalStr, queries) {
  const regexp = new RegExp(queries.join('|'), 'gi');
  const matchs = originalStr.match(regexp);
  if (matchs) {
    const result = originalStr.replace(regexp, match => `<span class="highlight">${ match }</span>`);
    return result;
  }
}

The problem: if the text contains "pokémon" and I type "kemon", nothing is highlighted, because an accented character is not equal to its base letter (é !== e). I would like to type either "ke" or "ké" and have the "ké" part of "pokémon" highlighted. My data is in French, so many words contain accents. Thank you.


Solution

  • To match accented text with unaccented search terms, you can define an accent map and build the regex from it:

    const accentMap = {
      ae: '(ae|æ|ǽ|ǣ)',
      a:  '(a|á|ă|ắ|ặ|ằ|ẳ|ẵ|ǎ|â|ấ|ậ|ầ|ẩ|ẫ|ä|ǟ|ȧ|ǡ|ạ|ȁ|à|ả|ȃ|ā|ą|ᶏ|ẚ|å|ǻ|ḁ|ⱥ|ã)',
      c:  '(c|ć|č|ç|ḉ|ĉ|ɕ|ċ|ƈ|ȼ)',
      e:  '(e|é|ĕ|ě|ȩ|ḝ|ê|ế|ệ|ề|ể|ễ|ḙ|ë|ė|ẹ|ȅ|è|ẻ|ȇ|ē|ḗ|ḕ|ⱸ|ę|ᶒ|ɇ|ẽ|ḛ)',
      i:  '(i|í|ĭ|ǐ|î|ï|ḯ|ị|ȉ|ì|ỉ|ȋ|ī|į|ᶖ|ɨ|ĩ|ḭ)',
      n:  '(n|ń|ň|ņ|ṋ|ȵ|ṅ|ṇ|ǹ|ɲ|ṉ|ƞ|ᵰ|ᶇ|ɳ|ñ)',
      o:  '(o|ó|ŏ|ǒ|ô|ố|ộ|ồ|ổ|ỗ|ö|ȫ|ȯ|ȱ|ọ|ő|ȍ|ò|ỏ|ơ|ớ|ợ|ờ|ở|ỡ|ȏ|ō|ṓ|ṑ|ǫ|ǭ|ø|ǿ|õ|ṍ|ṏ|ȭ)',
      u:  '(u|ú|ŭ|ǔ|û|ṷ|ü|ǘ|ǚ|ǜ|ǖ|ṳ|ụ|ű|ȕ|ù|ủ|ư|ứ|ự|ừ|ử|ữ|ȗ|ū|ṻ|ų|ᶙ|ů|ũ|ṹ|ṵ)'
    };
    
    function escapeRegExp(string) {
      return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    }
    
    function checkHighlightList(str, queries) {
      const accentRegex = new RegExp(Object.keys(accentMap).join('|'), 'g');
      const queryRegex = new RegExp(queries.map(q => {
        return escapeRegExp(q).toLowerCase().replace(accentRegex, m => {
          return accentMap[m] || m;
        });
      }).join('|'), 'gi');
      return str.replace(queryRegex, m => `<span class="highlight">${ m }</span>`);
    }
    
    let source = 'Pokémon & Crème Brulée';
    let result = checkHighlightList(source, [ 'kemon', 'creme' ]);
    console.log('source:\n  "' + source + '"');
    console.log('result:\n  "' + result + '"');

    Output for search terms [ 'kemon', 'creme' ]:

    source:
      "Pokémon & Crème Brulée"
    result:
      "Po<span class="highlight">kémon</span> & <span class="highlight">Crème</span> Brulée"
    

    Explanation of accentRegex:

    • it is an OR regex of all keys of accentMap:
      • example: /ae|a|c|e|i|n|o|u/g
      • tweak the map as needed for additional accent chars
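Because a JavaScript regex tries alternatives from left to right, a multi-letter key such as ae has to come before a single-letter key such as a; otherwise ae is consumed as a followed by e. The map above happens to list ae first. The sketch below makes that ordering explicit by sorting the keys by length. The shortened map and the helper name buildAccentRegex are assumptions for illustration, not part of the answer.

```javascript
// Shortened, illustrative map; 'ae' is deliberately listed last here.
const demoAccentMap = {
  a:  '(a|á|à|â|ä)',
  e:  '(e|é|è|ê|ë)',
  ae: '(ae|æ)'
};

// Hypothetical helper: put the longest keys first so 'ae' wins over 'a'.
function buildAccentRegex(map) {
  const keys = Object.keys(map).sort((x, y) => y.length - x.length);
  return new RegExp(keys.join('|'), 'g');
}

const demoAccentRegex = buildAccentRegex(demoAccentMap);
console.log(demoAccentRegex.source); // ae|a|e
```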

    Explanation of queryRegex:

    • it is an OR regex of all query terms, where each key of accentMap found in a term is replaced by an OR group of all accented versions of that key
      • example: query term cafe results in this (shortened) regex: /(c|ć|č|ç)(a|á|â|ä|à)f(e|é|ê|ë|è)/gi
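To see what a query term expands to, the mapping step can be run on its own. This is a minimal sketch, assuming a shortened map (shortMap) and a hypothetical helper name expandQuery; the full accentMap above yields much longer groups. Escaping is left out here, so it assumes the term contains plain letters only.

```javascript
// Shortened, illustrative map (assumption); not the full accentMap.
const shortMap = {
  c: '(c|ç)',
  a: '(a|á|à|â)',
  e: '(e|é|è|ê)'
};
const shortKeys = new RegExp(Object.keys(shortMap).join('|'), 'g');

// Hypothetical helper: replace every mapped letter with its OR group.
function expandQuery(query) {
  return query.toLowerCase().replace(shortKeys, m => shortMap[m]);
}

const pattern = expandQuery('cafe');
console.log(pattern); // (c|ç)(a|á|à|â)f(e|é|è|ê)
console.log(new RegExp(pattern, 'gi').test('Café')); // true
```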

    Note: Since the query terms are user specified and used in a regex, we need to escape the regex symbols in the user input, hence the use of function escapeRegExp().
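
The question also asks for an accented query such as ké to work. With the map above, an accented query matches only that exact accent, because é is not a key of accentMap. One possible approach, sketched here as an assumption rather than as part of the answer, is to strip diacritics from the query with String.prototype.normalize('NFD') before applying the map. The shortened map and the names escapeForRegExp, stripAccents and highlightLoose are hypothetical, and the sketch assumes the searched text uses precomposed characters. It also skips empty queries, which would otherwise produce a regex that matches the empty string.

```javascript
// Shortened, illustrative map (assumption); use the full map in practice.
const miniMap = {
  e: '(e|é|è|ê|ë)',
  o: '(o|ó|ò|ô|ö)',
  c: '(c|ç)'
};
const miniKeys = new RegExp(Object.keys(miniMap).join('|'), 'g');

// Escape regex metacharacters in user input.
function escapeForRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Decompose to NFD, then drop combining marks (U+0300 to U+036F).
function stripAccents(s) {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

function highlightLoose(str, queries) {
  const patterns = queries
    .filter(q => q.length > 0) // ignore empty search terms
    .map(q => escapeForRegExp(stripAccents(q).toLowerCase())
      .replace(miniKeys, m => miniMap[m]));
  if (patterns.length === 0) return str; // nothing to highlight
  const re = new RegExp(patterns.join('|'), 'gi');
  return str.replace(re, m => `<span class="highlight">${m}</span>`);
}

console.log(highlightLoose('Pokémon', ['ké']));
// Po<span class="highlight">ké</span>mon
console.log(highlightLoose('Pokemon', ['ké']));
// Po<span class="highlight">ke</span>mon
```

Since the result is inserted as HTML, the source text should be HTML-escaped before highlighting if it can contain user-supplied markup.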