Tags: regex, google-apps-script, google-docs, re2

Word count with highlighting (excluding occurrences inside other words) in a Google Doc


I am writing a script for a Google document that counts words and highlights them. The script works, but not quite as it should: occurrences inside other words should not be counted or highlighted. For example, if I am looking for the word cop and the text contains robocop, that occurrence should be skipped.

I tried a regular expression with the word "me", but it doesn't seem to fit, because I need to go through the text and highlight words along the way. Maybe I just don't understand how to do it right.

function findWords2(keys) {
  var body = doc.getBody();
  var keysMap = {}; // object for keys with quantity

  // For every word in keys:
  for (var w = 0; w < keys.length; ++w) {
    // Get the current word:
    //var rx = /(.){1}me(.){1}/;
    //var foundElement = rx.exec(doc.getBody().getText()); 
    //var foundElement = body.findText(rx);

    var foundElement = body.findText(keys[w]);
    var count = 0;

    while (foundElement != null) {
      // Get the text object from the element
      var foundText = foundElement.getElement().asText();

      count++;

      // Where in the Element is the found text?
      var start = foundElement.getStartOffset();
      var end = foundElement.getEndOffsetInclusive();

      // Change the background color to yellow
      foundText.setBackgroundColor(start, end, "#FCFC00");

      // Find the next match
      foundElement = body.findText(keys[w], foundElement);
    }
    keysMap[keys[w]] = count; // add current searched keyword to keysMap with quantity
  }

  return JSON.stringify(keysMap, null, 1);
}

So if we call findWords2(['cop']) on the text "Robocop cop cop", cop is found and highlighted 3 times instead of 2. In theory, I just need to check the characters before and after each match, but how do I do that?


Solution

  • Use a word boundary, \b:

    \bcop\b
    

    Note that body.findText() takes the regex as a string, so you need to escape each backslash:

    body.findText("\\bcop\\b")
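
    As a rough sketch of how this could slot into the loop from the question: the helper names escapeRegex, wholeWordPattern and highlightWholeWords are hypothetical (not part of Apps Script), and body is assumed to be the Body returned by DocumentApp.getActiveDocument().getBody(). Escaping the key is an extra precaution, in case a keyword contains regex metacharacters.

```javascript
// Hypothetical helper: escape regex metacharacters so a keyword is matched literally.
function escapeRegex(key) {
  return key.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build the pattern string passed to findText().
// In source code the backslashes are doubled; the resulting string is \bkey\b.
function wholeWordPattern(key) {
  return '\\b' + escapeRegex(key) + '\\b';
}

// Sketch of the question's loop using the whole-word pattern.
// Assumption: `body` is a DocumentApp Body object.
function highlightWholeWords(body, keys) {
  var counts = {};
  keys.forEach(function (key) {
    var pattern = wholeWordPattern(key);
    var count = 0;
    var found = body.findText(pattern);
    while (found !== null) {
      // Highlight only the matched range inside the text element.
      found.getElement().asText().setBackgroundColor(
        found.getStartOffset(), found.getEndOffsetInclusive(), '#FCFC00');
      count++;
      // Continue searching after the previous match.
      found = body.findText(pattern, found);
    }
    counts[key] = count;
  });
  return counts;
}
```

    Note that \b only marks a boundary next to a word character (letter, digit or underscore), so this approach assumes each keyword starts and ends with one.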
    

    If you are searching a plain string instead (for example with regexp.exec), use a regex literal:

    /\bcop\b/g
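
    For the plain-string case, a minimal sketch of the count alone might look like the following. The function name countWholeWords is hypothetical, and it uses String.prototype.match with the g flag instead of an exec loop, purely for brevity.

```javascript
// Hypothetical helper: count whole-word occurrences of each key in a plain string.
function countWholeWords(text, keys) {
  var counts = {};
  keys.forEach(function (key) {
    // Escape regex metacharacters so the key is matched literally.
    var escaped = key.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Here the pattern is a string passed to RegExp, so backslashes are doubled.
    var rx = new RegExp('\\b' + escaped + '\\b', 'g');
    // match() with the g flag returns all matches, or null if there are none.
    var matches = text.match(rx);
    counts[key] = matches ? matches.length : 0;
  });
  return counts;
}

console.log(JSON.stringify(countWholeWords('Robocop cop cop', ['cop'])));
// prints {"cop":2}
```

    The match is case-sensitive, as in the question's example; adding the i flag would make it case-insensitive.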
    

    References: