
Google Apps Script / Regular expression to show only latest message in an email "train"


email train = the entire contents of a single email, including the quoted previous messages

email thread = a Google Apps Script GmailThread, whose getMessages() returns an array of messages

QUESTION: In Google Apps Script, how can I import only the latest message from within an email train?

I have a simple Google Apps Script based on the GmailApp class. Eventually, this will import all emails TO and FROM a given address (using a search query) into a spreadsheet:

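  // Find every thread with mail from or to the given address
  var threads = GmailApp.search('from: [email protected] OR to: [email protected]');
  Logger.log("Thread count: " + threads.length);
  for (var i = 0; i < threads.length; i++) {
    Logger.log("Subject: " + threads[i].getFirstMessageSubject());
    Logger.log("ID: " + threads[i].getId());
    // Each thread contains the individual messages (the original and its replies)
    var messages = threads[i].getMessages();
    for (var j = 0; j < messages.length; j++) {
      // The plain-text body still includes any quoted earlier messages
      Logger.log(messages[j].getPlainBody());
    }
  }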

The problem with email in general is that a reply usually quotes the original text. After 5 or 6 replies, each individual message gets very long, and importing every message in a Gmail thread as above produces a tonne of duplication.

The gmail.com web interface gets around this problem by hiding quoted text behind an ellipsis button:

[Screenshot: Gmail ellipsis button]

How do I replicate this black magic?

I understand any given solution will be imperfect.

My first thought is to use some kind of regular expression, but I don't know where to start.


Solution

  • Indeed, a regular expression can offer an imperfect solution to this problem. For U.S. date and time formatting, the following matches the date and time that Gmail writes at the start of the line introducing a quoted message:

    var prev = /On (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat), (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, \d{4} at \d{1,2}:\d{2} [AP]M,/;
    

    Then, in the inner loop, truncate each message body where that date string begins:

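    for (var j = 0; j < messages.length; j++) {
      var text = messages[j].getPlainBody();
      // Non-global regex, so match.index is the position of the first hit
      var match = text.match(prev);
      if (match) {
        // Keep only the text written before the quoted-reply header
        text = text.slice(0, match.index);
      }
      Logger.log(text);
    }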
    

    This will fail if some participants use a different locale setting in Gmail (for example, 24-hour time or another language). The pattern can be loosened somewhat, e.g., by making the day of week and the [AP]M suffix optional.
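A minimal sketch of the same truncation as a standalone function, assuming the Apps Script V8 runtime (modern JavaScript syntax). The helper name `stripQuotedReply` and the constant `LOOSE_REPLY_HEADER` are hypothetical; the loose pattern only applies the adjustment suggested above (optional day of week and optional AM/PM) and has not been checked against real Gmail output in every locale. Because it makes no Gmail API calls, it can also be run under Node for testing.

```javascript
// The U.S.-format pattern from the answer.
const US_REPLY_HEADER =
  /On (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat), (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, \d{4} at \d{1,2}:\d{2} [AP]M,/;

// Hypothetical looser variant (assumption): the "Mon, " day-of-week prefix
// and the " AM"/" PM" suffix are optional, so e.g. "at 15:07," also matches.
const LOOSE_REPLY_HEADER =
  /On (?:(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat), )?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, \d{4} at \d{1,2}:\d{2}(?: [AP]M)?,/;

// Hypothetical helper: returns the part of a plain-text message body that
// comes before the first quoted-reply header matched by `pattern`.
function stripQuotedReply(text, pattern) {
  // String.prototype.search returns the index of the first match, or -1.
  const start = text.search(pattern);
  const head = start === -1 ? text : text.slice(0, start);
  // Drop the blank lines that usually sit between the reply and the header.
  return head.trimEnd();
}

// Hypothetical sample body, shaped like a Gmail plain-text reply.
const body =
  "Sounds good, see you then.\n\n" +
  "On Mon, Jan 5, 2015 at 3:07 PM, Jane Doe <jane@example.com> wrote:\n" +
  "> Are we still on for Tuesday?\n";

console.log(stripQuotedReply(body, US_REPLY_HEADER)); // prints "Sounds good, see you then."
```

Inside the Apps Script loop, pass `messages[j].getPlainBody()` as `text`. A body that happens to contain a matching date line in ordinary running text will also be cut at that point, which is one more reason this approach is imperfect.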