javascript, google-apps-script, gmail, gmail-api

Parsing Google Alerts using Gmail and Apps Script


I have Google Alerts set up and receive alert emails that look like this:

[Screenshot showing an alert email]

I would like to parse each alert and save it in a Google Sheet with the format: alert_name, publication_date, title, news source

I am not sure where to start. Any help or guidance is appreciated. Thanks.


Solution

  • Although I share the opinion that @Ruben has given in the comments, I think this topic is interesting and could help other users to save and manage their Google Alerts.

    Code.gs
    /* Retrieve all threads sent by Google Alerts */
    const threads = GmailApp.search('from:([email protected])')
    /* Sheet where the news is saved; SS_ID must hold the ID of your spreadsheet */
    const sS = SpreadsheetApp.openById(SS_ID).getSheetByName('Google Alerts')
    /* IDs of the news already saved, used to skip duplicates */
    let addedNews = []
    try { addedNews = sS.getRange('A1:A' + sS.getLastRow()).getValues().flat() } catch (err) { /* e.g. empty sheet: keep the empty list */ }
    
    function parseContent() {
      const totalNews = []
      /* A check that skips threads already saved could be added here */
      threads.forEach((th) => {
        const msgs = th.getMessages()
        msgs.forEach((msg) => {
          /* Split the body into lines before parsing it */
          const body = msg.getPlainBody().split('\n')
          /* Extract the filter name, e.g. "python" from "=== News - 2 new results for [python] ===" */
          const filterName = body[0].match(/\[(.*?)\]/)[1]
          const date = msg.getDate()
          /* Drop the first line and the last 11 lines, which carry no news */
          const cleanedBody = body.slice(1, -11)
          /* Find the blank lines ("\r") that separate the news items */
          const newsIdxs = cleanedBody.reduce((pre, curr, idx) => {
            if (curr === "\r") pre.push(idx)
            return pre
          }, [])
          newsIdxs.forEach((v, idx, arr) => {
            if (idx === arr.length - 1) return
            /* Each item runs from one blank line to the next */
            const parsedNew = cleanedBody.slice(v + 1, arr[idx + 1])
            /* The title is the first line, up to the "|" */
            const title = parsedNew[0].split('|')[0]
            /* The URL is on the last line, between < and > */
            const url = parsedNew[parsedNew.length - 1].match(/<(.*?)>/)[1]
            /* The source is taken from the URL rather than the title, because the title format varies */
            const source = url.match(/url=https:\/\/(.*?)\//)[1]
            totalNews.push({ title, url, date, source, filterName })
          })
        })
      })
      totalNews.forEach((nw) => {
        /* Hash the object so that news already present is not added again */
        const id = hashCode(Object.values(nw).toString())
        if (addedNews.includes(id)) return
        sS.appendRow([id, ...Object.values(nw)])
      })
    }
    
    /* Taken from https://stackoverflow.com/questions/7616461/generate-a-hash-from-string-in-javascript */
    const hashCode = s => s.split('').reduce((a, b) => { a = ((a << 5) - a) + b.charCodeAt(0); return a & a }, 0)
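
The blank-line splitting used in parseContent can also be tried outside Apps Script as a plain function. The following is only a sketch under assumptions: parseAlertBody is a hypothetical helper, the sample body is made up to resemble the format the script expects, and instead of dropping a fixed number of trailing lines it keeps only the blocks whose last line holds a redirect link with a url= parameter.

```javascript
// Hypothetical helper: parses the plain-text body of one alert email.
function parseAlertBody(plainBody) {
  const lines = plainBody.split(/\r?\n/).map((line) => line.trim());
  const header = lines[0].match(/\[(.*?)\]/);
  const filterName = header ? header[1] : '';

  // Group consecutive non-empty lines into blocks separated by blank lines.
  const blocks = [];
  let current = [];
  lines.slice(1).forEach((line) => {
    if (line === '') {
      if (current.length > 0) blocks.push(current);
      current = [];
    } else {
      current.push(line);
    }
  });
  if (current.length > 0) blocks.push(current);

  const news = [];
  blocks.forEach((block) => {
    // Assumption: a news block ends with a "<...>" link carrying url=<target>.
    const link = block[block.length - 1].match(/<(.*?)>/);
    if (!link) return;
    const host = link[1].match(/url=https?:\/\/([^\/&>]+)/);
    if (!host) return; // e.g. footer links without a url= parameter
    news.push({
      title: block[0].split('|')[0].trim(),
      url: link[1],
      source: host[1],
      filterName,
    });
  });
  return news;
}

// Made-up sample body, not a real alert.
const sample = [
  '=== News - 2 new results for [python] ===',
  '',
  'Python release notes | Example News',
  'Example News',
  '<https://www.google.com/url?rct=j&sa=t&url=https://news.example.com/python&ct=ga>',
  '',
  'Another headline',
  '<https://www.google.com/url?rct=j&sa=t&url=https://blog.example.org/post&ct=ga>',
  '',
  'Unsubscribe from this alert:',
  '<https://www.google.com/alerts/remove?source=alertsmail>',
].join('\r\n');

const parsed = parseAlertBody(sample);
```

Inside the script, the same function could be called with msg.getPlainBody(). The null checks mean that a block in an unexpected format is skipped rather than stopping the whole run.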
    
    Results



    Note 1: This script is a first approximation to the problem and has only been tested with News-related alerts.

    Note 2: Thanks to @DalmTo for the pseudo-code, which helped me approach the problem more quickly.

    Note 3: The hashCode function was taken from https://stackoverflow.com/questions/7616461/generate-a-hash-from-string-in-javascript
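
To illustrate how the hash works as a duplicate check, here is a small standalone sketch. stringHash repeats the same formula as hashCode, while filterNewItems is a hypothetical helper. Unlike the script, it hashes only the title, URL, and filter name (not the date), and it records each new ID as it goes so that duplicates within a single run are skipped as well. Both choices are assumptions, not part of the original code.

```javascript
// Same formula as hashCode: 31-based rolling hash truncated to a 32-bit integer.
const stringHash = (s) =>
  s.split('').reduce((acc, ch) => ((acc << 5) - acc + ch.charCodeAt(0)) | 0, 0);

// Hypothetical helper: returns only the items whose ID is not in knownIds.
function filterNewItems(items, knownIds) {
  const seen = new Set(knownIds);
  const fresh = [];
  items.forEach((item) => {
    const id = stringHash([item.title, item.url, item.filterName].join('|'));
    if (seen.has(id)) return; // already saved, or repeated in this run
    seen.add(id);
    fresh.push({ id, ...item });
  });
  return fresh;
}

// Made-up items: the first two are identical.
const items = [
  { title: 'A', url: 'u1', filterName: 'python' },
  { title: 'A', url: 'u1', filterName: 'python' },
  { title: 'B', url: 'u2', filterName: 'python' },
];
const firstRun = filterNewItems(items, []);
const secondRun = filterNewItems(items, firstRun.map((item) => item.id));
```

In the script, knownIds would be the values read from column A. A 32-bit hash can collide, so this is a convenience check and not a guarantee of uniqueness.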

    Note 4: I took a RegExp approach because the script uses getPlainBody(), but I think that in this case parsing the HTML returned by getBody() with an HTML-parsing library would be easier to implement.
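
As a rough idea of what working from the HTML could look like, here is a sketch that pulls the links out of an HTML string. It is not the library-based approach mentioned in Note 4: it uses a regular expression over the anchors, which only copes with simple markup. extractAlertLinks is a hypothetical helper, the sample HTML is made up, and the assumption that article links are redirects carrying a url= parameter comes from the plain-text format handled above.

```javascript
// Hypothetical helper: collects article links from the HTML body of an alert.
function extractAlertLinks(html) {
  const anchorRe = /<a\b[^>]*href="([^"]+)"[^>]*>([\s\S]*?)<\/a>/gi;
  const links = [];
  let m;
  while ((m = anchorRe.exec(html)) !== null) {
    const href = m[1].replace(/&amp;/g, '&');
    // Assumption: article links are redirects with the target in "url=".
    const target = href.match(/[?&]url=([^&]+)/);
    if (!target) continue; // skip other links, e.g. unsubscribe
    const articleUrl = decodeURIComponent(target[1]);
    const host = articleUrl.match(/^https?:\/\/([^\/]+)/);
    links.push({
      title: m[2].replace(/<[^>]+>/g, '').trim(), // drop inline tags such as <b>
      url: articleUrl,
      source: host ? host[1] : '',
    });
  }
  return links;
}

// Made-up sample HTML, not a real alert.
const sampleHtml =
  '<div><a href="https://www.google.com/url?rct=j&amp;sa=t&amp;url=https://news.example.com/python&amp;ct=ga">' +
  '<b>Python</b> release notes</a></div>' +
  '<a href="https://www.google.com/alerts/remove?source=alertsmail">Unsubscribe</a>';

const links = extractAlertLinks(sampleHtml);
```

In the script this would receive msg.getBody() in place of the sample string. For real alert emails, a proper HTML parser is likely to be more reliable than this regular expression.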