I have google alerts set up and receive alert emails which look like this:
I would like to parse the alert and save it in a google sheet with the format: alert_name,publication_date, title,news source
Not sure where to start any help/guidance appreciated Thanks
Although I share the opinion that @Ruben has given in the comments, I think this topic is interesting and could help other users to save and manage their Google Alerts.
Code.gs
/* Retrieve all news from googlealert source */
const threads = GmailApp.search('from:(googlealerts-noreply@google.com)')
/* Our sheet for save the news */
const sS = SpreadsheetApp.openById(SS_ID).getSheetByName('Google Alerts')
/* Control the already added answers */
let addedNews = []
try { addedNews = sS.getRange('A1:A' + sS.getLastRow()).getValues().flat() } catch (err) { }
function parseContent() {
const totalNews = []
/* Maybe add a control system for remove the threads already saved */
threads.forEach((th) => {
const msgs = th.getMessages()
msgs.forEach((msg) => {
/* Divide the content in new lines an parse the content */
const body = msg.getPlainBody().split('\n')
/* Extract the filter name eg === News - 2 new results for [python] === */
const filterName = body.slice(0, 1)[0].match(/\[(.*?)\]/)[1]
const date = msg.getDate()
/* Remove the unnecessary lines */
const cleanedBody = body.slice(1, -11)
/* Detect the news via empty new lines "\r" */
const newsIdxs = cleanedBody.reduce((pre, curr, idx) => {
curr === "\r" && pre.push(idx)
return pre
}, [])
newsIdxs.forEach((v, idx, arr) => {
if (idx == arr.length - 1) return
/* From one empty line to the nex */
const parsedNew = cleanedBody.slice(v + 1, arr[idx + 1])
/* Simply extracted from the first line */
const title = parsedNew[0].split('|')[0]
/* Last line and between <> */
const url = parsedNew[parsedNew.length - 1].match(/<(.*?)>/)[1]
/* Extracted from the URL rather than the title due variability */
const source = url.match(/url=https:\/\/(.*?)\//)[1]
totalNews.push({ title, url, date, source, filterName })
})
})
})
totalNews.forEach((nw) => {
/* Hash the object for preventing adding already present */
const id = hashCode(Object.values(nw).toString())
if (addedNews.includes(id)) return
sS.appendRow([id, ...Object.values(nw)])
})
}
/* Extracted from here https://stackoverflow.com/questions/7616461/generate-a-hash-from-string-in-javascript */
const hashCode = s => s.split('').reduce((a, b) => { a = ((a << 5) - a) + b.charCodeAt(0); return a & a }, 0)
Results
Note 1: This script is an approximation of the problem, and has only been tested for News related alerts.
Note 2: Thanks to @DalmTo for the pseudo-code, it has helped me to approach the problem more quickly.
Note 3: The hashCode
function has been extracted from here
Note 4: I have decided to take an approach using RegExp due to the use of getPlainBody()
, but I think that in this case, using a library that allows parsing HTML with getBody()
would be easier to implement.