I am trying to write a Gmail add-on that iterates over all emails and creates a report based on the producers of their PDFs. Iterating over the emails is the easy part, and I have done that; however, I can't find any way to get the Producer line of each PDF. So far I tried
What's the best way to get the Producer line of a PDF in Google Apps Script?
Thank you
If my understanding of your question is correct, how about this sample script? It retrieves the value of Producer from the content of your shared PDF files using 2 regular expressions. Please think of this as just one of several possible answers.
Before using this script, please set the ID of the folder that contains the PDF files. The script retrieves the value from all PDF files in that folder.
var folderId = "### folderId ###";
var files = DriveApp.getFolderById(folderId).getFilesByType(MimeType.PDF);
// 1st pattern: Producer entry of the Info dictionary. 2nd pattern: Producer tag of the XMP metadata.
var regex = [/Producer\((\w.+)\)/i, /<pdf:Producer>(\w.+)<\/pdf:Producer>/i];
var result = [];
while (files.hasNext()) {
  var file = files.next();
  // Read the raw PDF content as text so that the patterns can be searched.
  var content = file.getBlob().getDataAsString();
  // Try each pattern in turn. When both match, the value of the last match is used.
  var r = regex.reduce(function(s, e) {
    var m = content.match(e);
    if (Array.isArray(m)) s = m[1];
    return s;
  }, "");
  result.push({
    fileName: file.getName(),
    fileId: file.getId(),
    vaueOfProducer: r,
  });
}
Logger.log(result); // Result
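The matching step of the script above can also be pulled out into a standalone function, so that it can be tried on plain strings without Drive access. This is only a minimal sketch: `extractProducer` and `PRODUCER_PATTERNS` are hypothetical names, and the 2 patterns are copied unchanged from the script above. As far as I know, PDFs that put a space between `/Producer` and the opening parenthesis, or that keep the metadata in a compressed stream, would not be matched by these patterns.

```javascript
// Patterns copied from the sample script above.
var PRODUCER_PATTERNS = [
  /Producer\((\w.+)\)/i,                   // Info dictionary entry
  /<pdf:Producer>(\w.+)<\/pdf:Producer>/i  // XMP metadata tag
];

// Hypothetical helper (not part of any Google API).
// Returns the Producer value found in the given text, or "" when nothing matches.
// When both patterns match, the last one wins, as in the reduce() of the script above.
function extractProducer(content) {
  return PRODUCER_PATTERNS.reduce(function (found, pattern) {
    var m = content.match(pattern);
    return Array.isArray(m) ? m[1] : found;
  }, "");
}
```

In Apps Script, the argument would be the text from `file.getBlob().getDataAsString()`. For the Gmail case in the question, I think the text from `getDataAsString()` of each PDF attachment could be passed in the same way, although I have not tested this with an add-on.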
This sample result was retrieved by running the sample script on a folder in my Google Drive that contains the 3 shared PDF files.
[
  {
    "fileName": "2348706469653861032.pdf",
    "fileId": "###",
    "vaueOfProducer": "iText� 7.1.5 �2000-2019 iText Group NV \(iText; licensed version\)"
  },
  {
    "fileName": "Getting started with OneDrive.pdf",
    "fileId": "###",
    "vaueOfProducer": "Adobe PDF library 15.00"
  },
  {
    "fileName": "DITO-Salesflow-040419-1359-46.pdf",
    "fileId": "###",
    "vaueOfProducer": "iText 2.1.7 by 1T3XT"
  }
]
In the case of 2348706469653861032.pdf, the value of Producer includes characters that cannot be displayed.
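If those undisplayable characters and the escaped parentheses get in the way of a report, the values could be tidied up before they are counted. The following is only a sketch under my own assumptions: `cleanProducer` and `countByProducer` are hypothetical helper names, the undisplayable characters are assumed to arrive as the Unicode replacement character (U+FFFD), and the input is assumed to have the same shape as the `result` array above, including its `vaueOfProducer` key.

```javascript
// Hypothetical helper: tidies one Producer value taken from the raw PDF text.
function cleanProducer(value) {
  return value
    .replace(/\\([()\\])/g, "$1") // undo the PDF string escapes \( \) and \\
    .replace(/\uFFFD/g, "")       // drop characters that could not be decoded
    .replace(/\s{2,}/g, " ")      // collapse any gaps left behind
    .trim();
}

// Hypothetical helper: counts the files per cleaned Producer value.
// Files without a Producer value are counted under "(unknown)".
function countByProducer(result) {
  return result.reduce(function (counts, item) {
    var key = cleanProducer(item.vaueOfProducer) || "(unknown)";
    counts[key] = (counts[key] || 0) + 1;
    return counts;
  }, {});
}
```

Calling `countByProducer(result)` after the loop would give an object that maps each Producer to the number of files. Instead of dropping the characters, passing a charset such as `"ISO-8859-1"` to `getDataAsString` might keep them readable, but I have not confirmed this with these files.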