Tags: javascript, google-cloud-functions, pdf.js, dropbox-sdk-js

How can I use PDF.js in a Google Cloud Function to read a PDF downloaded from Dropbox as a fileBinary?


Context

I am using the Dropbox SDK and the PDF.js library inside a Google Cloud Function.

What I'm doing

Inside my functions folder I run

npm i --save pdfjs-dist

Then I download the content of a PDF from Dropbox (this works)

exports.readAProgram = functions.https.onRequest(async(req, res) => {
    var dbx = new Dropbox.Dropbox({ accessToken: ACCESS_TOKEN });
    dbx.filesDownload({ path: "/full/path/20220702.pdf" })
        .then(function(response) {
            console.log('response', response)
            res.json(response.result.fileBinary);
        })
        .catch(function(error) {
            // console.error(error);
            res.json({"error-1": error})
        });
});

I got this:

[image: raw response]

Formatted, it looks like this:

[image: formatted response]

Note

I do not know exactly what a fileBinary is, because

Next step: pass data to PDF.js.getDocument

I'm looking at the source code, because the official API documentation did not help me here.

See here: https://github.com/mozilla/pdf.js/blob/master/src/display/api.js#L232

The getDocument function accepts

string|URL|TypedArray|PDFDataRangeTransport|DocumentInitParameters
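
Of the accepted inputs listed above, a TypedArray is the natural target for bytes that are already in memory. The following is a minimal sketch, assuming the downloaded binary arrives as a Node.js Buffer, an ArrayBuffer, or some other typed array; the helper name `toUint8Array` is hypothetical and is not part of PDF.js or the Dropbox SDK.

```javascript
// Hypothetical helper (not part of PDF.js or the Dropbox SDK):
// normalize common in-memory binary shapes to a plain Uint8Array.
function toUint8Array(binary) {
  if (typeof Buffer !== "undefined" && Buffer.isBuffer(binary)) {
    // A Buffer is a Uint8Array subclass; copy its bytes into a plain Uint8Array.
    return new Uint8Array(binary);
  }
  if (binary instanceof Uint8Array) {
    return binary; // already the right type
  }
  if (binary instanceof ArrayBuffer) {
    return new Uint8Array(binary); // view over the same memory
  }
  if (ArrayBuffer.isView(binary)) {
    // Any other typed array or DataView: reinterpret the same byte range.
    return new Uint8Array(binary.buffer, binary.byteOffset, binary.byteLength);
  }
  throw new TypeError("Unsupported binary type");
}

// Example: the first four bytes of any PDF file are "%PDF".
const bytes = toUint8Array(Buffer.from("%PDF"));
console.log(bytes.constructor.name, bytes.length);
```

Whether the Dropbox SDK always returns a Buffer for fileBinary in Node.js is an assumption here; logging `response.result.fileBinary.constructor.name` would confirm it.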

Question

How can I convert my Dropbox fileBinary structure into something acceptable to PDFJS.getDocument?

I tried

dbx.filesDownload({ path: "/full/path/20220702.pdf" })
    .then(function(response) {
        // Attempt: pass the Dropbox fileBinary straight to PDF.js
        var loadingTask = PDFJS.getDocument(response.result.fileBinary)
            .then(function(pdf) {
                console.log("OK !!!!")
                res.json(response.result.fileBinary);
            })
            .catch(function (error) {
                console.log("error")
                res.json({"error_2": error})
            });
    });

But I got this in the console

>  C:\laragon\www\test-pdf-dropbox\functions\node_modules\pdfjs-dist\build\pdf.js:2240
>        data: structuredClone(obj, transfers)
>              ^
>  
>  ReferenceError: structuredClone is not defined
>      at LoopbackPort.postMessage (C:\laragon\www\test-pdf-dropbox\functions\node_modules\pdfjs-dist\build\pdf.js:2240:13)
>      at MessageHandler.sendWithPromise (C:\laragon\www\test-pdf-dropbox\functions\node_modules\pdfjs-dist\build\pdf.js:8555:19)
>      at _fetchDocument (C:\laragon\www\test-pdf-dropbox\functions\node_modules\pdfjs-dist\build\pdf.js:1356:48)
>      at C:\laragon\www\test-pdf-dropbox\functions\node_modules\pdfjs-dist\build\pdf.js:1302:29
>      at processTicksAndRejections (node:internal/process/task_queues:96:5)
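
The trace points at a missing global rather than at the PDF data itself: `structuredClone` is available as a global in Node.js 17 and later, so an older runtime would raise exactly this ReferenceError when the default pdfjs-dist build calls it. As a rough sketch, a check like the following (the function name `hasStructuredClone` is hypothetical) can be logged at startup to see which situation the function is running in.

```javascript
// Hypothetical diagnostic: report whether the runtime provides structuredClone.
// The `scope` parameter exists only to make the check easy to test.
function hasStructuredClone(scope = globalThis) {
  return typeof scope.structuredClone === "function";
}

console.log(
  "Node version:", process.version,
  "| structuredClone available:", hasStructuredClone()
);
```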

Solution

  • I solved it.

    First: use the legacy build of PDF.js.

    Instead of

    const PDFJS = require("pdfjs-dist");
    

    I now use

    const PDFJS = require("pdfjs-dist/legacy/build/pdf.js");
    

    The npm package is the same, pdfjs-dist.

    Then: call PDFJS this way

    var pdf =  PDFJS.getDocument(new Uint8Array(response.result.fileBinary)).promise
                .then(function(pdf) {
                    console.log("PDF read !!!!", pdf)
                    res.json({done: true})
                })
    

    Note

    • fileBinary can be passed to PDFJS by wrapping it in new Uint8Array
    • I added .promise before .then, because getDocument returns a loading task rather than a promise
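
The two notes above can be combined into one small async helper. This is only a sketch under stated assumptions: the function name `countPdfPages` is hypothetical, `pdfjs` stands for whatever `require("pdfjs-dist/legacy/build/pdf.js")` returns, and `fileBinary` is `response.result.fileBinary` from `dbx.filesDownload`. Passing the library in as a parameter keeps the helper easy to test without a real PDF.

```javascript
// Hypothetical helper: wrap the Dropbox binary, load it with PDF.js,
// and return the page count.
// `pdfjs` is assumed to be require("pdfjs-dist/legacy/build/pdf.js").
async function countPdfPages(pdfjs, fileBinary) {
  // Wrap the bytes so getDocument receives a TypedArray.
  const data = new Uint8Array(fileBinary);
  // getDocument returns a loading task; the document arrives via .promise.
  const pdf = await pdfjs.getDocument(data).promise;
  return pdf.numPages;
}

// Possible usage inside the HTTPS handler from the question (not run here):
//
//   const PDFJS = require("pdfjs-dist/legacy/build/pdf.js");
//   const response = await dbx.filesDownload({ path: "/full/path/20220702.pdf" });
//   const pages = await countPdfPages(PDFJS, response.result.fileBinary);
//   res.json({ done: true, pages: pages });
```

Using await inside a try/catch in the handler would also let a single catch block report both Dropbox and PDF.js errors.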