javascript, node.js, pdf, axios, fs

Reading a multipart PDF from an HTTP request on a Node.js backend


I am using Node.js to run a backend script that performs an HTTP request using axios. The request returns a PDF that I would like to save to my file system. However, when I tried to do this (using fs.writeFile on a blob), the PDF was malformed in some way: when I opened it, it showed only blank pages. I've looked into multipart MIME objects and other things that might be involved, but I can't find anything describing how to parse this data and save it from a backend script.

As for the format of the file, it starts with %PDF-1.3 1 0 obj <<, then has a bunch of entries of the form /HideToolbar false, then >> endobj, and after a while there is a widths array followed by a very long stream of what I presume are base-64 encoded characters.

%PDF-1.3
1 0 obj
<<
/Type /Catalog
/Pages 4 0 R
/Outlines 2 0 R
/PageMode /UseNone
/ViewerPreferences <<
/HideToolbar false
/HideMenubar false
/HideWindowUI false
/FitWindow false
/CenterWindow false
/DisplayDocTitle false
>>
>>
endobj
2 0 obj
<<
/Type /Outlines
/Count 2 /First 26 0 R /Last 27 0 R
>>
endobj
3 0 obj

[/PDF /Text /ImageC]
endobj
4 0 obj
<<
/Type /Pages
/Count 2
/Kids [14 0 R 18 0 R ]
>>
endobj
5 0 obj
<<
/Type /Font
/Subtype /TrueType
/Name /F1
/BaseFont /DDACTR+MicrosoftSansSerif
/FirstChar 30
/LastChar 255
/Widths [
0 0 265 277 354 556 556 889
666 190 333 333 389 583 277 333
277 277 556 556 556 556 556 556
556 556 556 556 277 277 583 583
583 556 1015 666 666 722 722 666
610 777 722 277 500 666 556 833
722 777 666 777 722 666 610 722
666 943 666 666 610 277 277 277
469 551 333 556 556 500 556 556
277 556 556 228 228 500 228 833
556 556 556 556 333 500 277 556
500 722 500 500 500 333 259 333
583 0 556 0 277 556 391 565
556 556 333 1000 666 333 1000 0
610 0 0 222 222 333 333 350
292 585 333 683 500 333 943 0
500 666 265 277 556 556 556 556
259 556 333 736 370 556 583 0
736 500 399 583 333 333 333 556
537 277 333 333 365 556 833 833
833 556 666 666 666 666 666 666
1000 722 666 666 666 666 277 277
277 277 722 722 777 777 777 777
777 583 777 722 722 722 722 666
666 610 556 556 556 556 556 556
889 500 556 556 556 556 228 228
228 228 556 556 556 556 556 556
556 583 556 556 556 556 556 500
556 500  ]
/Encoding /WinAnsiEncoding
/FontDescriptor 6 0 R
>>
endobj
6 0 obj
<<
/Type /FontDescriptor
/FontName /DDACTR+MicrosoftSansSerif
/Flags 32
/FontBBox [ -580 -257 1473 1003 ]
/ItalicAngle 0
/CapHeight 500
/Ascent 728
/Descent -210
/StemV 0
/XHeight 519
/FontFile2 7 0 R
>>
endobj
7 0 obj
<< /Filter /FlateDecode /Length 15837 /Length1 45884 >>
stream
x���y`TE�7|����[�鄬4!    M

It keeps going with non-human-readable characters for a while, and then the entire format repeats. Does anyone know how to save this as a valid PDF?


Solution

  • Try requesting the response as a stream and piping it straight to a file instead:

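    const fs = require("fs");
    const axios = require("axios");

    axios({
        method: "get",
        url: "YOUR_URL_HERE",
        responseType: "stream" // keep the body as raw bytes instead of decoding it as text
    }).then(function (response) {
        // Pipe the binary stream straight to disk.
        response.data.pipe(fs.createWriteStream("my.pdf"));
    }).catch(console.error);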
    

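    Setting responseType matters because, without it, axios on Node.js decodes the body as text, which mangles the binary streams inside the PDF. If this doesn't work, the PDF you are trying to download is probably corrupt or in an old format.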
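
If you would rather hold the whole file in memory, here is a minimal sketch of an alternative that requests the body as an arraybuffer and checks for the %PDF- signature before writing. The names looksLikePdf, corruptedByUtf8 and savePdf are made up for this example, and httpGet is a hypothetical stand-in for any function shaped like axios.get, so treat this as an illustration rather than the one correct way to do it. The corruptedByUtf8 helper shows the likely cause of the blank pages: bytes that are not valid UTF-8 get replaced when the body is decoded as a string.

```javascript
const fs = require("fs");

// Returns true when the bytes start with the "%PDF-" signature.
function looksLikePdf(buffer) {
    return buffer.subarray(0, 5).toString("latin1") === "%PDF-";
}

// Returns true when decoding the bytes as UTF-8 text and encoding them
// again does not give back the original bytes.
function corruptedByUtf8(buffer) {
    const roundTripped = Buffer.from(buffer.toString("utf8"), "utf8");
    return !buffer.equals(roundTripped);
}

// httpGet is a hypothetical parameter: any function with the shape of
// axios.get(url, config) that resolves to an object with a `data` field.
async function savePdf(httpGet, url, destPath) {
    // Ask for raw bytes so the body is never decoded as text.
    const response = await httpGet(url, { responseType: "arraybuffer" });
    const bytes = Buffer.from(response.data);
    if (!looksLikePdf(bytes)) {
        throw new Error("Response does not start with %PDF-");
    }
    await fs.promises.writeFile(destPath, bytes);
    return bytes.length;
}
```

Assuming axios is installed, it could be called as savePdf((url, config) => axios.get(url, config), "YOUR_URL_HERE", "my.pdf"). The stream version above is still the better fit for large files, since it never loads the whole PDF into memory.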