Tags: angular, typescript, parsing, pdf, multipart

Can I parse a multipart/mixed response without having to convert it into a string first?


I'm receiving a multipart/mixed response over HTTP that contains some JSON data as well as PDFs in binary form. Since Angular cannot handle such responses natively, my current approach is to read the response as a string using the responseType: 'text' option.

I then take the response apart, parse the JSON, and put the PDF data into a blob like this:

    let pdf: Blob = new Blob([new TextEncoder().encode(bytestring)], { type: 'application/pdf' });

However, when I create a download link for the PDF with window.URL.createObjectURL(pdf), the downloaded PDF is damaged and can't be opened.

I have confirmed that Angular uses UTF-8 encoding when it turns the response into a string. I also implemented a separate route that returns a single PDF on its own, so I can request it with responseType: 'blob'; that works and downloads a functioning PDF. Furthermore, I forced VS Code to open both the original PDF file and the damaged one, and the byte representations look identical in the editor.

Since I can transfer a functioning PDF when it is not part of a multipart response, the only plausible cause of the broken PDF seems to be the way I parse the multipart response. I have also read elsewhere that converting a PDF into a string and then back into binary data can corrupt it. So, is there any way to do this without converting the response into a string?
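A minimal sketch (not from the original post) of why the string round trip is lossy: it decodes a few PDF-like bytes as UTF-8, which is what happens to a responseType: 'text' body, and then re-encodes them with TextEncoder, as the Blob line above does. PDF writers commonly put a comment with bytes above 0x7F right after the %PDF header; the four bytes used here are just an example of such bytes.

```typescript
// "%PDF", a newline, "%", then four bytes above 0x7F that are not valid UTF-8.
const original = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x0a, 0x25, 0xe2, 0xe3, 0xcf, 0xd3]);

// Roughly what responseType: 'text' does: decode the bytes as UTF-8.
// Each invalid byte sequence is replaced by U+FFFD (the replacement character).
const asText = new TextDecoder("utf-8").decode(original);

// What the Blob line does: encode the string back to UTF-8 bytes.
const roundTripped = new TextEncoder().encode(asText);

console.log(original.length, roundTripped.length); // 10 18
```

The ASCII prefix survives, but each invalid byte became U+FFFD, which TextEncoder writes back as the three bytes EF BF BD. The file is therefore both altered and longer, and the original bytes cannot be recovered from the string. An editor that shows undecodable bytes as replacement characters can make the two files look the same.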


Solution

  • I have found the solution. The trick is to request the entire response with responseType: 'blob' and then read it both as text and as bytes. The text representation is used to parse the JSON data and the headers of the PDF parts, and the byte representation is used to build the PDF files themselves. Below is my working TypeScript code.

    public async parseMultipartResponse(multipartBody: Blob): Promise<MyMultipartResponse> {

        // Read the same Blob twice: as raw bytes (used to cut out the PDFs)
        // and as UTF-8 text (used for the JSON part and the part headers).
        const bodyData: Uint8Array = new Uint8Array(await multipartBody.arrayBuffer());
        const bodyText: string = await multipartBody.text();

        // From the Content-Disposition header of each file.
        const filenames: RegExpMatchArray = bodyText.match(/filename.*\.pdf/gi)!;
        // Assumes the body starts with the first delimiter line ("--" + boundary), i.e. no preamble.
        const boundary: string = bodyText.substring(0, bodyText.indexOf('\n')).trim();

        // The first part (its headers plus the JSON) sits between the first and second delimiter.
        const responseDataJson: string = bodyText.split(boundary)[1];
        // Note that this only creates a plain JavaScript object, not an actual instance of the class MyJsonRepresentation.
        const responseData: MyJsonRepresentation = JSON.parse(responseDataJson.substring(responseDataJson.indexOf('{')));

        const encoder: TextEncoder = new TextEncoder();
        const startOfFile: Uint8Array = encoder.encode('%PDF');
        const endOfFile: Uint8Array = encoder.encode('%%EOF');

        const pdfFiles: MyPDFFile[] = [];
        let foundStart: boolean = false;
        // This is a count of string characters, not bytes. UTF-8 never needs fewer bytes
        // than characters, so the scan may start a little early but never too late.
        let filecontentStart: number = 2 * boundary.length + responseDataJson.length;

        scan: for (let i = filecontentStart; i <= bodyData.length - endOfFile.length; i++) {

            if (!foundStart) {
                // Look for the "%PDF" header that starts every PDF file.
                for (let j = 0; j < startOfFile.length; j++) {
                    if (bodyData[i + j] !== startOfFile[j])
                        continue scan;
                }

                filecontentStart = i;
                foundStart = true;
            }

            // Look for the "%%EOF" marker. Caveat: incrementally updated PDFs contain
            // more than one "%%EOF", and this cuts the file at the first one.
            for (let j = 0; j < endOfFile.length; j++) {
                if (bodyData[i + j] !== endOfFile[j])
                    continue scan;
            }

            // Blob.slice works on byte offsets, so the binary data is never decoded as text.
            const pdfData: Blob = multipartBody.slice(filecontentStart, i + endOfFile.length, 'application/pdf');
            const filename: string = filenames.shift()!;

            // I've created a class that stores the binary data together with the filename and a download link.
            pdfFiles.push(new MyPDFFile(filename.substring(filename.indexOf('"') + 1), pdfData, window.URL.createObjectURL(pdfData)));
            foundStart = false;
        }

        return new MyMultipartResponse(responseData, pdfFiles);
    }
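An alternative sketch, not the method used in the answer above: instead of scanning for %PDF and %%EOF, split the byte array on the multipart delimiter itself, which RFC 2046 defines as CRLF, two hyphens and the boundary. This keeps every part byte-for-byte and does not stop at an early %%EOF. The names `indexOfBytes`, `splitMultipart` and `MultipartPart` are hypothetical. The sketch assumes CRLF line endings and a well-formed body, and it does not handle Content-Transfer-Encoding.

```typescript
// Hypothetical shape of one part of a multipart body.
interface MultipartPart {
  headers: Record<string, string>; // header names lower-cased
  body: Uint8Array;                // raw bytes, never decoded as text
}

// Index of the first occurrence of `needle` in `haystack` at or after `from`, or -1.
function indexOfBytes(haystack: Uint8Array, needle: Uint8Array, from: number): number {
  outer: for (let i = from; i <= haystack.length - needle.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Splits a multipart body on its delimiters. `boundary` is given without the leading "--".
function splitMultipart(body: Uint8Array, boundary: string): MultipartPart[] {
  const encoder = new TextEncoder();
  const decoder = new TextDecoder("utf-8");
  const dashBoundary = encoder.encode("--" + boundary);
  // Every delimiter after the first one is CRLF + "--" + boundary.
  const delimiter = encoder.encode("\r\n--" + boundary);
  const crlf = encoder.encode("\r\n");
  const blankLine = encoder.encode("\r\n\r\n");
  const parts: MultipartPart[] = [];

  let pos = indexOfBytes(body, dashBoundary, 0); // start of the current "--boundary"
  while (pos !== -1) {
    const afterBoundary = pos + dashBoundary.length;
    // "--" directly after the boundary marks the close delimiter: no more parts.
    if (body[afterBoundary] === 0x2d && body[afterBoundary + 1] === 0x2d) break;
    // End of the delimiter line (tolerates padding before the CRLF).
    const lineEnd = indexOfBytes(body, crlf, afterBoundary);
    if (lineEnd === -1) break;
    // Headers end at the first blank line; searching from lineEnd also covers a part with no headers.
    const headerSep = indexOfBytes(body, blankLine, lineEnd);
    const next = indexOfBytes(body, delimiter, lineEnd);
    if (headerSep === -1 || next === -1 || headerSep + blankLine.length > next) break;

    const headers: Record<string, string> = {};
    const headerText = decoder.decode(body.subarray(lineEnd + crlf.length, headerSep));
    for (const line of headerText.split("\r\n")) {
      const colon = line.indexOf(":");
      if (colon > 0) headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
    }
    // subarray is a view on the original bytes, so nothing is decoded or re-encoded.
    parts.push({ headers, body: body.subarray(headerSep + blankLine.length, next) });
    pos = next + crlf.length; // move to the "--boundary" of the next delimiter
  }
  return parts;
}
```

The byte array can come from `new Uint8Array(await blob.arrayBuffer())`, as in the answer. If the boundary is taken from the first line of the body, as the answer does, the leading `--` has to be stripped first; alternatively it can be read from the response's Content-Type header, for example by requesting with `observe: 'response'` together with `responseType: 'blob'` in Angular's HttpClient. Each PDF part then becomes `new Blob([part.body], { type: 'application/pdf' })`, and the JSON part can be decoded with `TextDecoder` and passed to `JSON.parse`.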