
Non-UTF-8 characters in PDF JavaScript Blob


I have a PDF file that I serve from a WebApi 2 application to an AngularJS client. I use file-saver to then save the file on the client as follows (in TypeScript):

    this.$http.get(`${webUrl}api/pdf?id=${fileDto.id}`)
        .then((response: ng.IHttpPromiseCallbackArg<any>) => {
            var file = new Blob([response.data], { type: 'application/pdf' });
            saveAs(file, 'my.pdf');
        });

The reason I do this is so that I can use a bearer token to authorize access to the PDF (the token is added via an interceptor). This works except when the PDF file contains non-UTF-8 characters. In that case the file still downloads, but it appears blank when I open it. Looking at the file's contents, I can see that the non-UTF-8 characters have been replaced with a □ character. When I inspect the string value of response.data in the JavaScript debugger, those characters are represented by �. Am I right in assuming that, since the file has been written from a string in JavaScript, no matter what I do I will not be able to correctly save a file with non-UTF-8 characters from JavaScript?


Solution

  • The � character is the Unicode Replacement Character (\uFFFD), which the UTF-8 parser inserts when it tries to parse illegal UTF-8.

    PDF files are not UTF-8 strings; they are binary files.

    To avoid the conversion from UTF-8 to DOMString (UTF-16), set responseType: 'blob' in the request config:

       // Ask for a Blob so the response body is never decoded as text
       var config = { responseType: 'blob' };

       this.$http.get(`${webUrl}api/pdf?id=${fileDto.id}`, config)
         .then((response: ng.IHttpPromiseCallbackArg<any>) => {
           // response.data is already a Blob, so wrapping it in `new Blob([...])` is no longer needed
           var file = response.data;
           saveAs(file, 'my.pdf');
         });
    

    For more information, see
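
To illustrate why the text path corrupts the file, here is a minimal sketch that can be run on its own, outside AngularJS. It assumes an environment with the standard `TextDecoder`, `TextEncoder`, and `Blob` globals (a current browser or a recent Node.js). The byte values and all variable names are made up for the illustration; they only stand in for a fragment of binary PDF data.

```typescript
// Illustrative bytes: "%PDF-" followed by four bytes that are not valid UTF-8.
// These stand in for binary PDF content; they are not taken from a real file.
const original = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0xe2, 0xe3, 0xcf, 0xd3]);

// Text path: decoding the bytes as UTF-8 replaces every invalid sequence
// with U+FFFD, so the original byte values are lost at this point.
const asText: string = new TextDecoder("utf-8").decode(original);

// Turning the string back into bytes re-encodes it as UTF-8,
// which cannot restore the bytes that were replaced.
const roundTripped: Uint8Array = new TextEncoder().encode(asText);

// Helper (hypothetical name) for comparing two byte arrays.
function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  return a.length === b.length && a.every((byte, i) => byte === b[i]);
}

const textContainsReplacement: boolean = asText.includes("\uFFFD");
const textRoundTripIsLossless: boolean = sameBytes(original, roundTripped);

// A Blob built from the decoded string contains the re-encoded text,
// while a Blob built from the raw bytes keeps them as they are.
const blobFromText = new Blob([asText], { type: "application/pdf" });
const blobFromBytes = new Blob([original], { type: "application/pdf" });

console.log("contains U+FFFD:", textContainsReplacement);
console.log("text round trip lossless:", textRoundTripIsLossless);
console.log("sizes (original, from text, from bytes):",
  original.length, blobFromText.size, blobFromBytes.size);
```

The Blob built from the string differs in size from the original bytes, which mirrors what happens when `response.data` arrives as a string and is wrapped in `new Blob([response.data])`. Requesting the response as a Blob keeps the data on the binary path, so no such decoding step takes place.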