Damaged PDF after setting content from a server response


I am currently making REST calls to a server to sign a PDF document.

I send the PDF as binary content and receive the binary content of the signed PDF in the response. I read that content from the input stream like this:

    try (InputStream inputStream = conn.getInputStream()) {
        if (inputStream != null) {
            try (BufferedReader br = new BufferedReader(new InputStreamReader(inputStream))) {
                String lines;
                while ((lines = br.readLine()) != null) {
                    output += lines;
                }
            }
        }
    }

    signedPdf.setBinaryContent(output.getBytes());

(signedPdf is a DTO with a byte[] attribute.) But when I try to set the content of the PDF to the content returned in the response:

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(signedPdf);
    pdf.setContent(signedPdf);

and then try to open it, the viewer says that the PDF is damaged and cannot be repaired.

Has anyone encountered something similar? Do I need to set the Content-Length as well for the output stream?


Solution

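  • PDF is binary data. Reading it as text (which in Java is always Unicode) corrupts it: the bytes are decoded into chars using some encoding and later encoded back into bytes, and that round trip is not lossless for arbitrary binary data. Byte sequences that are invalid in the encoding (for instance malformed UTF-8) are replaced during decoding, and readLine() additionally drops the line terminators. It is also wasteful: holding each byte as a char can double the memory usage, and there are two conversions, from bytes to String and back. Read the bytes directly instead: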

    try (InputStream inputStream = conn.getInputStream()) {
        if (inputStream != null) {
            byte[] content = inputStream.readAllBytes();
            signedPdf.setBinaryContent(content);
        }
    }
    

    Whether to wrap the stream in a BufferedInputStream depends on the circumstances, for instance on the expected PDF size. Note that InputStream.readAllBytes() requires Java 9 or later.

    Furthermore, new String(byte[], Charset) and String.getBytes(Charset) with an explicit Charset (like StandardCharsets.UTF_8) are preferable to the overloads that take no Charset. Those use the default platform encoding and hence make the code non-portable: it may behave differently on another platform or computer.
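
To make the corruption concrete, here is a minimal sketch that compares the two ways of reading. The class name BinaryReadDemo, its helper methods, and the sample bytes are hypothetical and only stand in for a real server response; the sample mimics the start of a PDF, a text header followed by bytes that are not valid UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryReadDemo {

    // Hypothetical stand-in for a PDF body: a header, line breaks,
    // and four bytes >= 0x80 that do not form valid UTF-8.
    static byte[] sampleBytes() {
        return new byte[] {'%', 'P', 'D', 'F', '\n',
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
    }

    // The approach from the question: decode to text, join the lines, encode again.
    static byte[] readAsText(InputStream in) {
        StringBuilder output = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() has already dropped the line terminator here.
                output.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return output.toString().getBytes(StandardCharsets.UTF_8);
    }

    // The approach from the answer: keep the bytes as bytes (Java 9+).
    static byte[] readAsBytes(InputStream in) {
        try {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = sampleBytes();
        byte[] viaText = readAsText(new ByteArrayInputStream(original));
        byte[] viaBytes = readAsBytes(new ByteArrayInputStream(original));

        System.out.println("text round trip intact: " + Arrays.equals(original, viaText));
        System.out.println("byte read intact:       " + Arrays.equals(original, viaBytes));
    }
}
```

With this sample the text round trip does not reproduce the original array, because the line breaks are gone and the invalid bytes have been replaced, while the direct byte read returns it unchanged.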