
Base64-decoded file is not equal to the original unencoded file


I have a normal PDF file, A.pdf. A third party encodes the file in Base64 and sends it to me through a web service as one long string (I have no control over the third party).

My problem is this: when I decode the string with Java's org.apache.commons.codec.binary.Base64 and write the output to a file called B.pdf, I expect B.pdf to be identical to A.pdf. Instead, B.pdf turns out slightly different from A.pdf, and as a result Acrobat does not recognize B.pdf as a valid PDF.

Does Base64 have different types of encoding/charset mechanisms? Can I detect how the string I received was encoded, so that B.pdf = A.pdf?
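Base64 does come in variants: RFC 4648 defines a standard alphabet (using `+` and `/`) and a URL-safe alphabet (using `-` and `_`), and MIME-style senders also wrap the text into lines. A rough way to see what the third party actually sent is to count the kinds of characters in the received string. The following is a minimal sketch, assuming the whole payload is available as one `String`; `Base64Inspector`, `Report`, and `inspect` are hypothetical names, not a library API.

```java
public class Base64Inspector {

    // Hypothetical summary of the characters found in a received Base64 string.
    static final class Report {
        int payloadChars;  // characters from either Base64 alphabet, including '='
        int urlSafeChars;  // '-' or '_' (URL-safe alphabet, RFC 4648 section 5)
        int standardChars; // '+' or '/' (standard alphabet, RFC 4648 section 4)
        int lineBreaks;    // '\r' or '\n' (MIME-style line wrapping)
        int otherChars;    // anything else: spaces, tabs, stray markup, ...

        // Both alphabets in one payload is a sign that something is off.
        boolean mixedAlphabets() { return urlSafeChars > 0 && standardChars > 0; }

        // Padded Base64 always has a payload length that is a multiple of 4.
        boolean lengthIsMultipleOf4() { return payloadChars % 4 == 0; }
    }

    static Report inspect(String s) {
        Report r = new Report();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '-' || c == '_') {
                r.urlSafeChars++;
                r.payloadChars++;
            } else if (c == '+' || c == '/') {
                r.standardChars++;
                r.payloadChars++;
            } else if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '=') {
                r.payloadChars++;
            } else if (c == '\r' || c == '\n') {
                r.lineBreaks++;
            } else {
                r.otherChars++;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // "Hello, world!" encoded and wrapped every 8 characters with CRLF.
        Report r = inspect("SGVsbG8s\r\nIHdvcmxk\r\nIQ==");
        System.out.println("payload=" + r.payloadChars
                + " urlSafe=" + r.urlSafeChars
                + " standard=" + r.standardChars
                + " lineBreaks=" + r.lineBreaks
                + " other=" + r.otherChars
                + " multipleOf4=" + r.lengthIsMultipleOf4());
    }
}
```

These counts are hints, not proof: a non-zero `urlSafeChars` points to the URL-safe alphabet, line breaks mean the text is wrapped (so a line-oriented decoder only works if every line length is a multiple of 4), and a non-zero `otherChars` suggests the string may have been altered on the way.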

EDIT: this is the file I want to decode; after decoding, it should open as a PDF:

my encoded file


This is the header of each file, opened in Notepad++:

**A.pdf**
        %PDF-1.4
        %±²³´
        %Created by Wnv/EP PDF Tools v6.1
        1 0 obj
        <<
        /PageMode /UseNone
        /ViewerPreferences 2 0 R
        /Type /Catalog

**B.pdf**
        %PDF-1.4
        %±²³´
        %Created by Wnv/EP PDF Tools v6.1
        1 0! bj
        <<
        /PageMode /UseNone
        /ViewerPreferences 2 0 R
        /]
        pe /Catalog

This is how I decode the string:

    private static void decodeStringToFile(String encodedInputStr,
            String outputFileName) throws IOException {
        BufferedReader in = null;
        BufferedOutputStream out = null;
        try {
            in = new BufferedReader(new StringReader(encodedInputStr));
            out = new BufferedOutputStream(new FileOutputStream(outputFileName));
            decodeStream(in, out);
            out.flush();
        } finally {
            if (in != null)
                in.close();
            if (out != null)
                out.close();
        }
    }

    private static void decodeStream(BufferedReader in, OutputStream out)
            throws IOException {
        while (true) {
            String s = in.readLine();
            if (s == null)
                break;
            //System.out.println(s);
            byte[] buf = Base64.decodeBase64(s);
            out.write(buf);
        }
    }

Solution

1. You are breaking your decoding by working line by line. Base64 encodes every 3 bytes as a group of 4 characters, and the decoder simply ignores whitespace, so the sender's line breaks do not have to fall on group boundaries. If a line's length is not a multiple of 4, a group (and therefore a byte of the original content) is split across two Base64 text lines, and decoding each line on its own garbles it. Concatenate all the lines and decode the content in one go.

2. Prefer passing byte[] rather than String to the Base64 class methods. A String implies a character-set encoding, which may not do what you want.
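Both points can be sketched as follows, assuming Java 8 or later. As a stand-in for commons-codec, this uses the JDK's `java.util.Base64` MIME decoder, which likewise skips line separators and other characters outside the Base64 alphabet. The string is converted to bytes explicitly as US-ASCII, and the decoded bytes go straight to the file. `Base64FileDecoder`, `decodeWhole`, `decodeLineByLine`, and `wrap` are hypothetical names. `main` encodes 30 sample bytes, wraps the text every 10 characters (not a multiple of 4), and compares both decoding strategies against the original.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Base64;

public class Base64FileDecoder {

    // Decodes the whole payload in one call. The MIME decoder skips CR/LF
    // (and any other non-alphabet character), so line breaks are harmless.
    static byte[] decodeWhole(String encoded) {
        // Base64 text is ASCII; convert explicitly instead of relying on a default charset.
        byte[] ascii = encoded.getBytes(StandardCharsets.US_ASCII);
        return Base64.getMimeDecoder().decode(ascii);
    }

    // Fixed version of decodeStringToFile: no Reader, no per-line decoding.
    static void decodeStringToFile(String encodedInputStr, String outputFileName)
            throws IOException {
        Files.write(Paths.get(outputFileName), decodeWhole(encodedInputStr));
    }

    // Mimics the original approach: each line is decoded independently.
    static byte[] decodeLineByLine(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String line : encoded.split("\r?\n")) {
            byte[] part = Base64.getMimeDecoder().decode(line);
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    // Wraps text at a fixed width, like a sender that inserts line breaks.
    static String wrap(String s, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i += width) {
            sb.append(s, i, Math.min(s.length(), i + width)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] original = new byte[30];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) (i * 37);
        }
        String b64 = Base64.getEncoder().encodeToString(original); // 40 characters
        String wrapped = wrap(b64, 10); // 10 is not a multiple of 4

        System.out.println(Arrays.equals(original, decodeWhole(wrapped)));
        System.out.println(Arrays.equals(original, decodeLineByLine(wrapped)));
    }
}
```

Running `main` prints `true` and then `false`: the whole-payload decode reproduces the 30 original bytes, while the line-by-line decode yields only 28 bytes because every 10-character line ends in the middle of a 4-character group. With commons-codec on the classpath, calling `Base64.decodeBase64(byte[])` once on the concatenated content plays the same role as `decodeWhole`.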