Java InputStream.readAllBytes() returns more than what was written


I have this code:

    public static void main(String[] args) throws Exception {
        try (PythonInterpreter pi = new PythonInterpreter()) {

            String sourceString = "print 'compiled!'";
            String name = "mymodule";

            // Dress the String as bytes: compiler not v good with encodings.
            InputStream source = new InputStream() {
                ByteBuffer buf = StandardCharsets.US_ASCII.encode(sourceString);
                @Override
                public int read() throws IOException {
                    if (buf.remaining() > 0) {
                        return buf.get() & 0xff;
                    } else {
                        return -1;
                    }
                }
            };

            byte[] javaCode = imp.compileSource(name, source, "<string>");
            String className = name + "$py";

            System.out.println(String.format("Original length: %s", javaCode.length));
            
            PyCode code = BytecodeLoader.makeCode(className, javaCode, "<string>");
            System.out.println(String.format("Code: %s", code));
            pi.exec(code);

            // You can write a file and read it back if you want :)
            byte[] javaCode2 = javaCode.clone();
            
            System.out.println(String.format("Read in length: %s", javaCode2.length));
            
            if (javaCode2.length == javaCode.length){
                System.out.println(String.format("Are equal: %s", Arrays.equals(javaCode, javaCode2)));
            }

            PyCode code2 = BytecodeLoader.makeCode(className, javaCode2, "<file>");
            System.out.println(String.format("Code: %s", code2));
            pi.exec(code2);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

Which outputs

Original length: 2125
Code: <code object <module> at 0x2, file "<string>", line 0>
compiled!
Read in length: 2125
Are equal: true
Code: <code object <module> at 0x3, file "<file>", line 0>
compiled!

All well and good, but when I swap the single line javaCode.clone() for writing to a file and then reading it back:

    public static void main(String[] args) throws Exception {
        try (PythonInterpreter pi = new PythonInterpreter()) {

            String sourceString = "print 'compiled!'";
            String name = "mymodule";

            // Dress the String as bytes: compiler not v good with encodings.
            InputStream source = new InputStream() {
                ByteBuffer buf = StandardCharsets.US_ASCII.encode(sourceString);
                @Override
                public int read() throws IOException {
                    if (buf.remaining() > 0) {
                        return buf.get() & 0xff;
                    } else {
                        return -1;
                    }
                }
            };

            byte[] javaCode = imp.compileSource(name, source, "<string>");
            String className = name + "$py";

            System.out.println(String.format("Original length: %s", javaCode.length));
            
            PyCode code = BytecodeLoader.makeCode(className, javaCode, "<string>");
            System.out.println(String.format("Code: %s", code));
            pi.exec(code);

/// CHANGE STARTS HERE
            OutputStream os =
                    Files.newOutputStream(Paths.get("compiled_code"), StandardOpenOption.CREATE);
            System.out.println(String.format("os: %s", os));
            os.write(javaCode);
            os.close();

            InputStream is =
                    Files.newInputStream(Paths.get("compiled_code"), StandardOpenOption.READ);
            System.out.println(String.format("is: %s", is));
            byte[] javaCode2 = is.readAllBytes();
            is.close();
/// CHANGE ENDS HERE
            
            System.out.println(String.format("Read in length: %s", javaCode2.length));
            
            if (javaCode2.length == javaCode.length){
                System.out.println(String.format("Are equal: %s", Arrays.equals(javaCode, javaCode2)));
            }

            PyCode code2 = BytecodeLoader.makeCode(className, javaCode2, "<file>");
            System.out.println(String.format("Code: %s", code2));
            pi.exec(code2);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

The output becomes

Original length: 2125
Code: <code object <module> at 0x2, file "<string>", line 0>
compiled!
os: java.nio.channels.Channels$1@3533df16
is: sun.nio.ch.ChannelInputStream@14ac77b9
Read in length: 2152
Exception in thread "main" java.lang.ClassFormatError: Extra bytes at the end of class file mymodule$py

My question is: why did readAllBytes() return a byte array of length 2152, which is 27 bytes more than what was passed to os.write()?

I checked and all the bytes in javaCode2 are correct, as long as you ignore the extra 27.

The extra bytes are: 0 88 0 1 0 89 73 0 90 0 91 0 1 0 89 74 0 92 0 94 0 1 0 89 115 0 95


Solution

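  • You are probably overwriting an existing file. StandardOpenOption.CREATE on its own does not truncate: if compiled_code already exists from an earlier run and holds 2152 bytes, writing 2125 bytes replaces only the first 2125 and leaves the last 27 bytes of the old content in place. Add TRUNCATE_EXISTING as well, which resets the file size to zero before writing: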

    Path path = Paths.get("compiled_code");
    OutputStream os = Files.newOutputStream(path, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
    

    However, you are much better off using the built-in Files calls for byte[] reads and writes. They save several lines of code, need no try-with-resources or close() handling, and Files.write truncates an existing file by default when no options are given:

    Files.write(path, javaCode);
    

    and

    byte[] javaCode2 = Files.readAllBytes(path);
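
To check the explanation above, here is a minimal sketch that reproduces the symptom without Jython. It assumes the leftover file from an earlier run was 2152 bytes long, which is a guess based on the question's output. The class name TruncateDemo, the helper writeAndMeasure, and the temporary file are hypothetical names introduced only for this illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TruncateDemo {

    // Hypothetical helper: writes data through an OutputStream opened with
    // the given options and returns the resulting size of the file on disk.
    static long writeAndMeasure(Path path, byte[] data, StandardOpenOption... options)
            throws IOException {
        try (OutputStream os = Files.newOutputStream(path, options)) {
            os.write(data);
        }
        return Files.size(path);
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for "compiled_code" (assumption for the demo).
        Path path = Files.createTempFile("compiled_code", ".bin");
        try {
            byte[] older = new byte[2152]; // assumed content left by an earlier run
            byte[] newer = new byte[2125]; // stands in for the freshly compiled class

            // CREATE alone: the existing file is opened and overwritten from
            // position 0, but its old tail is kept.
            Files.write(path, older);
            long createOnly = writeAndMeasure(path, newer, StandardOpenOption.CREATE);

            // CREATE + TRUNCATE_EXISTING: the file is emptied before writing.
            Files.write(path, older);
            long createAndTruncate = writeAndMeasure(path, newer,
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);

            // Files.write with no options truncates an existing file by default.
            Files.write(path, older);
            Files.write(path, newer);
            long filesWrite = Files.size(path);

            System.out.println("CREATE only:       " + createOnly);
            System.out.println("CREATE + TRUNCATE: " + createAndTruncate);
            System.out.println("Files.write:       " + filesWrite);
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

With CREATE alone the reported size stays at the old, larger value, which matches the 27 trailing bytes seen in the question; the other two variants leave a file of exactly the length that was written.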