Tags: java, byte, fileinputstream

Reading from a one-byte file returns 0xEF 0xBF 0xBD


Solved: I accepted the answer below because it pointed me toward checking how my file got corrupted. Please read the end of this question for the Maven-related cause.

I created a 1-byte file containing the byte 0xA8. I'm trying to read it into any Java structure that will allow me to work with it later. I know bytes in Java are signed, so any value from 0x80 through 0xFF will be interpreted as negative.
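As a minimal illustration of the signed-byte point above: the sketch below shows the same bit pattern 0xA8 viewed as a signed Java byte and as an unsigned value. The class name SignedByteDemo and the helper toUnsigned are hypothetical names chosen for this example only.

```java
class SignedByteDemo {
    /** Converts a signed Java byte to its unsigned value in the range 0..255. */
    static int toUnsigned(final byte b) {
        // Masking with 0xFF drops the sign extension that happens when a byte widens to int.
        return b & 0xFF;
    }

    public static void main(final String[] args) {
        final byte raw = (byte) 0xA8;
        System.out.println(raw);                     // signed view: -88
        System.out.println(toUnsigned(raw));         // unsigned view: 168
        System.out.println(Byte.toUnsignedInt(raw)); // same result using the Java 8+ helper
    }
}
```

DataInputStream.readUnsignedByte() returns the value already in its unsigned form, which is why the program below prints values above 127.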

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class Test {
    public static void main(final String[] args) throws IOException {
        // 0xA8.hex contains one byte 0xA8.
        final File inputFile = new File(Test.class.getClassLoader().getResource("0xA8.hex").getPath());
        // try-with-resources closes the streams even if reading fails.
        try (DataInputStream dis = new DataInputStream(new FileInputStream(inputFile))) {
            int read = dis.readUnsignedByte();
            System.out.println(read + ", hex: " + Integer.toHexString(read));

            // Keep reading until EOF to reveal any unexpected extra bytes.
            while (true) {
                try {
                    read = dis.readUnsignedByte();
                    System.out.println("read more: " + read + ", hex: " + Integer.toHexString(read));
                } catch (final EOFException ignored) {
                    break;
                }
            }
        }
    }
}

There is probably something really simple I'm missing, but I can't wrap my head around it. The program above outputs:

239, hex: ef
read more: 191, hex: bf
read more: 189, hex: bd

The 0xA8.hex file is a 1-byte file I created myself using a hex editor. Its contents really are one single byte:

drvdijk@macmine:~/$ hexdump 0xA8.hex 
0000000 a8                                             
0000001

Why does this happen, and how can I make it read just one byte (possibly converted to unsigned)?

Solution

I use Maven, and in the pom.xml I had a section:

<build>
    <resources>
        <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
        </resource>
    </resources>
    <!-- ... -->
</build>

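The 0xA8.hex file I was using got filtered by Maven, which treated it as text and put the Unicode replacement character (U+FFFD, 0xEF 0xBF 0xBD in UTF-8) where my 0xA8 used to be. I have now updated the pom.xml to the following, so that .hex files are copied without filtering: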

<build>
    <resources>
        <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
            <excludes>
                <exclude>**/*.hex</exclude>
            </excludes>
        </resource>
        <resource>
            <directory>src/main/resources</directory>
            <filtering>false</filtering>
            <includes>
                <include>**/*.hex</include>
            </includes>
        </resource>
    </resources>
    <!-- ... -->
</build>
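The corruption described above can be reproduced without Maven. The sketch below assumes the filtering step decoded the file as UTF-8 text and wrote it back as UTF-8; it is not Maven's actual code, and the names FilteringCorruptionDemo, roundTripAsUtf8, and toHex are hypothetical. A lone 0xA8 byte is not valid UTF-8, so Java's decoder substitutes U+FFFD, which re-encodes to three bytes.

```java
import java.nio.charset.StandardCharsets;

class FilteringCorruptionDemo {
    /** Decodes bytes as UTF-8 text and encodes the text back, as a text-processing step might. */
    static byte[] roundTripAsUtf8(final byte[] input) {
        // Malformed input is replaced with U+FFFD by this String constructor.
        final String text = new String(input, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes as space-separated, two-digit lowercase hex values. */
    static String toHex(final byte[] bytes) {
        final StringBuilder sb = new StringBuilder();
        for (final byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(final String[] args) {
        final byte[] original = {(byte) 0xA8};
        System.out.println(toHex(roundTripAsUtf8(original))); // prints "ef bf bd"
    }
}
```

This matches the three bytes the question's program read back, which is why binary resources should be excluded from any step that processes files as text.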

Solution

  • It seems your file contains the Unicode replacement character (see http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280&utf8=dec):

    U+FFFD  �   239 191 189 REPLACEMENT CHARACTER
    

    Not 0xA8. I created a simple file using the following code:

    File f = new File("0xA8.hex");
    FileOutputStream stream = new FileOutputStream(f);
    stream.write(0xA8);
    stream.flush();
    stream.close();
    

    Note: this is just for demonstration.

    I then used your program to read it, and it works as expected. Find out how the file got corrupted.
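The write-then-read check described in this answer can be sketched as one self-contained program. This is an illustration under assumptions, not the answerer's exact code: it uses a temporary file instead of a classpath resource, and the names RawByteRoundTrip and readUnsigned are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RawByteRoundTrip {
    /** Reads the whole file as raw bytes and returns them as unsigned values (0..255). */
    static int[] readUnsigned(final Path file) throws IOException {
        final byte[] raw = Files.readAllBytes(file);
        final int[] unsigned = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            unsigned[i] = raw[i] & 0xFF;
        }
        return unsigned;
    }

    public static void main(final String[] args) throws IOException {
        // A temporary file stands in for 0xA8.hex so the example cleans up after itself.
        final Path file = Files.createTempFile("0xA8", ".hex");
        try {
            Files.write(file, new byte[] {(byte) 0xA8});
            for (final int value : readUnsigned(file)) {
                System.out.println(value + ", hex: " + Integer.toHexString(value)); // prints "168, hex: a8"
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

If a file written this way reads back as a single 0xA8 while the resource on the classpath does not, the reading code is fine and the file was altered somewhere between the source tree and the build output.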