
Magic value collision between Mach-O fat binaries and Java class files


Both Java .class files and Mach-O fat binaries have the same magic signature, 0xCAFEBABE. When reading binary files, what's a good way to disambiguate?


Solution

  • Here's Apple's take on this: https://opensource.apple.com/source/file/file-80.40.2/file/magic/Magdir/cafebabe.auto.html

    Since Java bytecode and Mach-O universal binaries have the same magic number, the test must be performed in the same "magic" sequence to get both right. The long at offset 4 in a Mach-O universal binary tells the number of architectures; the short at offset 4 in a Java bytecode file is the JVM minor version and the short at offset 6 is the JVM major version. Since there are only 18 labeled Mach-O architectures at current, and the first released Java class format was version 43.0, we can safely choose any number between 18 and 39 to test the number of architectures against (and use as a hack). Let's not use 18, because the Mach-O people might add another one or two as time goes by...

    [Image omitted: visual comparison of the two formats]

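    So it seems Apple currently applies the check

    (big-endian 32-bit integer at offset 4) < 0x20

    to decide whether to interpret the file as a fat Mach-O binary in its tools, such as the file command-line utility for determining file types. Note that 0x20 is 32, which falls inside the 18 to 39 window suggested in the quote.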
    Since even Apple calls it a hack, there is unfortunately no 100% reliable method, but this check seems to be your best option. Keep in mind that you may still run into malformed (deliberately or not) fat Mach-O files or Java class files, which may or may not matter for your use case.
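To make the overlap in the quoted explanation concrete, here is a minimal Python sketch that decodes the same first 8 bytes both ways. Both formats store these fields big-endian, so the 32-bit value at offset 4 of a class file works out to (minor_version << 16) | major_version. With a major version of 43 or more, that value can never fall below 43. The helper name read_both_ways and the sample headers are hypothetical, made up for illustration.

```python
import struct


def read_both_ways(header: bytes) -> dict:
    """Interpret the first 8 bytes as a fat Mach-O header and as a class file header."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    # Fat Mach-O header: magic (u32) followed by nfat_arch (u32), big-endian.
    magic, nfat_arch = struct.unpack(">II", header[:8])
    # Java class file: magic (u32), minor_version (u16), major_version (u16), big-endian.
    _, minor, major = struct.unpack(">IHH", header[:8])
    return {"magic": hex(magic), "nfat_arch": nfat_arch, "minor": minor, "major": major}


# Hypothetical headers: a fat binary with 2 architectures,
# and a class file with minor_version 0 and major_version 52 (0x34).
fat = bytes.fromhex("cafebabe00000002")
java8 = bytes.fromhex("cafebabe00000034")

print(read_both_ways(fat))
# {'magic': '0xcafebabe', 'nfat_arch': 2, 'minor': 0, 'major': 2}
print(read_both_ways(java8))
# {'magic': '0xcafebabe', 'nfat_arch': 52, 'minor': 0, 'major': 52}
```

With a minor version of 0, the "architecture count" read from a class file is just its major version; any nonzero minor version pushes it far higher.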
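A minimal Python sketch of the "(int32 at offset 4) < 0x20" check, assuming you can read at least the first 8 bytes of the file. The function names classify_cafebabe and classify_file and the return labels are hypothetical. 0x20 is simply the threshold described above, not a value defined by either format's specification.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
# Threshold mirroring the "(int32 at offset 4) < 0x20" heuristic.
MAX_FAT_ARCHS = 0x20


def classify_cafebabe(header: bytes) -> str:
    """Return 'macho-fat', 'java-class', or 'unknown' for a file's first 8 bytes."""
    if len(header) < 8:
        return "unknown"
    magic, value = struct.unpack(">II", header[:8])
    if magic != FAT_MAGIC:
        return "unknown"
    # Small values are taken as nfat_arch; anything larger is assumed to be
    # (minor_version << 16) | major_version from a Java class file.
    return "macho-fat" if value < MAX_FAT_ARCHS else "java-class"


def classify_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_cafebabe(f.read(8))


print(classify_cafebabe(bytes.fromhex("cafebabe00000002")))  # macho-fat
print(classify_cafebabe(bytes.fromhex("cafebabe00000034")))  # java-class
```

A header claiming 0 architectures still comes out as "macho-fat" here, because the check only compares against an upper bound; whether to treat that as malformed is up to you.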
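If malformed input matters for your use case, one option is to cross-check a fat-binary guess against the fat_arch table that follows the 8-byte header. This goes beyond the magic-number heuristic and is my own assumption, not something Apple's file tool is documented to do. Each entry is five big-endian 32-bit fields (cputype, cpusubtype, offset, size, align), and every slice it describes should lie inside the file. The function name fat_table_is_plausible is hypothetical, and the sketch only handles the 32-bit fat format with magic 0xCAFEBABE.

```python
import struct

FAT_HEADER_SIZE = 8   # magic + nfat_arch
FAT_ARCH_SIZE = 20    # cputype, cpusubtype, offset, size, align (5 x 32-bit)


def fat_table_is_plausible(data: bytes, max_archs: int = 0x20) -> bool:
    """Heuristic sanity check of a 32-bit fat header; `data` is the whole file."""
    if len(data) < FAT_HEADER_SIZE:
        return False
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != 0xCAFEBABE or not 0 < nfat_arch < max_archs:
        return False
    table_end = FAT_HEADER_SIZE + nfat_arch * FAT_ARCH_SIZE
    if table_end > len(data):
        return False  # the arch table itself is truncated
    for i in range(nfat_arch):
        _cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">iiIII", data, FAT_HEADER_SIZE + i * FAT_ARCH_SIZE)
        # Each slice must start after the table and end inside the file.
        if offset < table_end or offset + size > len(data):
            return False
    return True


# Synthetic example with arbitrary cputype/cpusubtype values:
# one 4-byte slice placed right after a 1-entry table.
sample = (struct.pack(">II", 0xCAFEBABE, 1)
          + struct.pack(">iiIII", 7, 3, 28, 4, 0)
          + b"\x00" * 4)
print(fat_table_is_plausible(sample))  # True
```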