java android encryption checksum file-format

Proper handling of own file format in Java


I have to create a custom byte-based file format for an Android application. The format's main purpose is to store an AES-encrypted file (byte data) together with the metadata needed to decrypt it (such as the IV, the salt, and several application settings).

I have several questions on how to design and implement this:

  1. What are mandatory fields in the file?

The current idea is to start with a 4-byte magic number, followed by the version number of the format, then the IV and the salt. Next I would include a checksum of the first 4 KB of the original (unencrypted) data, so I can quickly decrypt just the first 4 KB and check whether the provided key is correct, and then a checksum of the whole original (unencrypted) data, so I can verify the entire file too. That is it for the header. (Do I need the length of the (un)encrypted data? An offset to the data? A checksum over the whole file, header plus body?)

For the body (which is encrypted) I would like to add the original file name and extension (how many bytes should be used for this?), followed by the original file.

  2. What is the best way to read/write such byte-based files in Java?

The two main options I found are ByteArrayOutputStream and RandomAccessFile. With the first I am missing a seek operation: how can I write at a specific position (e.g. for the checksum)? The second seems to work well, but maybe there are better solutions available.


Solution

  • For my H2 database, I implemented a file system abstraction with multiple file system implementations, including an encrypted one. There are many other implementations, for example a cache wrapper.

    I would use XTS (XEX-based tweaked-codebook mode with ciphertext stealing), which is what I implemented. It allows random-access reads and writes, and is not much slower than pure AES.

    The header you suggest sounds good to me: magic number, then version number of the format. I combined the magic number and the version number (a different version results in a different magic number). With XTS, an IV is not needed. For the salt I would use plenty, for example 8 bytes. I also stored the number of iterations used to hash the password; I used PBKDF2 for that. I think using something like PBKDF2 is important.
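
    To illustrate the salt and iteration count mentioned above, here is a minimal sketch of password-based key derivation with PBKDF2 using the standard JCE classes. The class name, the iteration count, and the choice of `PBKDF2WithHmacSHA256` are assumptions for illustration (that algorithm name may not be available on older Android versions, where `PBKDF2WithHmacSHA1` is the usual fallback); this is not the H2 implementation.

    ```java
    import java.security.GeneralSecurityException;
    import java.security.SecureRandom;
    import javax.crypto.SecretKeyFactory;
    import javax.crypto.spec.PBEKeySpec;

    // Hypothetical helper, not part of any library.
    final class KeyDerivationSketch {
        static final int SALT_BYTES = 8;        // salt size suggested in the answer
        static final int ITERATIONS = 100_000;  // assumed value; tune for the target device

        /** Creates a fresh random salt that is stored unencrypted in the header. */
        static byte[] newSalt() {
            byte[] salt = new byte[SALT_BYTES];
            new SecureRandom().nextBytes(salt);
            return salt;
        }

        /** Derives keyBits bits of key material from the password with PBKDF2. */
        static byte[] deriveKey(char[] password, byte[] salt, int iterations, int keyBits)
                throws GeneralSecurityException {
            PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, keyBits);
            try {
                SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
                return factory.generateSecret(spec).getEncoded();
            } finally {
                spec.clearPassword(); // drop the spec's internal copy of the password
            }
        }
    }
    ```

    The salt and the iteration count both have to be written to the header, because the reader needs exactly the same values to derive the same key from the password.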

    I made the header 4096 bytes long, to match the block size of regular file systems. This should improve performance if you read and write with a fixed block size. I didn't use any checksum, as my underlying (unencrypted) file has a checksum. I think that's good enough, and maybe a bit more secure than storing an unencrypted checksum of the unencrypted data, but I'm not sure.
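
    A fixed-size header like the one described above could be encoded and decoded roughly as follows. This is only a sketch: the magic value `MYFMT01\n`, the field order, and the class name are made up for illustration and do not reflect the actual H2 layout.

    ```java
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    // Hypothetical header layout: 8-byte magic (includes version), 8-byte salt,
    // 4-byte iteration count, then zero padding up to 4096 bytes.
    final class HeaderSketch {
        static final int HEADER_SIZE = 4096;
        static final int SALT_BYTES = 8;
        // Assumed magic value; "01" acts as the format version.
        static final byte[] MAGIC_V1 = "MYFMT01\n".getBytes(StandardCharsets.US_ASCII);

        final byte[] salt;
        final int iterations;

        HeaderSketch(byte[] salt, int iterations) {
            this.salt = salt;
            this.iterations = iterations;
        }

        /** Serializes the fields into a zero-padded block of HEADER_SIZE bytes. */
        static byte[] encode(byte[] salt, int iterations) {
            if (salt.length != SALT_BYTES) {
                throw new IllegalArgumentException("salt must be " + SALT_BYTES + " bytes");
            }
            ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE); // zero-filled, big-endian
            buf.put(MAGIC_V1).put(salt).putInt(iterations);
            return buf.array(); // the unused tail stays zero as padding
        }

        /** Parses a header block and rejects unknown magic/version values. */
        static HeaderSketch decode(byte[] header) {
            if (header.length != HEADER_SIZE) {
                throw new IllegalArgumentException("header must be " + HEADER_SIZE + " bytes");
            }
            ByteBuffer buf = ByteBuffer.wrap(header);
            byte[] magic = new byte[MAGIC_V1.length];
            buf.get(magic);
            if (!Arrays.equals(magic, MAGIC_V1)) {
                throw new IllegalArgumentException("unknown magic number or format version");
            }
            byte[] salt = new byte[SALT_BYTES];
            buf.get(salt);
            return new HeaderSketch(salt, buf.getInt());
        }
    }
    ```

    Because the version is part of the magic value, a future format revision only needs a second constant and a branch in `decode`; the padding leaves room for extra fields without moving the body.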

    As for the API, both ByteBuffer and byte[] are fine. With ByteBuffer, supporting memory-mapped files is simpler.
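
    Regarding the question of writing at a specific position: one option that works with ByteBuffer is `FileChannel`, whose `write(ByteBuffer, long)` and `read(ByteBuffer, long)` methods take an absolute file offset. The sketch below writes the body first and then goes back to fill in the header at offset 0. The class and method names are hypothetical, and the body is assumed to be already encrypted.

    ```java
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Hypothetical helper showing positional reads and writes with FileChannel.
    final class PositionalWriteSketch {
        static final int HEADER_SIZE = 4096;

        /** Writes the body after the header area, then fills in the header at offset 0. */
        static void write(Path file, byte[] header, byte[] body) throws IOException {
            if (header.length > HEADER_SIZE) {
                throw new IllegalArgumentException("header too large");
            }
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
                writeFully(ch, ByteBuffer.wrap(body), HEADER_SIZE);

                // Pad the header to the full block size before writing it.
                ByteBuffer padded = ByteBuffer.allocate(HEADER_SIZE);
                padded.put(header);
                padded.rewind(); // position 0, limit stays at HEADER_SIZE
                writeFully(ch, padded, 0);
            }
        }

        /** Reads everything after the header area. */
        static byte[] readBody(Path file) throws IOException {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer dst = ByteBuffer.allocate((int) (ch.size() - HEADER_SIZE));
                long position = HEADER_SIZE;
                while (dst.hasRemaining()) {
                    int n = ch.read(dst, position);
                    if (n < 0) {
                        break; // unexpected end of file
                    }
                    position += n;
                }
                return dst.array();
            }
        }

        // A single write call may write fewer bytes than requested, so loop.
        private static void writeFully(FileChannel ch, ByteBuffer src, long position)
                throws IOException {
            while (src.hasRemaining()) {
                position += ch.write(src, position);
            }
        }
    }
    ```

    The same pattern would let a checksum or length field be computed while streaming the body and then patched into the header afterwards, without buffering the whole file in memory.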