Is there a way to identify an AES-encrypted file from its content alone (the way a ZIP file can be identified by the letters "PK" at the beginning of the file)? Is there a magic number associated with AES-encrypted files?
We have multiple files in the workflow repository that are either plain (Excel, XML, JSON, text, etc.) or AES-256 encrypted, and we don't know which ones are encrypted. I need to write Java code to identify the AES-encrypted files and decrypt them automatically. Thanks!
AES itself defines no file format or magic number, so in the absence of any standard header you could look at the byte frequency. AES-encrypted data (or indeed anything encrypted with a decent algorithm) will appear to be a random sequence of bytes. This means that the distribution of byte values 0-255 will be approximately flat (i.e. all byte values are equally likely).
Textual documents, however, consist mostly of printable characters, and some of those occur far more often than others: spaces, newlines, vowels, etc. will be disproportionately common.
So, you could build a histogram of byte counts for each file and look for a simple way to classify it as encrypted or not encrypted. For example, take the ratio of the total count of the 5 least common byte values to the total count of the 5 most common byte values. I would expect this ratio to be close to 1.0 for an encrypted file, and close to 0 for a normal textual document, where many byte values never occur at all (I'm sure there are much more sophisticated statistical metrics that could be used...).
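This might not work so well for extremely short documents, of course, because there are too few bytes for the histogram to mean much. Compressed files (including .xlsx, which is a ZIP container) also look nearly random, so it is worth checking for known magic numbers such as "PK" first.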
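A minimal Java sketch of the histogram ratio described above might look like the following. The class name `ByteHistogramCheck`, the method names, and the `THRESHOLD` value of 0.5 are all assumptions made for illustration; the cut-off in particular is a guess and should be tuned against files whose status you already know.

```java
import java.util.Arrays;

final class ByteHistogramCheck {

    // Hypothetical cut-off, not a standard value: tune it on known files.
    static final double THRESHOLD = 0.5;

    // Ratio of the summed counts of the 5 least common byte values
    // to the summed counts of the 5 most common byte values.
    static double flatnessRatio(byte[] data) {
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xFF]++; // mask so the byte is treated as unsigned 0-255
        }
        Arrays.sort(counts); // ascending: rarest values first
        long least = 0;
        long most = 0;
        for (int i = 0; i < 5; i++) {
            least += counts[i];
            most += counts[255 - i];
        }
        // Empty input has no histogram; report 0.0 rather than dividing by zero.
        return most == 0 ? 0.0 : (double) least / most;
    }

    // Heuristic only: "looks random", which is also true of compressed data.
    static boolean looksEncrypted(byte[] data) {
        return flatnessRatio(data) > THRESHOLD;
    }
}
```

You could call it as `ByteHistogramCheck.looksEncrypted(Files.readAllBytes(path))`. A positive result only says the bytes look random; the real test is still whether decryption with your key produces something sensible.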