Files are usually categorized by their file extension. So my question is: how can I identify a file's type even if its extension has been changed?
For example, I have a video file named myVideo.mp4 and I rename it to myVideo.txt. If I double-click it, the default text editor opens the file and does not show the actual content. But if I open myVideo.txt in a video player, the video plays without any problem.
I am thinking of developing an application that determines a file's type without relying on the file extension and then suggests suitable software for opening it. I would like to develop the application in Java.
Structure, magic numbers, metadata, strings and regular expressions, heuristics and statistical analysis... the tool will only be as good as the database of rules behind it.
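To make the magic-number idea concrete, here is a minimal Java sketch. It reads the first bytes of a file and compares them against a tiny hand-written signature table. The class name `MagicSniffer`, its methods, and the labels are made up for illustration, and the table holds only a few well-known signatures (PNG, PDF, ZIP, GIF, plus the `ftyp` marker at offset 4 that MP4 and related files normally carry). A real tool needs a far larger rule database, as noted above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: guesses a file type from its leading bytes only.
final class MagicSniffer {
    // Illustrative signatures that start at offset 0 (not an exhaustive list).
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("PNG image", new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("PDF document", new byte[] {'%', 'P', 'D', 'F', '-'});
        SIGNATURES.put("ZIP archive", new byte[] {'P', 'K', 0x03, 0x04});
        SIGNATURES.put("GIF image", new byte[] {'G', 'I', 'F', '8'});
    }

    // ISO base media files (MP4 and relatives) carry "ftyp" at offset 4.
    private static final byte[] FTYP = {'f', 't', 'y', 'p'};

    // True if data contains prefix starting at the given offset.
    private static boolean matchesAt(byte[] data, byte[] prefix, int offset) {
        if (data.length < offset + prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies a header that was already read into memory.
    static String detect(byte[] header) {
        if (matchesAt(header, FTYP, 4)) {
            return "ISO base media (e.g. MP4)";
        }
        for (Map.Entry<String, byte[]> e : SIGNATURES.entrySet()) {
            if (matchesAt(header, e.getValue(), 0)) {
                return e.getKey();
            }
        }
        return "unknown";
    }

    // Reads up to 16 leading bytes of the file; the file name is never inspected.
    static String detect(Path file) throws IOException {
        byte[] buf = new byte[16];
        try (InputStream in = Files.newInputStream(file)) {
            int n = in.readNBytes(buf, 0, buf.length); // Java 9+
            return detect(Arrays.copyOf(buf, n));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + detect(Paths.get(name)));
        }
    }
}
```

Run against the renamed file from the question (`java MagicSniffer.java myVideo.txt`, on Java 11 or later), it should report the ISO base media type despite the .txt extension, provided the file starts with an `ftyp` box as MP4 files normally do. Formats without a fixed signature, such as plain text, fall through to "unknown"; that is where the other techniques in the list come in.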
Try DROID (Digital Record Object IDentification) for identifying file types; it is written in Java and released under the New BSD license. It is a free project of The National Archives (UK) and is unrelated to Android. Source is available on GitHub and SourceForge. The DROID documentation is good, and there is also a getting-started guide from the Digital Preservation Coalition.
See also Darwinsys file and libmagic.