file-format

How do I research obscure file types?


A client has a large document-management system: millions of TIFFs and PDFs, plus a few other random files (images and other binaries). I'm converting formats, imprinting notes, reorganizing, and redacting sensitive information when found. And that's all great for the vast bulk of the files.

But I occasionally find a new format and have to figure out what it is and how to handle it within the project's parameters. Usually this isn't too hard and when it has been, it's such a small handful that it doesn't matter too much if I just can't handle it. But right now, I have a larger handful of files that don't appear to have a sophisticated header but all start with "COM1.0" (43 4F 4D 31 2E 30).

So, I'd like help on two levels. What's a good way for me to research this (and others I might find in the future; teach a man to fish, and all) when just Googling around fails me? And if you know what the file type is, I'd be keen to hear about it.


Solution

  • One specialist site is http://www.wotsit.org/ - there may be a few others. These sites document formats you can already identify, though, so they help less with working out what an unknown file is.

    There is a table of file signatures (the leading "magic" bytes of many formats) at http://www.garykessler.net/library/file_sigs.html

    I did try doing a little searching and didn't turn up anything much, but I didn't try very hard.
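
For the do-it-yourself side of this, a minimal Python sketch of checking leading bytes against a signature table might look like the following. The table is a small hand-picked sample, not a full database. The names `KNOWN_SIGNATURES`, `identify`, `hex_dump`, and `survey` are made up for illustration, and the "COM1.0" entry only labels the files from the question; it does not say what they really are.

```python
from pathlib import Path

# Small illustrative table of leading-byte signatures.
# The COM1.0 entry is only a label for the files in the question;
# the actual format is still unidentified.
KNOWN_SIGNATURES = {
    b"%PDF": "PDF",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"COM1.0": "unidentified (COM1.0 files from the question)",
}


def identify(header: bytes) -> str:
    """Return the label of the first signature that prefixes header."""
    for magic, name in KNOWN_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex, and printable ASCII columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


def survey(root: str, header_len: int = 64):
    """Yield (path, label, header bytes) for every file under root."""
    for path in Path(root).rglob("*"):
        if path.is_file():
            with path.open("rb") as fh:
                header = fh.read(header_len)
            yield path, identify(header), header
```

Running `survey` over the archive and printing `hex_dump(header)` for everything labelled "unknown" or "unidentified" puts the first bytes of each odd file side by side, which makes shared structure after the "COM1.0" prefix easier to spot before searching further.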