Search code examples
sqloracleblobfiletableutl-file

Exporting files from Oracle BLOB to SQL FileTable - file types?


I am trying to do an ETL for a data conversion/upgrade of an Oracle table that contains HUGEBLOB files, and the columns in that table list only a file name and the MIME type. The file names don't always (in a lot of cases in fact) provide the extension of the file type, and for the ones that don't, with a MIME type of "application/octet-stream", I cannot determine easily what the original file type was and translate it give it the proper file extension. It's my understanding that "application/octet-stream" is just a basic binary file type. However, many of them tagged as such are actually doc files, text files, pdf's, etc. but just didn't get uploaded with their proper MIME type into the database for whatever reason (a really poorly coded system!). So, I'm trying to figure out if there is somewhere within the binary code of the files that I can determine/extract the actual file type/extension--and if so--how? This file type extension is going to be needed when it's uploaded to SQL FileTable or else the new system won't know what type of file it is either!


Solution

  • Generally, it's possible to figure out the file type by looking at the first few bytes of the file. Of course, that gets pretty tedious as the number of possible file formats increases if you were to try to implement it yourself. Most likely, you'd want to leverage an application or framework that already supported a large number of file formats to try to figure out at least the vast majority of files.

    Personally, I'd probably start with something like Apache Tika to do automated file recognition. That would involve writing a bit of Java code either as a stored procedure in the database or as a standalone program running outside the database. If you're more comfortable using some other language, you can probably find a similar API for that language. You could also use an existing standalone application to determine the file type but that would almost certainly require saving the file from the database to the operating system and would probably be more difficult to integrate with your ETL tool than a custom-built piece of code.