Search code examples
sql-serverblobfileserver

Considerations where to store documents - on file server or in DB?


I have a design decision to make regarding documents uploaded to my web site: I can either store them on my file server somewhere, or I can store them as a blob in my database (MSSQL 2005). If it makes any difference to the design decision, these documents are confidential and must have a certain degree of protection.

The considerations I've thought of are:

  1. Storing on the file server makes for HUUUUUUUGE numbers of files all dumped in a single directory, and therefore slower access, unless I can work out a reasonable semantic definition for a directory tree structure
  2. OTOH, I'm guessing that the file server can handle compression somewhat better than the DB... or am I wrong?
  3. My instincts tell me that the DB's security is stronger than the file server's, but I'm not sure if that's necessarily true.
  4. Don't know how having terabytes of blobs in my DB will affect performance.

I'd very much appreciate some recommendations here. Thanks!


Solution

  • In SQL Server 2005, you only have the choice of using VARBINARY(MAX) to store the files inside the database table, or then keep them outside.

    The obvious drawback of leaving them outside the database is that the database can't really control what happens to them; they could be moved, renamed, deleted.....

    SQL Server 2008 introduces the FILESTERAM attribute on VARBINARY(MAX) types, which allows you to leave the files outside the database table, but still under transactional control of the database - e.g. you cannot just delete the files from the disk, the files are integral part of the database and thus get copied and backed up with it. Great if you need it, but it could make for some huge backups! :-)

    The SQL Server 2008 launch presented some "best practices" as to when to store stuff in the database directly, and when to use FILESTREAM. These are:

    • if the files are typically less than 256 KB in size, the database table is the best option
    • if the files are typically over 1 MB in size, or could be more than 2 GB in size, then FILESTREAM (or in your case: plain old filesystem) is your best choice
    • no recommendation for files between those two margins

    Also, in order not to negatively impact performance of your queries, it's often a good idea to put the large files into a separate table alltogether - don't have the huge blobs be part of your regular tables which you query - but rather create a separate table, which you only ever query against, if you really need the megabytes of documents or images.

    So that might give you an idea of where to start out from!