git, project-management, cvs

Managing documents using GIT


I am working on a website where I will be able to create projects and upload data to each of them. The data will mostly be spreadsheets, images, PDFs, etc. Ideally, I would like a VCS-style setup (preferably git) where each time I update a particular document, I could just commit that document to a repo. Any ideas on how I could go about implementing this would be helpful.


Solution

  • You can call git in a subshell after each upload.

    But I don't think a VCS is a good solution for document versioning, especially in a web application. With office-like documents you will be storing mostly binary data, and version control systems handle binary data poorly: you will not get any meaningful diffs. Metadata management is not suited to this use either. The author of a commit is usually tied to a particular account (and you will probably run git under a single system account), and little beyond basic file information is stored, so you will have to keep everything else (authorship, permissions for web application users, additional metadata) somewhere nearby yourself. Also note that several users can commit data at the same time, so your version history may branch. And once you have a huge dataset (with binary office files that can happen sooner than you think), you will not be able to partition such a repository.

    IMO, using a VCS here gives you very little gain and introduces additional problems.

    I'd advise keeping the metadata in a database (file name, revisions, additional stuff) and keeping the file revisions on disk. Keep each file, together with its revisions, in a separate, unique directory. One tip here: don't use the file names that come from the upload. Use a hash function to calculate a unique name based on the content and metadata.
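
To make the last suggestion more concrete, here is a minimal sketch in Python, assuming SQLite for the metadata and SHA-256 for the on-disk names. The table layout and the helper names (`open_db`, `new_document_id`, `store_revision`, `load_revision`) are made up for illustration and are not part of any existing library; a real application would also need to handle two uploads racing for the same revision number (here the `UNIQUE` constraint would simply make the second insert fail).

```python
import hashlib
import sqlite3
import uuid
from pathlib import Path

# Hypothetical schema: one row per stored revision of a document.
SCHEMA = """
CREATE TABLE IF NOT EXISTS revisions (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    doc_id        TEXT NOT NULL,      -- unique directory of the logical document
    revision      INTEGER NOT NULL,   -- 1, 2, 3, ... per document
    original_name TEXT NOT NULL,      -- name as uploaded, kept only as metadata
    author        TEXT NOT NULL,      -- web application user, not a system account
    stored_name   TEXT NOT NULL,      -- hash-derived name used on disk
    UNIQUE (doc_id, revision)
)
"""


def open_db(path):
    """Open (or create) the metadata database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def new_document_id():
    """Return a random identifier that doubles as the document's directory name."""
    return uuid.uuid4().hex


def store_revision(conn, root, doc_id, original_name, author, data):
    """Write one uploaded revision to disk and record its metadata."""
    row = conn.execute(
        "SELECT COALESCE(MAX(revision), 0) + 1 FROM revisions WHERE doc_id = ?",
        (doc_id,),
    ).fetchone()
    revision = row[0]

    # The on-disk name is derived from metadata and content,
    # never from the user-supplied file name.
    digest = hashlib.sha256()
    for part in (doc_id, str(revision), original_name, author):
        digest.update(part.encode("utf-8") + b"\0")
    digest.update(data)
    stored_name = digest.hexdigest()

    doc_dir = Path(root) / doc_id
    doc_dir.mkdir(parents=True, exist_ok=True)
    (doc_dir / stored_name).write_bytes(data)

    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO revisions "
            "(doc_id, revision, original_name, author, stored_name) "
            "VALUES (?, ?, ?, ?, ?)",
            (doc_id, revision, original_name, author, stored_name),
        )
    return revision, stored_name


def load_revision(conn, root, doc_id, revision):
    """Return the bytes of a given revision of a document."""
    row = conn.execute(
        "SELECT stored_name FROM revisions WHERE doc_id = ? AND revision = ?",
        (doc_id, revision),
    ).fetchone()
    if row is None:
        raise KeyError((doc_id, revision))
    return (Path(root) / doc_id / row[0]).read_bytes()
```

With this layout, an upload handler would call `new_document_id()` once when a document is first created and `store_revision(...)` on every later upload. The original file name and the uploading user live only in the database, so the web application's own permissions and extra metadata can be added as further columns.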