Tags: google-app-engine, google-drive-api, document-management, google-apps-for-education, google-app-engine-python

Google Drive / App Engine for Document Management System


I administer a university's document management system (DMS). It is a third-party product that integrates with another third-party database, which acts as our ERP system. The DMS is quite clunky: it has a wide array of terrible bugs and lacks features and support. I've been playing around with Google App Engine and the Drive SDK in my free time out of curiosity. Since we are a Google Apps for Education customer, we have unlimited Drive space, and all our users are Google Apps users.

Would it be feasible to build a web application in-house (potentially powered by Google App Engine) that uses the Drive SDK to manage all the university's files (~6 TB)? From my experiments, it seems to have all the capabilities required.


Solution

  • Since you'll be building your own software, the answer to "will it do what I want" is always "yes, eventually".

    You'll need to make a decision about document formats, which in turn will influence your indexing mechanism. Specifically, you have two primary options:

    1. Convert the files to Google document formats (document, spreadsheet, etc.). You will then be able to use Google's own indexing and search, e.g. as you would from drive.google.com. The downside is that formatting may be lost during the import/export round trip.

    2. Keep the documents in their native format (e.g. MS .docx) and perform your own indexing. This will require parsing each document type, which is non-trivial, but I'm sure there are third-party libraries to assist. The upside is that the documents you retrieve are identical to the documents you imported.
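
To give a feel for what "perform your own indexing" in option 2 might involve, here is a minimal sketch of an inverted index in plain Python. It assumes the text has already been extracted from each native file by some parser, which is the hard part and is not shown. The names `tokenize` and `InvertedIndex` are made up for illustration, and a real system would need persistence, ranking, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each word to the set of document ids that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index already-extracted text under the given document id."""
        for word in tokenize(text):
            self._postings[word].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        # Copy the first posting set so intersections don't mutate the index.
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result
```

For example, after `index.add("doc-1", "Enrollment Policy 2014")`, a call to `index.search("enrollment policy")` would return only the ids of documents containing both words.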

    I think I would look at doing both of the above. Thus, when you import a file into your DMS, you store it twice in Google Drive: once converted and once unconverted. Use the App Engine datastore to keep track of the pairings. This way you can use the Drive search to find the converted document, but the file you serve back to the user is its unconverted twin.
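
The pairing idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `PairingStore` is an in-memory stand-in for whatever datastore model you would actually use, and `upload(name, data, convert)` is a hypothetical callable wrapping your Drive client that returns the new file's id. Neither is a real App Engine or Drive API.

```python
class PairingStore:
    """In-memory stand-in for a datastore kind keyed by converted file id."""

    def __init__(self):
        self._original_by_converted = {}

    def put(self, converted_id, original_id):
        self._original_by_converted[converted_id] = original_id

    def original_for(self, converted_id):
        """Return the paired original id, or None if no pairing exists."""
        return self._original_by_converted.get(converted_id)


def import_document(upload, store, name, data):
    """Store a file twice (native and converted) and record the pairing.

    `upload` is a hypothetical callable: upload(name, data, convert) -> file id.
    """
    original_id = upload(name, data, convert=False)
    converted_id = upload(name, data, convert=True)
    store.put(converted_id, original_id)
    return original_id, converted_id


def originals_for_hits(store, converted_ids):
    """Translate search hits on converted files into their unconverted twins."""
    originals = []
    for converted_id in converted_ids:
        original_id = store.original_for(converted_id)
        if original_id is not None:  # skip hits that have no recorded pairing
            originals.append(original_id)
    return originals
```

The search step itself is left out: whatever ids the Drive search returns for converted documents would be passed to `originals_for_hits`, and the resulting ids are the files you serve back to the user.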