Tags: logging, amazon-web-services, error-logging

Using AWS for application logging database


I'm currently working on a large web application that produces a large amount of log data. Because we don't have the infrastructure to log all the events to a database, we are writing them to files. Unfortunately, this makes it very difficult to search the logs for a specific event and impossible to generate reports on how often events occur.

While trying to figure out how to implement better database logging, I found Amazon's services, specifically SimpleDB and DynamoDB. Logging is listed as one of the use cases for SimpleDB, but the documentation later states that

Amazon SimpleDB is designed to store relatively small amounts of data...

This seems contradictory. Here are my questions:

  1. Would these database services be suitable for logging application events?
  2. Would they be suitable for generating reports from log data?
  3. Would I use a Timestamp as my primary key?
  4. Are there drawbacks to this sort of service or something else I should consider?

Update 2018-06-13: I have since used SimpleDB to log application data on large applications. The key was to partition the logs into domains corresponding to the time period in which they were generated (daily, for example) so that no domain grew beyond its limit, and then to set up a cron job that periodically deletes the old domains. This solution has worked well and is easily searchable.
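
A minimal sketch of the daily-domain scheme described in the update, not the exact code I ran. The `applog-` prefix, the 30-day retention default, and the function names are assumptions made up for illustration. `sdb` is expected to behave like a boto3 SimpleDB client (`boto3.client("sdb")`), of which only `list_domains` and `delete_domain` are used.

```python
from datetime import date, timedelta

DOMAIN_PREFIX = "applog-"  # hypothetical naming scheme, one domain per day


def domain_for(day: date) -> str:
    """Return the daily domain name for a date, e.g. applog-2018-06-13."""
    return f"{DOMAIN_PREFIX}{day.isoformat()}"


def expired_domains(domain_names, today: date, keep_days: int):
    """Return the log domains that are older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in domain_names:
        if not name.startswith(DOMAIN_PREFIX):
            continue  # not one of our log domains
        try:
            day = date.fromisoformat(name[len(DOMAIN_PREFIX):])
        except ValueError:
            continue  # unexpected suffix, leave the domain untouched
        if day < cutoff:
            expired.append(name)
    return expired


def purge_old_domains(sdb, today: date, keep_days: int = 30):
    """Delete expired daily domains; intended to be run from a cron job."""
    names, token = [], None
    while True:
        # list_domains is paginated, so follow NextToken until it runs out.
        kwargs = {"NextToken": token} if token else {}
        page = sdb.list_domains(**kwargs)
        names.extend(page.get("DomainNames", []))
        token = page.get("NextToken")
        if not token:
            break
    doomed = expired_domains(names, today, keep_days)
    for name in doomed:
        sdb.delete_domain(DomainName=name)
    return doomed
```

The writer side would call `domain_for(date.today())` to pick the domain for each event and create that domain once per day. Keeping the date in ISO form in the name means the cleanup job can decide what to delete from the name alone, without querying any data.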


Solution

  • My answer is based on my experience with SimpleDB in a production environment.

    1. I log thousands of application events a day to SimpleDB.
    2. We have researchers who regularly pull the log data from SimpleDB, and they've never complained. I'm not sure about running report queries directly against SimpleDB, but I don't see why it would be a problem; the queries I run are always fast.
    3. SimpleDB indexes all columns, so there's no "primary key" to choose.
    4. I personally don't enjoy using the third-party tools that have been built to interact with SimpleDB. Also, consider the price, which depends on how much data and processing you'll be using.
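
To illustrate points 2 and 3, here is a rough sketch of storing the timestamp as an ordinary attribute and querying by time range. The attribute names (`timestamp`, `level`, `message`) and the helper names are hypothetical. `sdb` is assumed to be a boto3 SimpleDB client, of which only `put_attributes` is used. SimpleDB compares values as strings, so the timestamp is written in a fixed-width ISO 8601 UTC form that sorts chronologically. Each item still needs a unique item name, and a random UUID is used for that here.

```python
import uuid
from datetime import datetime, timezone


def iso_utc(ts: datetime) -> str:
    """Format a timezone-aware datetime as fixed-width UTC text."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def event_attributes(ts: datetime, level: str, message: str):
    """Build the attribute list for one log event (names are illustrative)."""
    return [
        {"Name": "timestamp", "Value": iso_utc(ts)},
        {"Name": "level", "Value": level},
        {"Name": "message", "Value": message},
    ]


def log_event(sdb, domain: str, ts: datetime, level: str, message: str) -> str:
    """Write one event as a new item and return its item name."""
    item_name = str(uuid.uuid4())  # item names must be unique in a domain
    sdb.put_attributes(
        DomainName=domain,
        ItemName=item_name,
        Attributes=event_attributes(ts, level, message),
    )
    return item_name


def range_query(domain: str, start: datetime, end: datetime, count: bool = False) -> str:
    """Build a select expression for events in the half-open range [start, end)."""
    fields = "count(*)" if count else "*"
    return (
        f"select {fields} from `{domain}` "
        f"where `timestamp` >= '{iso_utc(start)}' and `timestamp` < '{iso_utc(end)}'"
    )
```

The resulting string would be passed to the client's `select` call as `SelectExpression`. With `count=True` the same range gives a simple frequency figure, which is the kind of report the question asks about.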