I want to try out DynamoDB and use it for the access logs generated by nginx, which will later feed a reporting dashboard that includes IP, referral URL, referral domain, browser, etc.
The initial setup will be EC2 instances running nginx, with CloudWatch consuming the access logs from those instances.
The idea is that a CloudWatch Logs entry will trigger a Lambda function that parses the log line and puts it into DynamoDB.
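A minimal sketch of what that Lambda handler might look like, assuming the CloudWatch Logs subscription payload and the default nginx "combined" log format (the `AccessLogs` table name and the attribute names are placeholders, not anything from your setup):

```python
import base64
import gzip
import json
import re
from urllib.parse import urlparse

# Default nginx "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
#   "$http_referer" "$http_user_agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d+) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Turn one access-log line into a flat dict, or None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    item = m.groupdict()
    # Derive the referral domain from the full referral URL.
    item["referral_domain"] = urlparse(item["referer"]).netloc
    return item

def lambda_handler(event, context):
    # CloudWatch Logs delivers subscription data gzipped and base64-encoded.
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
    import boto3  # imported lazily so the parser above is usable without AWS
    table = boto3.resource("dynamodb").Table("AccessLogs")  # placeholder table name
    for log_event in payload["logEvents"]:
        item = parse_log_line(log_event["message"])
        if item:
            table.put_item(Item=item)
```

If each invocation carries many log events, wrapping the loop in `table.batch_writer()` would cut the number of round trips.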
I'm not too familiar with DynamoDB beyond what I've read, but here's how I was thinking of structuring the schema.
ID will be the URL hit by nginx; this is what we would be reporting on.
ReferralDomain (table)
ReferralURL (table)
ReferralBrowser (table)
And this would continue for the other attributes being reported on, such as IP or geo info (ReferralCity, ReferralCountry, etc.).
Does this seem like a good schema design for this type of data in DynamoDB? Ultimately, the dashboard will query a specific ID over a date range and display a list of totals (aggregates) by URL, browser, etc., as well as actually listing out the data. Also, one of the reports may list unique items with counts: for example, ReferralDomain "Facebook" may have a count of 550 within a date range for a specific ID. Would that need to be done in EMR?
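For modest volumes, aggregates like that may not need EMR at all: if the dashboard already queries the items for one ID within a date range, the counts can be computed client-side. A sketch under that assumption, with illustrative item shapes (the `ReferralDomain` attribute name is just an example):

```python
from collections import Counter

def count_by_attribute(items, attribute):
    """Tally how often each value of `attribute` appears among the fetched items."""
    return Counter(item[attribute] for item in items if attribute in item)

# Items as they might come back from a date-range Query for one ID.
items = [
    {"ID": "/landing", "Timestamp": "2023-10-10T13:55:36Z", "ReferralDomain": "facebook.com"},
    {"ID": "/landing", "Timestamp": "2023-10-10T14:02:11Z", "ReferralDomain": "facebook.com"},
    {"ID": "/landing", "Timestamp": "2023-10-10T15:40:09Z", "ReferralDomain": "t.co"},
]

counts = count_by_attribute(items, "ReferralDomain")
```

At larger scale, maintaining incremental counters (e.g. via a DynamoDB Stream) or an EMR job, as you suggest, would avoid re-reading every item on each dashboard load.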
Is there a better schema to use, or any other considerations that should be taken into account with DynamoDB for this type of data? Thank you.
The primary key looks solid, and your architecture will work and scale nicely.
If I understand nginx and your use case correctly, I'm not sure why you want to split your tables based on an attribute.
You can have one table:
Links (table)
And since DynamoDB is schemaless, you can simply leave out any attributes that don't apply to a given item.
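Something like this, where each hit is one item in the single Links table; the attribute names here are just an example, and a sensible key design for your date-range reporting would be partition key = the URL (your ID) and sort key = the timestamp:

```python
# One item per hit in a single "Links" table. Partition key = the URL (ID),
# sort key = the timestamp, so a per-ID date-range Query is natural.
hit_with_referer = {
    "ID": "/landing",                     # partition key: the URL that was hit
    "Timestamp": "2023-10-10T13:55:36Z",  # sort key: enables BETWEEN date ranges
    "IP": "203.0.113.5",
    "Browser": "Mozilla/5.0",
    "ReferralURL": "https://www.facebook.com/some/post",
    "ReferralDomain": "www.facebook.com",
}

# Schemaless: a direct hit just omits the referral attributes entirely.
direct_hit = {
    "ID": "/landing",
    "Timestamp": "2023-10-10T14:02:11Z",
    "IP": "198.51.100.7",
    "Browser": "curl/7.68.0",
}
```

Both items live in the same table; queries and dashboards then filter or group by whichever attributes are present.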