mongodb amazon-web-services iot amazon-kinesis

AWS: How to save streaming data to a database hosted on EC2 (e.g. MySQL/MongoDB)


We can easily move data between different AWS services, e.g. Kinesis to DynamoDB, or AWS IoT to Redshift.

But what is the best strategy for saving streaming data to, say, MongoDB (which has no AWS PaaS offering; Atlas exists, but it has no integrations with other AWS services)?

I can see that some third-party solutions exist, but what is the best strategy to implement this on AWS itself? Is executing a Lambda function for each insert (with batching) the only option?


Solution

  • The solution depends mostly on your use case. How fast do you need to insert the data into your MongoDB?

    If you need a near real time solution, then Kinesis and Lambda is your best option (assuming you don't want to invest in third-party products). If you can afford a delay and can batch the data, then you can save the Kinesis stream into S3 (for example with Kinesis Data Firehose) and then use AWS Glue to process and load the data into the database.

    What you mostly need to think about is what you need to do with the data.

    If you are collecting sensor-like data where you only care about aggregates (e.g. clicks in a UI), it is better to store the raw data in S3 and then run a data pipeline (using AWS Glue, for example) that writes the aggregated data into MongoDB. S3 will be faster and cheaper for that type of data.

    If you are using the stream to pass business entities (e.g. documents that provide value on their own), then a near real time solution using AWS Lambda will be the better choice.

    Without knowing the exact use case, I would suggest storing in your database only the data that provides value (e.g. reports on aggregated data), and using S3 with a lifecycle policy for the raw "sensor" data.
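For the near real time Kinesis + Lambda path, the key point is that Lambda receives a batch of Kinesis records per invocation, so you can do one bulk insert per invocation rather than one per record. A minimal sketch, assuming each record carries one UTF-8 JSON object; the collection object is injected so the logic can be tested without a live database, and the URI, database, and collection names in the comments are hypothetical.

```python
import base64
import json
from typing import Any, Dict, List


def decode_kinesis_records(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a Kinesis-triggered Lambda event into a list of documents.

    Each record's payload arrives base64-encoded under record["kinesis"]["data"].
    Assumption: producers write one UTF-8 JSON object per Kinesis record.
    """
    docs = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        docs.append(json.loads(payload))
    return docs


def make_handler(collection):
    """Build a Lambda handler bound to a MongoDB collection.

    In production `collection` would be a pymongo Collection created once per
    container, e.g. MongoClient(MONGO_URI)["iot"]["readings"]
    (MONGO_URI, "iot" and "readings" are hypothetical names).
    """
    def handler(event, context=None):
        docs = decode_kinesis_records(event)
        if docs:
            # One round trip to MongoDB per Lambda invocation, not per record.
            collection.insert_many(docs, ordered=False)
        return {"inserted": len(docs)}

    return handler
```

In the deployed function you would create the client at module level and expose something like `handler = make_handler(...)` so connections are reused across invocations. The number of records per invocation is controlled by the event source mapping's `BatchSize` setting. Note that if the insert raises, Lambda retries the whole batch, so you may want idempotent writes (for example, upserts keyed on a field from the payload).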
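For the batch path (raw data in S3, aggregates in MongoDB), the pipeline step boils down to collapsing many raw events into a few summary documents. The sketch below shows that transformation in plain Python rather than Glue's own API; it assumes newline-delimited JSON click events with hypothetical `element` and `ts` (epoch seconds) fields and counts clicks per element per minute.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def aggregate_clicks(lines: Iterable[str]) -> List[Dict]:
    """Collapse raw click events into one document per (element, minute).

    Assumes each line is a JSON object such as {"element": "buy", "ts": 1700000000};
    the field names are hypothetical.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)
        minute = int(event["ts"]) // 60 * 60  # truncate to the start of the minute
        counts[(event["element"], minute)] += 1
    # Sorted so the output order is deterministic.
    return [
        {
            "element": element,
            "minute": datetime.fromtimestamp(minute, tz=timezone.utc).isoformat(),
            "clicks": n,
        }
        for (element, minute), n in sorted(counts.items())
    ]
```

The resulting list is small enough to load with a single bulk insert, which is the point: MongoDB only ever sees the aggregated documents, while the raw events stay in S3.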
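The lifecycle policy for the raw data can be expressed as an S3 lifecycle configuration. A minimal sketch, assuming the raw objects live under a `raw/` prefix and that archiving to Glacier after 30 days and deleting after a year suits you; the prefix, rule ID, and day counts are placeholders to adjust.

```python
from typing import Any, Dict


def raw_data_lifecycle(prefix: str = "raw/", archive_after_days: int = 30,
                       expire_after_days: int = 365) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration for raw sensor objects.

    Objects under `prefix` move to Glacier after `archive_after_days` and are
    deleted after `expire_after_days`. All defaults are placeholders.
    """
    if expire_after_days <= archive_after_days:
        # Expiring before (or when) archiving would make the transition pointless.
        raise ValueError("expiration must come after the archive transition")
    return {
        "Rules": [
            {
                "ID": "raw-sensor-data",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }
```

With boto3 this would be applied via `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="my-raw-bucket", LifecycleConfiguration=raw_data_lifecycle())`, where the bucket name is hypothetical.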