In particular, I'd like to push all of the INSERT, UPDATE, and DELETE statements from my Postgres logs to an AWS Hadoop cluster and have a nice way to search them to see the history of a row or rows.
I'm not a Hadoop expert in any way, so let me know if this is a red herring.
Thanks!
Use Flume to send logs from your RDS instance to your Hadoop cluster. With Flume's regex filtering interceptor you can filter events so that only the INSERT, UPDATE, and DELETE statements get through. Note that Hadoop by itself does not make your data searchable, so you'll need something like Solr on top of it.
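As a rough illustration, here's a minimal sketch of such an agent config, assuming log files get dropped into a local spooling directory that Flume watches. The agent name, the spool path, and the HDFS URL are all placeholders you'd replace with your own:

```properties
# Hypothetical agent "agent1": reads Postgres log files from a spool
# directory, keeps only DML statements, and writes them to HDFS.
agent1.sources = pglogs
agent1.channels = mem
agent1.sinks = hdfs-sink

agent1.sources.pglogs.type = spooldir
agent1.sources.pglogs.spoolDir = /var/flume/pg-logs
agent1.sources.pglogs.channels = mem

# Regex filtering interceptor: with excludeEvents = false, only events
# matching the regex are passed through; everything else is dropped.
agent1.sources.pglogs.interceptors = dmlonly
agent1.sources.pglogs.interceptors.dmlonly.type = regex_filter
agent1.sources.pglogs.interceptors.dmlonly.regex = .*\\b(INSERT|UPDATE|DELETE)\\b.*
agent1.sources.pglogs.interceptors.dmlonly.excludeEvents = false

agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.channel = mem
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/pg-dml/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
```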
You could either get the data into Hadoop first and then run a bunch of MapReduce jobs to index it into Solr, or you could configure Flume to write directly to Solr via its MorphlineSolrSink.
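If you take the direct route, the HDFS sink in the sketch above could be swapped for something like the following. The morphline file path and id here are assumptions; the morphline config itself is where you'd define how raw log lines get parsed into Solr documents:

```properties
# Hypothetical alternative sink: write events straight to Solr instead of HDFS.
agent1.sinks.solr-sink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent1.sinks.solr-sink.channel = mem
agent1.sinks.solr-sink.morphlineFile = /etc/flume/conf/morphline.conf
agent1.sinks.solr-sink.morphlineId = morphline1
agent1.sinks.solr-sink.batchSize = 1000
```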
EDIT:
It seems that RDS instances don't offer SSH access, which means you cannot run Flume on the RDS instance itself; instead, you have to periodically pull the RDS instance's logs to a machine you control (an EC2 instance works well) that has Flume configured.
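The RDS API does expose log files via DescribeDBLogFiles and DownloadDBLogFilePortion, so a small cron job on that EC2 machine could fetch them into the spool directory Flume watches. A rough sketch using boto3 follows; the instance identifier and paths are placeholders, and a production version would persist the Marker (or track LastWritten) between runs so it only fetches new data instead of re-downloading every file:

```python
import os
import boto3

RDS_INSTANCE = "my-postgres-db"           # assumed RDS instance identifier
STAGE_DIR = "/var/flume/pg-logs-staging"  # files are written here first
SPOOL_DIR = "/var/flume/pg-logs"          # must match Flume's spoolDir

rds = boto3.client("rds")

# List the instance's log files, then download each one page by page.
for lf in rds.describe_db_log_files(DBInstanceIdentifier=RDS_INSTANCE)["DescribeDBLogFiles"]:
    name = lf["LogFileName"]              # e.g. "error/postgresql.log.2015-06-01-00"
    flat = name.replace("/", "_")
    stage_path = os.path.join(STAGE_DIR, flat)
    marker, pending = "0", True           # Marker "0" = start of the file
    with open(stage_path, "w") as out:
        while pending:
            page = rds.download_db_log_file_portion(
                DBInstanceIdentifier=RDS_INSTANCE,
                LogFileName=name,
                Marker=marker,
            )
            out.write(page.get("LogFileData") or "")
            marker = page["Marker"]
            pending = page["AdditionalDataPending"]
    # Move into the spool dir last, so Flume never sees a half-written file
    # (assumes STAGE_DIR and SPOOL_DIR are on the same filesystem).
    os.rename(stage_path, os.path.join(SPOOL_DIR, flat))
```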