I've got a Kafka topic which contains catalog data in the form of the following commands:
Now I need to consume this topic, possibly streaming 100k msgs/sec, into some DB that helps me translate the original stream of commands into a stream of item states, so that the DB only ever holds the current state of each item. Basically, the DB will be used as a lookup table.
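Conceptually, the consuming side would be something like the sketch below (Java; the topic name, record layout, and the `applyCommand` helper are just placeholders for illustration):

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class CatalogStateConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "catalog-state-builder");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("catalog-commands"));   // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // record.key()   -> item id
                    // record.value() -> serialized command (upsert / delete / delete_all ...)
                    applyCommand(record.key(), record.value());
                }
            }
        }
    }

    // Placeholder: translate one command into a write against the lookup store
    // (Datastore, Bigtable, ...) so the store always holds the current item state.
    static void applyCommand(String itemId, String command) {
        // e.g. upsert -> put(itemId, newState); delete -> delete(itemId)
    }
}
```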
My idea was to use Cloud Datastore as that lookup store.
My worries are about the ACID properties of Datastore. How "ACID" is it really? Is it even suitable for such a use case?
I was also thinking about using the cheaper Bigtable, but it doesn't seem like the right choice for this use case.
If you have any ideas or recommendations on how else to solve this, I'd be glad to hear them.
Bigtable can handle a rate of 100K updates per second with a 10 node cluster (I have run tests up to 3,500 nodes, which handle 35M updates per second). Bigtable has strong consistency for single-row upserts, which is why Bigtable users design schemas that fit all of their transactional data into a single row.
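For example, a single-row-per-item layout could look like this (a minimal sketch with the Bigtable Java client; the project, instance, table, column family, and qualifier names are all made up):

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.RowMutation;

public class ItemUpsert {
    public static void main(String[] args) throws Exception {
        // Hypothetical ids: my-project / my-instance / "catalog" table.
        try (BigtableDataClient data = BigtableDataClient.create("my-project", "my-instance")) {
            // One row per item: row key = item id, all state fields live in one
            // column family, so a single mutateRow call updates the item's state
            // atomically.
            RowMutation upsert = RowMutation.create("catalog", "item#42")
                .setCell("state", "title", "Blue widget")
                .setCell("state", "price", "19.99")
                .setCell("state", "stock", "130");
            data.mutateRow(upsert);
        }
    }
}
```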
Cloud Bigtable supports upserts and does not have a distinction between insert and update. There is also a delete-by-range operation that could theoretically be used for your delete_all case.
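A sketch of that delete-by-range path, using the admin client's dropRowRange (the table id and key prefix are assumptions; note this is an admin operation intended for occasional bulk cleanup, not for the per-message hot path):

```java
import com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient;

public class CatalogDeleteAll {
    public static void main(String[] args) throws Exception {
        try (BigtableTableAdminClient admin = BigtableTableAdminClient.create("my-project", "my-instance")) {
            // Drops every row whose key starts with the given prefix,
            // e.g. all rows keyed "item#..." in the hypothetical "catalog" table.
            admin.dropRowRange("catalog", "item#");
        }
    }
}
```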
The high transaction rate and the lower cost are the right reasons to use Cloud Bigtable here. Alternatively, you can consider Cloud Spanner, which is designed for high-throughput transactional data.