This question may look open-ended: I am trying to gather ideas on how to implement a BGP pipeline.
I am receiving 100-1000 messages (BGP updates) per second, a few kilobytes per update, over Kafka.
I need to archive them in a binary format with some metadata for fast lookup: I periodically build a "state" of the BGP table that merges all the updates received over a certain time window, hence the need for a database.
What I have been doing until now: grouping the messages into "5-minute" files (messages written end-to-end), as is common for BGP collection tools, and adding a link to each file in a database. I see some disadvantages: it is complicated (grouping by key, managing Kafka offset commits) and there is no fine-grained control over where a lookup starts and ends.
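Roughly what the current pipeline looks like, as a sketch (assuming kafka-python; the topic name, consumer group, output path and the database call are placeholders):

```python
import time
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "bgp-updates",                       # placeholder topic name
    bootstrap_servers="localhost:9092",
    group_id="bgp-archiver",             # placeholder consumer group
    enable_auto_commit=False,            # commit only after a file is safely written
    value_deserializer=lambda v: v,      # keep the raw bytes of each update
)

WINDOW = 300                             # 5-minute buckets, in seconds
bucket_start = time.time()
buffer = []

for msg in consumer:
    buffer.append(msg.value)
    if time.time() - bucket_start >= WINDOW:
        path = f"/data/bgp/{int(bucket_start)}.bin"   # placeholder output path
        with open(path, "wb") as f:
            for update in buffer:
                f.write(update)                       # messages written end-to-end
        # register_file(path, bucket_start)           # placeholder: add the link in the database
        consumer.commit()                             # only now advance the Kafka offsets
        buffer = []
        bucket_start = time.time()
```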
What I am considering instead: using a database (ClickHouse / Google Bigtable / Amazon Redshift) and inserting every single entry with its metadata plus a link to the individual update stored on S3 / Google Cloud Storage / a local file.
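What I have in mind would look roughly like this (a sketch assuming clickhouse-driver and boto3; the table, columns, bucket and function names are made up for illustration):

```python
import boto3
from clickhouse_driver import Client

s3 = boto3.client("s3")
ch = Client(host="localhost")

def archive_update(update_id, ts, peer, prefix, raw_bytes):
    # one object per update in S3, one metadata row per update in ClickHouse
    key = f"updates/{update_id}.bin"
    s3.put_object(Bucket="bgp-archive", Key=key, Body=raw_bytes)
    ch.execute(
        "INSERT INTO bgp_updates (update_id, ts, peer, prefix, s3_key) VALUES",
        [(update_id, ts, peer, prefix, key)],
    )
```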
I am worried about download performance (most likely over HTTP), since compiling all the updates into a state may require fetching a few thousand of those messages. Do you have experience with batch downloading at this scale? I don't think storing the updates directly in the database would be optimal either.
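For the fetch side, the best I can think of is downloading the objects concurrently, something like this sketch with boto3 and a thread pool (bucket name and keys are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")

def fetch(key):
    return s3.get_object(Bucket="bgp-archive", Key=key)["Body"].read()

def fetch_all(keys, workers=32):
    # a few thousand small objects: parallel GETs hide the per-request HTTP latency
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, keys))
```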
Any opinions, ideas, or suggestions? Thank you.
Cloud Bigtable is capable of 10,000 requests per second per "node" and costs $0.65 per node per hour. The smallest production cluster is 3 nodes, for a total of 30,000 requests per second. Your application calls for a maximum of 1,000 requests per second. While Cloud Bigtable can handle your workload, I would suggest that you consider Firestore.
At a couple of kilobytes per message, I would also consider putting the entire value in the database rather than just the metadata, for ease of use.
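As a rough sketch of what I mean, assuming Firestore and the google-cloud-firestore client (collection and field names are only examples), the whole raw update fits comfortably in the document next to its metadata:

```python
from google.cloud import firestore

db = firestore.Client()

def store_update(update_id, ts, peer, prefix, raw_bytes):
    db.collection("bgp_updates").document(update_id).set({
        "ts": ts,
        "peer": peer,
        "prefix": prefix,
        "raw": raw_bytes,   # the full BGP update; no separate object-store fetch needed
    })
```

This keeps reads to a single round trip per update instead of a database lookup followed by an HTTP download.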