Cassandra write throttling with multiple clients

I have two clients (separate docker containers) both writing to a Cassandra cluster.

The first is writing real-time data, which is ingested at a rate that the cluster can handle, albeit with little spare capacity. This is regarded as high-priority data and we don't want to drop any. The ingestion rate varies quite a lot from minute to minute. Sometimes data backs up in the queue from which the client reads and at other times the client has cleared the queue and is (briefly) waiting for more data.

The second is a bulk data dump from an online store. We want to write it to Cassandra as fast as possible at a rate that soaks up whatever spare capacity there is after the real-time data is written, but without causing the cluster to start issuing timeouts.

Using the DataStax Python driver and keeping the two clients separate (i.e. they shouldn't have to know about or interact with each other), how can I throttle writes from the second client such that it maximises write throughput subject to the constraint of not impacting the write throughput of the first client?

Solution

The solution I came up with was to make both data producers write to the same queue.

To meet the requirement that the low-priority bulk data doesn't interfere with the high-priority live data, I made the producer of the low-priority data check the queue length and then add a record to the queue only if the queue length is below a suitable threshold (in my case 5 messages).

The result is that no live data message can have more than 5 bulk data messages in front of it in the queue. If messages start backing up on the queue then the bulk data producer stops queuing more data until the queue length falls below the threshold.

I also split the bulk data into many small messages so that they are relatively quick to process by the consumer.

There are three disadvantages of this approach:

There is no visibility of how many queued messages are low priority and how many are high priority. However we know that there can't be more than 5 low priority messages.
The producer of low-priority messages has to poll the queue to get the current length, which generates a small extra load on the queue server.
The threshold isn't applied strictly because there is a race between the two producers from checking the queue length to queuing a message. It's not serious because the low-priority producer queues only a single message when it loses the race and next time it will know the queue is too long and wait.