We are feeding events (logs) from Logstash to Apache Cassandra using the PerimeterX Cassandra Logstash output plugin. We have found the plugin's maximum throughput to be about 8K, as it opens only 2 connections to Cassandra, whereas Cassandra itself can ingest data much faster; we expect the actual system to need a throughput of 30K or higher.
Here, throughput means the capacity to consume the incoming events, measured in events/sec.
Hence we planned to introduce Kafka in the middle, which gives around 45K throughput with the Logstash Kafka output.
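For context, the Logstash side of the pipeline is roughly the following sketch (the broker address and the topic name "logs" are placeholders, not our real values):

output {
  kafka {
    # placeholder broker and topic; replace with the real cluster and topic
    bootstrap_servers => "localhost:9092"
    topic_id => "logs"
    codec => json
  }
}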
We are looking for help from this Stack Overflow post. We could install the connector JAR as mentioned in the documentation, but there is no proper guide: the current documentation is confusing and circular about the configuration requirements. We don't see the plugin being invoked when Kafka is running with the target topic.
Some help on what the correct configuration is, or some documentation on Cassandra keyspaces, would be helpful.
After placing the JAR as mentioned in the documentation, we need to run Kafka Connect, which will show all the configured connectors. To start Kafka Connect in distributed mode, run the command below:
bin/connect-distributed.sh config/connect-distributed.properties
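Note that the connector is only discovered if the directory holding its JAR is listed under plugin.path in config/connect-distributed.properties before the worker is started; the path below is only an example of wherever you extracted the tarball:

# config/connect-distributed.properties
# example location; point this at the directory containing the connector JAR
plugin.path=/opt/kafka/plugins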
Kafka Connect exposes a REST API, available by default at http://localhost:8083; using this REST API you can configure your connectors.
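To confirm that the worker actually picked up the Cassandra sink, list the installed connector plugins; the sink's connector class (com.datastax.oss.kafka.sink.CassandraSinkConnector for this connector, as far as we can tell) should appear in the response:

curl http://localhost:8083/connector-plugins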
To register the connector, use the endpoint below:
POST /connectors – creates a new connector; the request body should be a JSON object containing a string name field and an object config field with the connector configuration parameters
A sample JSON to register the connector is present in the kafka-connect-cassandra-sink-1.4.0.tar.gz file.
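For illustration, a request along the following lines registers the sink; the connector name, contact point, local datacenter, keyspace/table and mapping are placeholders to be replaced with your own values:

# all values below are placeholders; adjust name, topics, contactPoints, keyspace, table and mapping
curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "cassandra-sink",
  "config": {
    "connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
    "tasks.max": "1",
    "topics": "logs",
    "contactPoints": "127.0.0.1",
    "loadBalancing.localDc": "datacenter1",
    "topic.logs.my_keyspace.my_table.mapping": "event_time=value.timestamp, message=value.message"
  }
}'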
The official documentation provides a list of all endpoints.
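For example, once the connector is registered, its state and the state of its tasks can be checked with (using the connector name chosen above):

curl http://localhost:8083/connectors/cassandra-sink/status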
More info available here