Search code examples
pythonapache-kafkaapache-nifi

Queue filling up while listening to apache nifi from kafka and running python script


I'm listening to kafka topic using consumekafka in apache nifi. I am running a python script using executestreamcommand with the data coming from here. The kafka topic I'm listening to for pro-process happens in an infinite loop in the for loop. That's why the data passing through the queue never ends. queue is filling up. How do I solve this?

Nifi interface

1

Python loop
2

I thought of a solution. I can put a break at the end of the loop. But too much data is coming. constantly open and close the program slows down?

EDIT : Typing "break" doesn't work, I guess because the data comes too fast, turn it off and on and on it stays very slow.


Solution

  • You shouldn't put a loop in the Python code. Ensure your flow file passes one record to the Python function. Have it return one event. Then forward that to ProduceKafka Nifi processor rather than needing to install any kafka-python dependency

    Also, I've not benchmarked it, but I suspect using a Groovy processor, or your own custom Java one, would be faster than starting a Python process