docker apache-kafka redpanda

Why does the consumer Redpanda/Kafka instance already have data?


I have two Redpanda instances (Redpanda is Kafka-compatible). The datagen feeds into the producer Redpanda. If I query the producer Redpanda I can see the data, but if I query the consumer Redpanda on port 9093 it also already has data. Why? It seems like they act as a cluster, but I just want them to act as independent entities.

# cleanup
rm -rf /tmp/redpanda
mkdir /tmp/redpanda
rm -rf /tmp/redpanda2
mkdir /tmp/redpanda2

# start producer redpanda
docker run --rm -v /tmp/redpanda:/var/lib/redpanda/data -p 8081:8081 -p 8082:8082 -p 9092:9092 -p 9644:9644  docker.vectorized.io/vectorized/redpanda:latest redpanda start --smp 1 --reserve-memory 0M --memory 4G --overprovisioned --node-id 0 --check=false

# start consumer redpanda
docker run --rm -v /tmp/redpanda2:/var/lib/redpanda/data -p 8083:8081 -p 8084:8082 -p 9093:9092 -p 9645:9644 docker.vectorized.io/vectorized/redpanda:latest redpanda start --smp 1 --reserve-memory 0M --memory 4G --overprovisioned --node-id 1 --check=false

# datagen metrics. Send to producer redpanda
docker run  --network host some-data-gen:latest /bin/sh -c "/datagen --brokers host.docker.internal:9092"

# Events coming in at producer redpanda 
rpk --brokers localhost:9092 topic consume tcp_metrics --offset end

# already get events here 
rpk --brokers localhost:9093 topic consume tcp_metrics --offset end

EDIT

If I do not run rpk --brokers localhost:9093 topic consume tcp_metrics --offset end and then check the volume of the second Redpanda, I do not see any data. Does rpk topic consume only check whether the topic exists, or does it make the second Redpanda start consuming it? I thought it was just like SELECT * FROM topic;


Solution

  • Can you try the same after starting the two containers in different Docker networks?

    If you run rpk --brokers localhost:9093 cluster metadata -b -c -t and rpk --brokers localhost:9092 cluster metadata -b -c -t, you should get different results if they are two separate clusters.
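
    The metadata check above can be wrapped in a small helper, sketched here under a few assumptions: it uses only the rpk invocation quoted in this answer, the function names cluster_metadata and compare_clusters are made up for the example, and it has not been run against the setup in the question.

```shell
#!/bin/sh
# Sketch: compare what each bootstrap address reports about its cluster.
# Function names are hypothetical; the rpk flags are the ones quoted above.

# Print the broker, cluster and topic metadata reported via one bootstrap address.
cluster_metadata() {
    rpk --brokers "$1" cluster metadata -b -c -t
}

# Report whether two bootstrap addresses return identical metadata.
# Returns 2 if either metadata request fails.
compare_clusters() {
    first=$(cluster_metadata "$1") || return 2
    second=$(cluster_metadata "$2") || return 2
    if [ "$first" = "$second" ]; then
        echo "identical metadata: $1 and $2 look like the same cluster"
    else
        echo "different metadata: $1 and $2 look like separate clusters"
    fi
}
```

    Call it as compare_clusters localhost:9092 localhost:9093. When reading the raw metadata, it is also worth looking at the host and port listed for each broker: a Kafka client uses the bootstrap address only for the first request and then connects to the address the broker advertises. If the broker behind 9093 advertises port 9092, that is one possible explanation (not confirmed in this thread) for the consumer appearing to have the producer's data.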