So I have a Spark cluster running in Docker using Docker Compose. I'm using docker-spark images.
Then I added two more containers: one acts as a server (plain Python) and the other as a client (a Spark Streaming app). Both run on the same Docker network.
For the server (plain Python) I have something like:
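```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 9009))  # '' = all interfaces of this container
s.listen(1)
print("Waiting for TCP connection...")
conn, addr = s.accept()  # blocks until the Spark receiver connects
while True:
    # Do and send stuff, e.g. conn.sendall(b"some line\n")
    ...
```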
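A slightly fuller sketch of the server side, assuming the Spark client should receive newline-terminated UTF-8 text (which is what `socketTextStream` reads). `make_server` and `send_lines` are hypothetical helper names, and the lines being sent are placeholders.

```python
import socket

def make_server(host="", port=9009):
    # '' binds to every interface of this container, so other containers
    # on the same Docker network can connect to it.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv

def send_lines(srv, lines):
    # Block until a client (e.g. the Spark receiver) connects, then send
    # one UTF-8 line per record; socketTextStream splits the stream on newlines.
    conn, addr = srv.accept()
    with conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))
```

Inside container-1 this could be run as `send_lines(make_server(), ["hello spark", "hello docker"])`. Binding to `''` rather than `127.0.0.1` matters here: a socket bound to the loopback address inside the container would not be reachable from container-2.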
And for my client (the Spark app) I have something like:
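```python
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = SparkConf()
conf.setAppName("MyApp")
sc = SparkContext(conf=conf)
sc.setLogLevel("ERROR")
ssc = StreamingContext(sc, 2)  # 2-second batches
ssc.checkpoint("my_checkpoint")
# read data from port 9009; the first argument is the hostname to connect to
dataStream = ssc.socketTextStream(PORT, 9009)
# What's PORT's value?
```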
So what should PORT (the hostname argument) be? Is it the container's IP address as shown by `docker inspect`?
Okay, so I found that I can use the container's IP address, as long as all my containers are on the same network. I look up the IP by running

```shell
docker inspect <container_id>
```

and then use that IP as the host for my socket.
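To avoid copying the IP by hand, here is a small sketch that parses the JSON printed by `docker inspect`, assuming it runs on the Docker host where the `docker` CLI is available. `container_ips` and `inspect_container` are hypothetical names. Compose usually prefixes network names with the project name, so the key may look like `<project>_network-1` rather than `network-1`, and the IP can change when a container is recreated.

```python
import json
import subprocess

def container_ips(inspect_output):
    # `docker inspect` prints a JSON array with one object per container;
    # each attached network appears under NetworkSettings.Networks.
    containers = json.loads(inspect_output)
    networks = containers[0]["NetworkSettings"]["Networks"]
    return {name: net["IPAddress"] for name, net in networks.items()}

def inspect_container(container_id):
    # Shells out to the docker CLI on the host and returns {network: ip}.
    out = subprocess.run(["docker", "inspect", container_id],
                         capture_output=True, text=True, check=True).stdout
    return container_ips(out)
```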
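Edit: I know it's kind of late, but I just found out that I can actually use the container's name instead, as long as the containers are on the same network (Docker's embedded DNS resolves container names on user-defined networks, such as the ones Compose creates).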
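A quick way to confirm from inside the client container that the name resolves and the port is reachable, assuming Python is available there. `resolve` and `can_connect` are hypothetical helpers built only on the standard `socket` module.

```python
import socket

def resolve(host, port):
    # Ask the resolver (Docker's embedded DNS inside a container) for IPv4 addresses.
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def can_connect(host, port, timeout=2.0):
    # True if something is listening on host:port and accepts a TCP connection.
    # Unresolvable names (socket.gaierror) and refused connections are both OSError.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run inside container-2, `resolve("container-1", 9009)` should list container-1's address on the shared network, and `can_connect("container-1", 9009)` should become `True` once the server is listening.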
Another edit:
I changed my docker-compose file like this:
```yaml
services:
  container-1:
    image: image-1
    container_name: container-1
    networks:
      - network-1
  container-2:
    image: image-2
    container_name: container-2
    ports:
      - "8000:8000"
    networks:
      - network-1

networks:
  network-1:
```
And then in my script (container-2):
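```python
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = SparkConf()
conf.setAppName("MyApp")
sc = SparkContext(conf=conf)
sc.setLogLevel("ERROR")
ssc = StreamingContext(sc, 2)
ssc.checkpoint("my_checkpoint")
# read data from port 9009
dataStream = ssc.socketTextStream("container-1", 9009)  # Put the container's name here
dataStream.pprint()  # at least one output operation is required before start()
ssc.start()
ssc.awaitTermination()
```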
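Before involving Spark, it can help to read from the server with a plain-Python stand-in for the receiver. This is a debugging sketch, not how Spark implements its receiver: `read_lines` is a hypothetical helper that connects as a TCP client (as `socketTextStream` does) and splits the stream on newlines.

```python
import socket

def read_lines(host, port, max_lines, timeout=10.0):
    # Connect as a TCP client and collect up to max_lines
    # newline-delimited UTF-8 records.
    lines = []
    buf = b""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while len(lines) < max_lines:
            chunk = conn.recv(4096)
            if not chunk:  # server closed the connection
                break
            buf += chunk
            while b"\n" in buf and len(lines) < max_lines:
                line, buf = buf.split(b"\n", 1)
                lines.append(line.decode("utf-8"))
    # An unterminated trailing fragment is ignored in this sketch.
    return lines
```

From container-2, `read_lines("container-1", 9009, 5)` would print nothing useful unless the server is up and sending newline-terminated text, which narrows down whether a problem is in networking or in the Spark job.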
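I also expose the socket port (9009) in the Dockerfile; I'm not sure whether that has any effect. (As far as I can tell, `EXPOSE` is mostly documentation: containers on the same network can reach each other's ports without it, and `ports:` is only needed to reach a port from the host.)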