Tags: docker, apache-spark, spark-streaming

Spark app socket communication between containers on a Docker Spark cluster


So I have a Spark cluster running in Docker using Docker Compose. I'm using docker-spark images.

Then I added two more containers: one acts as a server (plain Python) and the other as a client (a Spark Streaming app). Both run on the same Docker network.

For the server (plain Python) I have something like:

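import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 9009))  # '' = listen on all interfaces of this container
s.listen(1)
print("Waiting for TCP connection...")
conn, addr = s.accept()  # blocks until the Spark receiver connects
while True:
    # Do and send stuff (newline-terminated lines) over conn
    pass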
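A self-contained version of that server, as a minimal sketch: it assumes the Spark side reads with socketTextStream, which splits the incoming stream on newlines, so each record is sent as one UTF-8 line. The function serve_lines, its parameters, and the sample payload are illustrative names, not part of the original code.

```python
import socket
import time


def serve_lines(lines, host="", port=9009, delay=0.0):
    """Accept one TCP client and send each item in `lines` as a UTF-8,
    newline-terminated record (the framing socketTextStream reads)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        # Allow quick restarts without "Address already in use".
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))  # "" binds every interface in the container
        srv.listen(1)
        print("Waiting for TCP connection...")
        conn, addr = srv.accept()
        with conn:
            print("Connected by", addr)
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
                time.sleep(delay)
    # Leaving the with-blocks closes the connection, which ends the stream.


if __name__ == "__main__":
    # Hypothetical payload; replace with whatever the server really produces.
    serve_lines(["hello spark", "another line"], delay=1.0)
```

Binding to "" rather than "localhost" matters here: a server bound only to the loopback interface would not be reachable from the Spark container.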

And for my client (the Spark app) I have something like:

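from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = SparkConf()
conf.setAppName("MyApp")

sc = SparkContext(conf=conf)
sc.setLogLevel("ERROR")
ssc = StreamingContext(sc, 2)  # 2-second batches
ssc.checkpoint("my_checkpoint")
# read data from port 9009; the first argument is the host to connect to
dataStream = ssc.socketTextStream(PORT, 9009)
# What's PORT's value?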

So what should PORT be? Is it the container's IP address as shown by docker inspect?
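For context: socketTextStream(hostname, port) makes the Spark receiver a TCP client that connects to hostname:port and treats every newline-terminated UTF-8 line as one record, so its first argument is a host (an IP address or a resolvable name), not a port. Below is a minimal plain-Python stand-in for that receiver, which can be run inside the Spark container to check that the server is reachable before involving Spark. The function read_lines, its max_lines parameter, and the host name "server" are hypothetical.

```python
import socket


def read_lines(host, port, max_lines=None, timeout=10.0):
    """Connect to host:port and return newline-delimited UTF-8 records,
    roughly what socketTextStream receives."""
    records = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # makefile() gives a buffered text reader that splits on "\n".
        with conn.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in reader:
                records.append(line.rstrip("\n"))
                if max_lines is not None and len(records) >= max_lines:
                    break
    return records


if __name__ == "__main__":
    # Hypothetical host: the server container's IP or name on the shared network.
    print(read_lines("server", 9009, max_lines=5))
```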


Solution

  • Okay, so I found that I can use the container's IP address, as long as all my containers are on the same network. I look up the IP by running

    docker inspect <container_id>
    

    then take the IP address from the output and use it as the host argument of socketTextStream.

    Edit: I know it's a bit late, but I just found out that I can also use the container's name instead of its IP, as long as the containers are on the same network.

    Another edit:

    I changed my docker-compose file like this:

    services:
      container-1:
        image: image-1
        container_name: container-1
        networks:
          - network-1
      container-2:
        image: image-2
        container_name: container-2
        ports:
          - "8000:8000"
        networks:
          - network-1

    networks:
      network-1:
    

    and then, in my script (running in container-2):

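    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = SparkConf()
    conf.setAppName("MyApp")

    sc = SparkContext(conf=conf)
    sc.setLogLevel("ERROR")
    ssc = StreamingContext(sc, 2)
    ssc.checkpoint("my_checkpoint")

    # read data from port 9009 on container-1 (resolved by its container name)
    dataStream = ssc.socketTextStream("container-1", 9009)
    dataStream.pprint()

    ssc.start()
    ssc.awaitTermination()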
    

    I also exposed the socket port in the Dockerfile, but I don't know whether that has any effect.
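The answer describes two ways to address the server container: by the IP address reported by docker inspect, or by container name, which Docker's embedded DNS resolves on user-defined networks such as one declared in a Compose file. As a minimal sketch (the helper names ips_from_inspect, inspect_container and can_reach are hypothetical), the first function pulls per-network IP addresses out of the JSON that docker inspect prints, and can_reach, run inside the Spark container, checks that a name or IP accepts TCP connections on the socket port before the streaming job starts.

```python
import json
import socket
import subprocess


def ips_from_inspect(inspect_json):
    """Map network name -> IP address from `docker inspect` JSON output
    (a list with one object per inspected container)."""
    info = json.loads(inspect_json)[0]
    networks = info["NetworkSettings"]["Networks"]
    return {name: net["IPAddress"] for name, net in networks.items()}


def inspect_container(container):
    """Run `docker inspect` on the host (requires the docker CLI)."""
    out = subprocess.run(["docker", "inspect", container],
                         capture_output=True, text=True, check=True).stdout
    return ips_from_inspect(out)


def can_reach(host, port, timeout=3.0):
    """True if `host` resolves and accepts a TCP connection on `port`.
    create_connection resolves names itself, so this tests DNS too."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers resolution errors, refusals and timeouts
        return False


if __name__ == "__main__":
    # Hypothetical names taken from the compose example above.
    print(can_reach("container-1", 9009))
```

If can_reach("container-1", 9009) returns False while the container's IP works, the two containers are most likely not attached to the same user-defined network.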