
Connection to Splash service on Kubernetes, GKE


I have a Python controller that uses the scrapy-splash library to send SplashRequest requests to a Splash service.

Locally, I run the controller and the Splash service in two separate Docker containers.

yield SplashRequest(url=response.url, callback=parse, splash_url=<URL>, endpoint='execute', args=<SPLASH_ARGS>)
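
For reference, `SplashRequest(..., endpoint='execute')` is ultimately turned into an HTTP POST to the `/execute` endpoint under `splash_url`. The stdlib sketch below builds (but does not send) a comparable request; the Lua script, the `build_execute_request` helper and the exact payload are illustrative assumptions, not what scrapy-splash sends byte for byte.

```python
import json
import urllib.request

# Hypothetical minimal Lua script: Splash's execute endpoint runs main(splash, args),
# and arguments from the request body (here "url") are available as args.url.
LUA_SOURCE = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return splash:html()
end
"""


def build_execute_request(splash_url: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to Splash's /execute endpoint."""
    payload = {"lua_source": LUA_SOURCE, "url": target_url}
    return urllib.request.Request(
        url=splash_url.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_execute_request("http://127.0.0.1:8050", "http://example.com")
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` only succeeds once the Splash address is actually reachable from the machine running the controller, which is exactly what fails in this question.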

When I send the request locally with splash_url="http://127.0.0.1:8050", everything works fine.

Now I want to run Splash as a Kubernetes Deployment and process the Splash requests in the cloud. I created a Splash Deployment and a Service with type: LoadBalancer on Google Kubernetes Engine (GKE), and I send the Splash requests to the external IP of the Splash Service.

However, Splash never receives any request, and the Python script fails with:

twisted.python.failure.Failure twisted.internet.error.TCPTimedOutError: TCP connection timed out: 60: Operation timed out.
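
The two symptoms in this question (a TCP timeout from Twisted, and "refused to connect" in the browser) can be told apart with a plain TCP probe before Scrapy is involved at all. A minimal sketch, where `probe_tcp` is a hypothetical helper:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a plain TCP connect attempt as 'open', 'refused' or 'timeout'.

    Other OSErrors (e.g. no route to host, DNS failure) are left to propagate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # Something accepted the TCP handshake on this port.
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is accepting on this port.
        return "refused"
    except socket.timeout:
        # No answer within `timeout` seconds, e.g. packets are silently dropped.
        return "timeout"
```

Calling `probe_tcp("<EXTERNAL_IP>", 8050)` from the controller's machine shows which case applies. Note that `"open"` only proves that something accepts TCP connections on the port, not that it is Splash.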

It worked in the past when I used the pod's internal endpoint, but then I started getting a "Missing schema" exception because I had not included http:// in the URL.
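
Since the "Missing schema" error came from a URL without `http://`, it can help to build the Splash URL in one place. A minimal sketch with hypothetical helper names; it handles IPv4 addresses and DNS names, and `in_cluster_splash_url` assumes the default cluster domain `cluster.local`, so that name only resolves from pods running inside the same cluster:

```python
from urllib.parse import urlsplit


def splash_base_url(host: str, port: int = 8050, scheme: str = "http") -> str:
    """Return an absolute Splash base URL, adding a scheme and port if missing."""
    if "://" not in host:
        # A bare "10.0.0.5" or "splash" would otherwise be rejected as schemeless.
        host = f"{scheme}://{host}"
    parts = urlsplit(host)
    # Keep an explicit port if one was given; otherwise use the Splash default.
    netloc = parts.netloc if parts.port else f"{parts.hostname}:{port}"
    return f"{parts.scheme}://{netloc}"


def in_cluster_splash_url(service: str, namespace: str, port: int = 8050) -> str:
    """Cluster-internal DNS name of a Service (<service>.<namespace>.svc.cluster.local)."""
    return splash_base_url(f"{service}.{namespace}.svc.cluster.local", port)
```

For the manifests in this question, `in_cluster_splash_url("splash", "ns-app")` gives `http://splash.ns-app.svc.cluster.local:8050`, while a bare external IP passed to `splash_base_url` gains the `http://` prefix and the `:8050` port.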

  • Splash Docker image: scrapinghub/splash:3.2
  • Kubernetes version: 1.7 (also tried 1.9)

splash-deployment.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: my-app
  name: splash
  namespace: ns-app
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      labels:
        app: splash
    spec:
      containers:
      - image: scrapinghub/splash:3.2
        name: splash
        ports:
        - containerPort: 8050
        resources: {}
      restartPolicy: Always
status: {}

splash-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: app
  name: splash
  namespace: ns-app
spec:
  type: LoadBalancer
  ports:
  - name: "8050"
    port: 8050
    targetPort: 8050
    protocol: TCP
  selector:
    app: app
status:
  loadBalancer: {}

UPDATE: I noticed that locally, when I open http://localhost:8050/, I see the Splash UI, while opening the Kubernetes external IP gives:

refused to connect

How can I solve this? Thank you.


Solution

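  • The problem is that the selector in splash-service.yaml is wrong. A Service selects pods by label, so its selector must match the Deployment's pod template labels (app: splash), not app: app. The Deployment happens to be named splash as well, but the Service matches on the pod label, not on the Deployment name.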

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: app
      name: splash
      namespace: ns-app
    spec:
      type: LoadBalancer
      ports:
      - name: "8050"
        port: 8050
        targetPort: 8050
        protocol: TCP
      selector:
        app: splash
    status:
      loadBalancer: {}
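
A Service sends traffic only to pods whose labels contain every key/value pair in its spec.selector; the Deployment's own name and metadata.labels play no part. The sketch below models that equality-based rule in plain Python, using the labels from the manifests above. It is a simplification for illustration, not the Kubernetes implementation, and `selector_matches` is a hypothetical helper.

```python
from typing import Mapping


def selector_matches(selector: Mapping[str, str], pod_labels: Mapping[str, str]) -> bool:
    """Equality-based Service selector: every key/value pair must appear in the pod labels."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints,
        # so model it as matching nothing.
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


# spec.template.metadata.labels of the Splash Deployment in this question.
POD_TEMPLATE_LABELS = {"app": "splash"}

print(selector_matches({"app": "app"}, POD_TEMPLATE_LABELS))     # original Service selector
print(selector_matches({"app": "splash"}, POD_TEMPLATE_LABELS))  # corrected selector
```

On a live cluster, the usual symptom of such a mismatch is that `kubectl get endpoints splash -n ns-app` lists no addresses for the Service.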