Tags: python, kubernetes, cassandra, amazon-eks, scylla

Access Scylla on EKS with Python Driver


I am new to Kubernetes and was recently asked to set up Scylla on AWS. I followed the tutorial for deploying Scylla on EKS (http://operator.docs.scylladb.com/master/eks.html), and everything went well.

Then I followed the "Accessing the Database" section of a related tutorial (http://operator.docs.scylladb.com/master/generic.html).

I was able to run the commands for the first two steps:

kubectl exec -n scylla -it scylla-cluster-us-east-1-us-east-1a-0 -- cqlsh
> DESCRIBE KEYSPACES;
kubectl -n scylla describe service scylla-cluster-client

However, I don't know how to perform the last step, which says:

Pods running inside the Kubernetes cluster can use this Service to connect to Scylla. Here’s an example using the Python Driver:

from cassandra.cluster import Cluster
cluster = Cluster(['scylla-cluster-client.scylla.svc'])
session = cluster.connect()

The script fails to resolve scylla-cluster-client.scylla.svc. I also tried several different IP addresses instead, but each attempt raised a cassandra.cluster.NoHostAvailable error.

In addition, I found that pip is not installed in the Scylla pod when I open a shell in it via

kubectl exec -n scylla -it scylla-cluster-us-east-1-us-east-1a-0 -- /bin/bash

Can anyone help me solve the connection issue with the Python driver?

It would be great if you could tell me:

  1. Why does scylla-cluster-client.scylla.svc not work for me?
  2. What is the difference between kubectl exec -n ... and the Cassandra drivers?
  3. Which IPs should I use? I noticed that there are cluster IPs from Kubernetes, internal IPs from Kubernetes, and public IPs of the EC2 machines from AWS. If a public IP is needed, do I need to open the ports (e.g. 9042) on AWS? How can I make that more secure?

Thanks in advance.


Solution

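    1. scylla-cluster-client.scylla.svc is a DNS name that only the k8s cluster DNS can resolve, so it works only from pods running inside the same cluster. Because the name already includes the namespace (scylla), those pods do not have to be in the scylla namespace themselves. You can't use it from outside the cluster.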
    2. kubectl exec runs a command inside one of the Scylla pods, so you are essentially running cqlsh on the Scylla node itself and connecting to localhost on that node. In contrast, a driver that uses scylla-cluster-client.scylla.svc connects remotely over the network (but still within the k8s network).
    3. You don't need to use an IP: from a pod inside the cluster, use the scylla-cluster-client.scylla.svc DNS name. If you want to use IP addresses, you can resolve the DNS name manually or read the IP addresses of the Scylla pods through the k8s API, but there is really no reason to do that. If you want to connect from outside the cluster, you would need a publicly exposed Service or something similar, which is basically a k8s-managed proxy. In theory you could expose the Scylla pods themselves publicly, but that is highly inadvisable.
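
To make the points above concrete, here is a minimal sketch of a script meant to run from a pod inside the same cluster (not from your laptop, and not necessarily from the Scylla pod itself, which has no pip). It builds the Service DNS name, checks whether that name resolves from where the script is running, and only then hands it to the Python driver. The helper names service_dns_name, resolve_ips, connect and main are hypothetical, and "cluster.local" is only the common default cluster domain, so treat both as assumptions. Port 9042 is the CQL port mentioned in the question.

```python
import socket


def service_dns_name(service, namespace, cluster_domain=None):
    """Build the in-cluster DNS name of a Kubernetes Service.

    The short form <service>.<namespace>.svc relies on the DNS search path
    that Kubernetes configures inside pods. Passing cluster_domain
    (often "cluster.local", an assumption about your cluster) gives the
    fully qualified form.
    """
    name = f"{service}.{namespace}.svc"
    if cluster_domain:
        name = f"{name}.{cluster_domain}"
    return name


def resolve_ips(hostname, port=9042):
    """Return the sorted IPs the hostname resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def connect(hostname, port=9042):
    """Open a driver session, failing early with a clear message if DNS does not resolve."""
    # Imported lazily so the DNS helpers above work without the driver installed.
    from cassandra.cluster import Cluster

    if not resolve_ips(hostname, port):
        raise RuntimeError(
            f"{hostname} does not resolve; this script is probably running "
            "outside the Kubernetes cluster"
        )
    cluster = Cluster([hostname], port=port)
    return cluster, cluster.connect()


def main():
    """Intended to be called from a pod inside the cluster."""
    host = service_dns_name("scylla-cluster-client", "scylla")
    cluster, session = connect(host)
    # system.local exists on every Scylla/Cassandra node.
    row = session.execute("SELECT release_version FROM system.local").one()
    print(row.release_version)
    cluster.shutdown()
```

Run outside the cluster, resolve_ips returns an empty list for the Service name, which matches the resolution failure described in the question. Calling main() from a pod in the cluster that has Python and the driver installed should instead open a session through the Service.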