Tags: apache-spark, cassandra, spark-cassandra-connector

Is there a way to get access to the Cassandra schema info using the spark-cassandra connector?


The newer spark-cassandra connector has deprecated/removed the CassandraSQLContext, which allowed one to execute CQL directly. Now I cannot find a way to get catalog information such as the list of keyspaces, the tables within a keyspace, or column metadata.

Specifically, I want to be able to run something like select keyspace_name, table_name, column_name, type from system_schema.columns where keyspace_name = 'test'. Have I missed an API for running CQL? (I am using the 2.0 connector.)


Solution

  • The Spark Cassandra connector has a withSessionDo method that you can use the same way as in the Java driver, like this (adapted from the documentation):

    import com.datastax.spark.connector.cql.CassandraConnector
    
    CassandraConnector(conf).withSessionDo { session =>
      session.execute(
        "select keyspace_name, table_name, column_name, type " +
        "from system_schema.columns where keyspace_name = 'test';")
    }
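
    If you want the results back in Spark, note that withSessionDo returns the value of the block, so you can turn the driver's ResultSet into a Scala collection. Here is a minimal sketch, assuming connector 2.x (Java driver 3.x underneath, where execute returns a ResultSet of Rows) and that conf is the application's SparkConf:

    ```scala
    import com.datastax.spark.connector.cql.CassandraConnector
    import scala.collection.JavaConverters._

    // Sketch: run the schema query and collect (table, column, type) tuples.
    // withSessionDo returns the result of the block, so `columns` is a
    // plain Scala collection on the driver, not an RDD.
    val columns = CassandraConnector(conf).withSessionDo { session =>
      session.execute(
        "select keyspace_name, table_name, column_name, type " +
        "from system_schema.columns where keyspace_name = 'test';")
        .all().asScala
        .map(row => (row.getString("table_name"),
                     row.getString("column_name"),
                     row.getString("type")))
    }
    ```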
    

    But you can use a much simpler RDD operation, like this:

    sc.cassandraTable("system_schema", "columns").select("keyspace_name","table_name", 
         ...other columns...)
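
    To restrict the scan to a single keyspace, the connector's RDD API also supports a where clause that is pushed down to Cassandra (keyspace_name is the partition key of system_schema.columns, so this is an efficient lookup). A sketch, assuming connector 2.0 and that sc is the SparkContext:

    ```scala
    import com.datastax.spark.connector._

    // Sketch: select only the interesting columns and push the
    // keyspace predicate down to Cassandra via .where().
    val cols = sc.cassandraTable("system_schema", "columns")
      .select("keyspace_name", "table_name", "column_name", "type")
      .where("keyspace_name = ?", "test")
      .map(r => (r.getString("table_name"),
                 r.getString("column_name"),
                 r.getString("type")))
      .collect()
    ```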
    

    P.S. Also note that accessing schema information via the Metadata class, which you can obtain from the Session's Cluster, is a more portable way to do it.
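
    The Metadata route might look like the following sketch. It assumes connector 2.x, where CassandraConnector exposes withClusterDo and the underlying Java driver 3.x Metadata API (getKeyspace, getTables, getColumns); in later connector/driver versions the Cluster object was folded into the Session, so this exact shape applies only to 2.x:

    ```scala
    import com.datastax.spark.connector.cql.CassandraConnector
    import scala.collection.JavaConverters._

    // Sketch: read the schema through the driver's Metadata instead of
    // querying system_schema tables directly. Returns a map of
    // table name -> column names for the 'test' keyspace.
    val tables = CassandraConnector(conf).withClusterDo { cluster =>
      val ks = cluster.getMetadata.getKeyspace("test")
      ks.getTables.asScala.map { table =>
        table.getName -> table.getColumns.asScala.map(_.getName).toList
      }.toMap
    }
    ```

    The advantage of this approach is that it does not depend on the layout of the system_schema tables, which differs between Cassandra versions (older releases used system.schema_columns instead).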