python google-cloud-platform google-cloud-spanner google-cloud-python

how to query spanner and get metadata, especially columns' names?

I'm trying to query custom SQL on Spanner and convert the results into a Pandas Dataframe, so I need data and column names, but I can't find a way to get the column names.

According to the documentation, I can get columns using metadata or fields properties, but this doesn't work.

I tried to run a query transaction and also to get a snapshot, but I just get a data row.


from google.cloud import spanner
from google.cloud.spanner_v1.streamed import StreamedResultSet

def query_transaction(instance_id, database_id, query_param):

    spanner_client = spanner.Client.from_service_account_json("PATH_XXXXX")
    database = spanner_client.instance(instance_id).database(database_id)

    def run_transaction(transaction):
        query = query_param
        results: StreamedResultSet = transaction.execute_sql(query)
        print("type", type(results))
        print("metadata", results.stats)
        for row in results:
            print(row)

    database.run_in_transaction(run_transaction)

def query_snapshot(instance_id, database_id, query):

    spanner_client = spanner.Client.from_service_account_json("PATH_XXXXX")
    database = spanner_client.instance(instance_id).database(database_id)

    with database.snapshot() as snapshot:
        results: StreamedResultSet = snapshot.execute_sql(query)

        print("metadata", results.metadata)
        print("type", type(results))

        for row in results:
            print(row)


spanner_id = "XXXXXXX"
base_id = "XXXXXXXX"
query ="SELECT * FROM XXXXX LIMIT 5"

spanner.query_snapshot(spanner_id, base_id, query)
spanner.query_transaction(spanner_id, base_id, query)

I can iterate the results and get rows, but metadata always is None.

Solution

You must fetch at least one row before the metadata are available. So if you were to change the order of your code so that you first fetch the data (or at least some data), and then get the metadata, it should work.

results: StreamedResultSet = snapshot.execute_sql(query)
print("metadata", results.metadata)
for row in results:
    print(row)

Into this:

results: StreamedResultSet = snapshot.execute_sql(query)
for row in results:
    print(row)
print("metadata", results.metadata)

then you should be getting metadata.

Also note that result set statistics (results.stats) is only available when you are profiling a query. When you are just executing the query, as you are in your above example, this will always be empty.