python-3.x cassandra jaeger cassandra-driver

Select and decode blob using python cassandra driver


I am trying to query the traces table in Cassandra, which is part of the Jaeger architecture. As the schema shows, the refs field is a list:

cqlsh:jaeger_v1_dc1> describe traces

CREATE TABLE jaeger_v1_dc1.traces (
    trace_id blob,
    span_id bigint,
    span_hash bigint,
    duration bigint,
    flags int,
    logs list<frozen<log>>,
    operation_name text,
    parent_id bigint,
    process frozen<process>,
    refs list<frozen<span_ref>>,
    start_time bigint,
    tags list<frozen<keyvalue>>,
    PRIMARY KEY (trace_id, span_id, span_hash)
) 

From the Python code:

traces = session.execute('SELECT span_id,refs from traces')

for t in traces:
    if t.refs is not None:
        parentTrace = t.refs[0].trace_id  # attribute access, matching the t.refs check above
  1. Is it possible to select the parent trace directly, without iterating through the result? Is there a way to get the first element of the list, and then the fields inside it, from the SELECT statement itself?
  2. From the terminal, using cqlsh, I get this result: trace_id: 0x00000000000000003917678c73006f57. However, the Python Cassandra client returns trace_id=b'\x00\x00\x00\x00\x00\x00\x00\x009\x17g\x8cs\x00oW'. What transformation happened to it? How can I decode it, since I want to use it to query the table again?

Solution

    1. To my knowledge, there is no easy way, because there is no guarantee that the spans are stored in a specific order. It is worth noting, though, that if by parentTrace you mean the root span of the trace (the first span), you can look for spans where refs is null, because a root span has no parent. That check has to happen in your client code, since CQL cannot filter on null values in a WHERE clause. Another way to identify a root span is to check whether trace_id == span_id.
    2. trace_id is stored as a binary blob, and no transformation has happened: the Python client returns the same 16 bytes as a bytes object. Python prints bytes that are printable ASCII as characters (0x39 as 9, 0x67 as g) and the rest as \x escapes, which is why it looks different from the single hex string that cqlsh shows. To get the cqlsh form, convert the whole value to one hex string. See the following Python example that does this:
    from cassandra.cluster import Cluster
    
    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect()
    rows = session.execute("select * from jaeger_v1_test.traces")
    trace = rows.one()  # first row of the result set (None if the table is empty)
    # bytes.hex() joins all 16 bytes into one lowercase hex string
    hexstr = trace.trace_id.hex()
    print("hex=%s, byte_arr=%s, len(byte_arr)=%d" % (hexstr, trace.trace_id, len(trace.trace_id)))
    cluster.shutdown()
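
Since the goal is to query the table again, here is a minimal sketch of the round trip. It assumes the driver's default row factory (rows and UDT values come back as named tuples, as in the question's code). The helper names blob_to_hex, hex_to_blob, first_parent_trace_id and spans_for_trace are made up for illustration, and the SpanRef and Row tuples are stand-ins so the helpers can be tried without a cluster. Note that the bytes value can be bound to the query as-is, so converting to hex is only needed for display or for pasting into cqlsh.

```python
from collections import namedtuple


def blob_to_hex(blob):
    """Render a blob (bytes) the way cqlsh prints it, with a 0x prefix."""
    return "0x" + blob.hex()


def hex_to_blob(text):
    """Parse a cqlsh-style hex string (with or without the 0x prefix) into bytes."""
    if text.lower().startswith("0x"):
        text = text[2:]
    return bytes.fromhex(text)


def first_parent_trace_id(row):
    """Return the trace_id of the first entry in row.refs, or None if refs is empty or null."""
    return row.refs[0].trace_id if row.refs else None


def spans_for_trace(session, trace_id):
    """Query the traces table again; the bytes value is passed as a bound parameter."""
    return session.execute(
        "SELECT span_id, refs FROM traces WHERE trace_id = %s", (trace_id,)
    )


# Hypothetical stand-ins for driver rows, only to demonstrate the helpers offline.
SpanRef = namedtuple("SpanRef", ["trace_id", "span_id"])
Row = namedtuple("Row", ["span_id", "refs"])

raw = b"\x00\x00\x00\x00\x00\x00\x00\x009\x17g\x8cs\x00oW"
print(blob_to_hex(raw))  # 0x00000000000000003917678c73006f57
print(hex_to_blob("0x00000000000000003917678c73006f57") == raw)  # True
```

With a live session, something like spans_for_trace(session, first_parent_trace_id(t)) inside the loop from the question would fetch the spans of the referenced trace, skipping rows where the helper returns None.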