
Should I modify zipkin service libraries to pass generic feature flags?


We're looking to implement Zipkin in our stack. As I look into Zipkin, it makes sense to me to extend it to handle generic feature flags as well.

Observations:

  1. Any implementation of Zipkin needs to capture "B3" tagged values (headers in HTTP) and propagate them to requests further down the stack.
  2. Some values are mutated (for example, the span ID changes on each hop)
  3. Some values are just propagated (Sampled, Debug)
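
To make the three observations concrete, here is a minimal sketch of what an instrumented service does with B3 headers when it makes a downstream call. The header names are the standard B3 ones; the function names are made up for illustration, the headers are assumed to arrive in a plain dict with canonical casing, and the model is simplified (real tracers also handle missing headers, sampling decisions, and 128-bit trace IDs).

```python
import secrets

# B3 header names as used by Zipkin's HTTP propagation.
TRACE_ID = "X-B3-TraceId"
SPAN_ID = "X-B3-SpanId"
PARENT_SPAN_ID = "X-B3-ParentSpanId"
SAMPLED = "X-B3-Sampled"
FLAGS = "X-B3-Flags"  # "1" means debug


def new_span_id() -> str:
    """Return a random 64-bit id as 16 lower-hex characters."""
    return secrets.token_hex(8)


def child_headers(incoming: dict) -> dict:
    """Derive the B3 headers for an outgoing request (hypothetical helper)."""
    outgoing = {
        # Set once at the edge, then copied unchanged.
        TRACE_ID: incoming[TRACE_ID],
        # Mutated on every hop: the current span becomes the parent.
        PARENT_SPAN_ID: incoming[SPAN_ID],
        SPAN_ID: new_span_id(),
    }
    # Propagated as-is when present.
    for name in (SAMPLED, FLAGS):
        if name in incoming:
            outgoing[name] = incoming[name]
    return outgoing
```

The question is essentially whether a fourth kind of value (arbitrary key/value flags) should ride along in the "propagated as-is" loop.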

Conclusion:

  • Extending Zipkin with the option to propagate (X-)B3-Flag- prefixed key/value pairs makes sense.
  • This enables A/B testing and Blue/Green Deploy.
  • These techniques often need to compare timings to ensure that timings are similar or improved unless noted by the service development team.

Solution

  • The TL;DR is that B3 propagation was designed for fixed-size data. Carrying data ancillary to tracing isn't in scope, so any solution that extends B3 in this fashion wouldn't be compatible with existing code.

    That means any solution like this will be an extension, which implies custom handling in the instrumented apps, since they are the things passing headers around. The server won't care, as it never sees these headers anyway.

    The way people usually integrate other things like flags with Zipkin is to add a tag (aka binary annotation) holding the value, usually on the root span. This lets you query or retrieve the flags offline, but it doesn't address in-flight lookups from applications.
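
As a rough illustration of the tagging approach, the sketch below builds a span in Zipkin's v2 JSON shape with the flags recorded as tags. The `flag.` key prefix, the function name, and the example values are assumptions made for this example, not a Zipkin convention; in practice your tracer library would add the tags and report the span for you.

```python
import json


def root_span_with_flags(trace_id, span_id, service_name, name, flags):
    """Build a Zipkin v2 span dict whose tags record the feature flags."""
    return {
        "traceId": trace_id,
        "id": span_id,
        "name": name,
        "localEndpoint": {"serviceName": service_name},
        # Tag values are strings in Zipkin's v2 model.
        # The "flag." prefix is a made-up naming convention.
        "tags": {f"flag.{key}": str(value) for key, value in sorted(flags.items())},
    }


span = root_span_with_flags(
    trace_id="463ac35c9f6413ad",
    span_id="463ac35c9f6413ad",
    service_name="frontend",          # hypothetical service
    name="get /checkout",             # hypothetical operation
    flags={"checkout": "v2", "deploy": "blue"},
)
# The v2 reporting format is a JSON list of spans.
payload = json.dumps([span])
```

Tags like these can be searched after the fact, which covers comparing timings between variants, but downstream services never see them.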

    Let's say that instead of using an intermediary like linkerd, or a platform-specific propagated context, we want to delegate the responsibility to the tracing layer. First, what sort of data could work? The easiest is something set once (like Zipkin's trace ID): anything set and propagated without mutation requires the least machinery. Next in difficulty is appending new entries mid-stream, and the most difficult is mutating or merging entries.

    Let's assume this is for inbound flags that never change through the request/trace tree. When we see the header while processing trace data, we store it and forward it downstream. If the value doesn't need to be read by the tracing system, this is the easiest case, as it is largely a transport/propagation concern. For example, maybe other middleware reads that header, and remembering certain things to pass along is only a "side job" we are adding to the tracer. If this were done in a single header, it would take less code than matching a pattern in each of the places where this would need to be added. It would be even less code if the flags could be encoded in a number, however unrealistic that may be.
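
The two encodings mentioned above could look roughly like this. Everything here is an assumption for illustration: the header name `X-Feature-Flags` is hypothetical and not part of B3, and the list of known flags is invented. The first pair of functions packs key/value flags into one header value; the second pair packs on/off flags into a number, which only works if every service agrees on the bit positions.

```python
from urllib.parse import quote, unquote

FLAG_HEADER = "X-Feature-Flags"  # hypothetical header, not defined by B3


def encode_flags(flags: dict) -> str:
    """Pack key/value flags into a single header value."""
    # Percent-encode so "," and "=" inside keys or values stay unambiguous.
    return ",".join(
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in sorted(flags.items())
    )


def decode_flags(header_value: str) -> dict:
    """Reverse encode_flags; empty segments are ignored."""
    flags = {}
    for pair in header_value.split(","):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        flags[unquote(key)] = unquote(value)
    return flags


# Numeric variant: each known on/off flag owns one bit position.
KNOWN_FLAGS = ("new_checkout", "blue_deploy", "beta_search")  # hypothetical


def flags_to_number(enabled: set) -> int:
    return sum(1 << bit for bit, name in enumerate(KNOWN_FLAGS) if name in enabled)


def number_to_flags(number: int) -> set:
    return {name for bit, name in enumerate(KNOWN_FLAGS) if number & (1 << bit)}
```

The numeric form is fixed-size, which is closer to what B3 was designed to carry, but it trades away the ability to add flags without redeploying every service that decodes them.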

    There are libraries with APIs to manipulate the propagated context manually, for example "baggage" from brownsys and OpenTracing (some of whose libraries support Zipkin). The former aims to be a generic layer for any instrumentation (e.g. monitoring, chargeback, tracing) and the latter is specific to tracing. OpenTracing defines abstract types like injector and extractor, which could be customized to carry other fields. However, you would still need a concrete implementation (one that knows your header format, etc.) in order to do this. Unless you want applications to read this data, it would need to be a private detail of that implementation (specifically, of the trace context).
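
A minimal sketch of what such a concrete extractor/injector pair might look like follows. It borrows only the idea of extract and inject; the types and function names are invented here and are not OpenTracing's or any tracer's actual API, and `x-feature-flags` is a hypothetical header. The pass-through fields live inside the trace context, so application code never has to touch them.

```python
from dataclasses import dataclass, field
from typing import Mapping, MutableMapping, Optional

# Hypothetical site-specific headers to carry along untouched (lowercase).
PASS_THROUGH = ("x-feature-flags",)


@dataclass(frozen=True)
class TraceContext:
    trace_id: str
    span_id: str
    sampled: Optional[str] = None
    # Implementation detail: stored and forwarded, never interpreted.
    extra: dict = field(default_factory=dict, repr=False)


def extract(headers: Mapping[str, str]) -> TraceContext:
    """Read B3 ids plus any pass-through headers from an incoming request."""
    lowered = {name.lower(): value for name, value in headers.items()}
    extra = {name: lowered[name] for name in PASS_THROUGH if name in lowered}
    return TraceContext(
        trace_id=lowered["x-b3-traceid"],
        span_id=lowered["x-b3-spanid"],
        sampled=lowered.get("x-b3-sampled"),
        extra=extra,
    )


def inject(context: TraceContext, headers: MutableMapping[str, str]) -> None:
    """Write the context, including pass-through fields, onto an outgoing request."""
    headers["X-B3-TraceId"] = context.trace_id
    headers["X-B3-SpanId"] = context.span_id
    if context.sampled is not None:
        headers["X-B3-Sampled"] = context.sampled
    headers.update(context.extra)
```

This sketch covers transport only: it does not create child spans, and it forwards the extra headers with lowercase names, which is fine for HTTP since header names are case-insensitive.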

    Certain Zipkin-specific libraries like Spring Cloud Sleuth and Brave have means to customize how headers are parsed, to support variants of B3 or new or site-specific trace formats. Not all libraries support this at the moment, but I would expect this type of feature to become more common. This means you may need to do some surgery to cover every platform you need to support.

    Long story short, some libraries are pluggable with regard to propagation, and those will be the easiest to modify to support this use case. Some code will be needed regardless, as B3 doesn't currently define an expression like this.