
Trying to set up distributed event tracing


I am trying to set up distributed event tracing throughout our microservice architecture.

Here is some preamble about our architecture:

  1. Traefik load balancer that forwards requests to the appropriate backend service based on the route pathname.

  2. Frontend application on a "catchall" route that is served whenever a route is not caught by another microservice.

  3. Various backend services in node/dotnetcore listening on /api/<serviceName>

Traefik is set up with the traceContextHeaderName set to "trace-id".

How I imagine this would work is that the frontend application receives a header "trace-id" from the load balancer with a value that can be used to "link" the spans together for requests that are related.

Example scenario:

When a customer attempts to sign in, they make a request for the web application and receive and render the HTML/CSS/JS. The subsequent request to /api/auth/login can then be POSTed with the login data and the value of the "trace-id" header supplied by Traefik. The backend service that handles the /api/auth/login endpoint can capture this "trace-id" header value and publish some spans to Jaeger related to the work that it is doing to validate the user.
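To make the "capture this header value" step concrete, here is a minimal sketch of how a backend service might read it. It assumes the header carries a value in Jaeger's usual propagation format, `{trace-id}:{span-id}:{parent-span-id}:{flags}`, under the configured name "trace-id". The `TraceContext` type and the `parseTraceHeader` helper are hypothetical names used only for illustration; in practice a Jaeger client library would normally do this extraction for you.

```typescript
// Assumed shape of the propagated value:
// {trace-id}:{span-id}:{parent-span-id}:{flags}
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId: string;
  flags: number;
}

// Header name as configured via traceContextHeaderName in traefik.toml.
const TRACE_HEADER = "trace-id";

// Hypothetical helper: pull the trace context out of incoming request headers.
// Returns null when the header is missing or does not have four fields.
function parseTraceHeader(
  headers: Record<string, string | string[] | undefined>
): TraceContext | null {
  const raw = headers[TRACE_HEADER];
  const value = Array.isArray(raw) ? raw[0] : raw;
  if (!value) return null;

  const parts = value.split(":");
  if (parts.length !== 4) return null;

  const [traceId = "", spanId = "", parentSpanId = "", flagsHex = ""] = parts;
  const flags = parseInt(flagsHex, 16); // flags are hex-encoded
  if (!traceId || !spanId || Number.isNaN(flags)) return null;

  return { traceId, spanId, parentSpanId, flags };
}
```

Spans created by the service would then use the parsed `traceId` so that they join the trace Traefik started instead of opening a new one.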

What is happening:

When the request is made for the frontend HTML, no "trace-id" header is received, so any subsequent spans that are published are all treated as individual traces and are not linked together.

traefik.toml:

...
[tracing]
 backend = "jaeger"
 serviceName = "traefik"
 spanNameLimit = 0
 [tracing.jaeger]
   samplingServerURL = "http://jaeger:5778/sampling"
   samplingType = "const"
   samplingParam = 1.0
   localAgentHostPort = "jaeger:6831"
   traceContextHeaderName = "trace-id"
...

I understand that StackOverflow is not a "code it for me" service. I am looking for guidance on what could possibly be going wrong as I am new to distributed event tracing. I have tried googling and searching for answers but I have come to a dead end.

Any help/suggestions on where to look would be greatly appreciated.

Please let me know if I am barking up the wrong tree, approaching this incorrectly, or if my understanding of how the traceContextHeaderName should work is incorrect.


Solution

  • Ugh. I am an idiot.

    Here is what was going wrong, for anyone else who might be stuck on something like this:

    The frontend application is receiving the header; I was just looking in the wrong place.

    The request comes from the load balancer to the node frontend microservice which sends its response to the browser.

    I was checking the browser for the header, but the node frontend microservice was not forwarding this header to the browser.
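For anyone who does want the browser to see the header, the node frontend has to copy it from the incoming request onto its own response. Below is a minimal sketch of that idea, not the exact code I ended up with. The `HeaderSink` interface and the `forwardTraceHeader` helper are hypothetical names; the only assumptions are that incoming header names are lower-cased (as Node does) and that the response object exposes a `setHeader(name, value)` method.

```typescript
// Header name as configured via traceContextHeaderName in traefik.toml.
const TRACE_HEADER = "trace-id";

type IncomingHeaders = Record<string, string | string[] | undefined>;

// Minimal view of a response object: anything with setHeader(name, value).
interface HeaderSink {
  setHeader(name: string, value: string): void;
}

// Hypothetical helper: copy the trace header from the incoming request
// (sent by the load balancer) onto the response going back to the browser.
// Returns true when a header was forwarded, false when none was present.
function forwardTraceHeader(incoming: IncomingHeaders, res: HeaderSink): boolean {
  const raw = incoming[TRACE_HEADER];
  const value = Array.isArray(raw) ? raw[0] : raw;
  if (!value) return false;

  res.setHeader(TRACE_HEADER, value);
  return true;
}
```

In a plain Node `http` request handler this would be called as `forwardTraceHeader(req.headers, res)` before the response body is written. The browser can then read the value from the response and send it along with later requests such as the POST to /api/auth/login.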