I have a couple of services deployed on Kubernetes. Some are NodeJS based, others are Java based. In the cluster there's an OTEL Collector deployed, which then provides data for Prometheus; Grafana is used for dashboarding. For Java I'm using -javaagent:/jars/opentelemetry-javaagent.jar, and for NodeJS a simple tracing file such as:
// gRPC OTLP exporters, matching OTEL_EXPORTER_OTLP_PROTOCOL=grpc below
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  // Service name is configured by OTEL_SERVICE_NAME
  traceExporter: new OTLPTraceExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter(),
    exportIntervalMillis: 5000,
  }),
  instrumentations: [getNodeAutoInstrumentations()], // includes https://www.npmjs.com/package/@opentelemetry/instrumentation-http
});

sdk.start();
The rest of the OTEL config is defined in ENVs (traces configuration is omitted for readability):
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_METRICS_EXPORTER=otlp
OTEL_SERVICE_NAME=[service name]
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector-listens-here:4317
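For context, the collector side is essentially the standard OTLP-in / Prometheus-out pipeline, along these lines (a simplified sketch, not the exact config; ports and exporter settings may differ in my cluster):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]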
The apps are deployed on Kubernetes with 2 or more pods each, and I think this is why I'm getting strange results for the http_server_duration_milliseconds_count metric.
Available labels for those metrics are:
http_flavor
http_method
http_route
http_scheme
http_status_code
job
net_host_name
net_host_port
net_protocol_name
net_protocol_version
Is my assumption correct that there's no way to differentiate the pods, and that those metrics are treated as coming from one source? I'm thinking of something like: ServiceA#pod1 exports value 1, then ServiceA#pod2 (which got more requests) exports 12, and after that ServiceA#pod1 exports 3 (as it got 2 new requests), and so on.
If so, what's the best way to solve this?
There are a few attributes I've been looking at:
net_host_ip - which I would expect to be set to the pod IP, but this attribute isn't set automatically by the Java and NodeJS instrumentation.
k8s_pod_name - or something similar to differentiate the pods?
service.instance.id - seems like the "native" solution to my problem, but it's in an experimental state.
Any suggestions or clarifications will be much appreciated :)
This is the intended use case for service.instance.id. Unfortunately, "experimental" in the OpenTelemetry specification doesn't indicate how close to stable something is.
Signals start as experimental, which covers alpha, beta, and release candidate versions of the signal.
service.instance.id is likely safe to rely on because of how important it is for use cases like the one you shared (identifying different k8s pods, for example). The definition of how best to generate this ID could change, but it's intended to be an opaque value used to compare the behavior of instances.
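One way to populate it on Kubernetes (a minimal sketch; the POD_NAME variable and the exact attribute set are just illustrative) is to derive it from the pod name via the Downward API and the standard OTEL_RESOURCE_ATTRIBUTES env var, which both the Java agent and the NodeJS SDK read (the NodeSDK picks it up through its default env resource detector):

# container spec in the Deployment; POD_NAME is an illustrative variable name
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "service.instance.id=$(POD_NAME),k8s.pod.name=$(POD_NAME)"

Keep in mind that service.instance.id is a resource attribute rather than a metric attribute, so it won't automatically show up as a Prometheus label; with the collector's Prometheus exporter you would typically enable resource_to_telemetry_conversion (or use a processor to promote the attribute) so it appears on http_server_duration_milliseconds_count.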