AWS XRay is a tracing service that lets you trace requests across distributed systems, and even profile your services. Without going too far into how XRay works, it essentially monitors your service and, for each request, sends trace data over UDP to a daemon, which collects the data and forwards it to AWS.
When running locally or on EC2, this daemon runs on the same machine as the service and listens on port 2000. That is why the SDK's default daemon address is 127.0.0.1:2000.
When running in Kubernetes, you need to set up the daemon to run on each node. According to the documentation for setting up XRay with Kubernetes, you can override the default address either by setting the environment variable AWS_XRAY_DAEMON_ADDRESS to the required host, or by setting the JVM system property com.amazonaws.xray.emitters.daemonAddress. There is also a reference to this in the SDK documentation.
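For comparison, the system-property route can also be taken from code. A minimal sketch, assuming the SDK reads the property when it creates its daemon configuration, so it has to be set before any XRay recorder is built (for example as the first statement in main). The class name and the address value are placeholders; the value uses host:port form.

```java
public class XRayDaemonProperty {

    // Property key named in the XRay documentation.
    static final String DAEMON_ADDRESS_PROPERTY = "com.amazonaws.xray.emitters.daemonAddress";

    public static void main(String[] args) {
        // Placeholder address in host:port form. This must run before any XRay
        // recorder is created, otherwise the SDK will already have read its settings.
        System.setProperty(DAEMON_ADDRESS_PROPERTY, "aws-xray-daemon.default:2000");

        System.out.println(DAEMON_ADDRESS_PROPERTY + " = " + System.getProperty(DAEMON_ADDRESS_PROPERTY));
        // ... start the application (e.g. SpringApplication.run) after this point.
    }
}
```

Passing -Dcom.amazonaws.xray.emitters.daemonAddress=... on the java command line achieves the same thing without code changes.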
Because of my use case, and the way we share configuration in my organisation, I would like to use the environment variable.
As per the documentation, we set it at deployment time via our Helm charts:
env:
  - name: AWS_XRAY_DAEMON_ADDRESS
    value: aws-xray-daemon.default
By exec'ing into the pod the service is running in and running printenv, we can see that this value is successfully set on deployment.
The Issue:
When XRay tries to profile and send data to the daemon, an SdkClientException is thrown:
com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to 127.0.0.1:2000 [/127.0.0.1] failed: Connection refused (Connection refused)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1201) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1147) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.services.xray.AWSXRayClient.doInvoke(AWSXRayClient.java:1607) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
at com.amazonaws.services.xray.AWSXRayClient.invoke(AWSXRayClient.java:1574) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
at com.amazonaws.services.xray.AWSXRayClient.invoke(AWSXRayClient.java:1563) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
at com.amazonaws.services.xray.AWSXRayClient.executeGetSamplingRules(AWSXRayClient.java:800) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
at com.amazonaws.services.xray.AWSXRayClient.getSamplingRules(AWSXRayClient.java:771) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
at com.amazonaws.xray.strategy.sampling.pollers.RulePoller.pollRule(RulePoller.java:65) ~[aws-xray-recorder-sdk-core-2.4.0.jar!/:na]
at com.amazonaws.xray.strategy.sampling.pollers.RulePoller.lambda$start$0(RulePoller.java:46) ~[aws-xray-recorder-sdk-core-2.4.0.jar!/:na]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:na]
at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source) ~[na:na]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:na]
at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]
...
This means that the AWS SDK is not picking up the environment variable as the documentation suggests, and is simply using the default address of 127.0.0.1:2000.
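Before digging into the SDK, it can help to confirm from inside the pod which address actually accepts connections. The trace above comes from the sampling-rule poller (RulePoller calling getSamplingRules), which makes an HTTP request to the daemon, so a plain TCP connect is a reasonable probe. A minimal sketch; the class name, the timeout and the in-cluster host name are assumptions for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class DaemonProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or the host name could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // The SDK default seen in the stack trace, and a hypothetical in-cluster daemon address.
        System.out.println("127.0.0.1:2000 -> " + canConnect("127.0.0.1", 2000, 1000));
        System.out.println("aws-xray-daemon.default:2000 -> " + canConnect("aws-xray-daemon.default", 2000, 1000));
    }
}
```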
I then dug into the SDK code to find out how it retrieves this variable, and found that it uses System.getenv("AWS_XRAY_DAEMON_ADDRESS"), as shown below:
/**
* Environment variable key used to override the address to which UDP packets will be emitted. Valid values are of the form `ip_address:port`. Takes precedence over any system property,
* constructor value, or setter value used.
*/
public static final String DAEMON_ADDRESS_ENVIRONMENT_VARIABLE_KEY = "AWS_XRAY_DAEMON_ADDRESS";
/**
* System property key used to override the address to which UDP packets will be emitted. Valid values are of the form `ip_address:port`. Takes precedence over any constructor or setter value
* used.
*/
public static final String DAEMON_ADDRESS_SYSTEM_PROPERTY_KEY = "com.amazonaws.xray.emitters.daemonAddress";
public DaemonConfiguration() {
    String environmentAddress = System.getenv(DAEMON_ADDRESS_ENVIRONMENT_VARIABLE_KEY);
    String systemAddress = System.getProperty(DAEMON_ADDRESS_SYSTEM_PROPERTY_KEY);
    if (setUDPAndTCPAddress(environmentAddress)) {
        logger.info(String.format("Environment variable %s is set. Emitting to daemon on address %s.", DAEMON_ADDRESS_ENVIRONMENT_VARIABLE_KEY, getUDPAddress()));
    } else if (setUDPAndTCPAddress(systemAddress)) {
        logger.info(String.format("System property %s is set. Emitting to daemon on address %s.", DAEMON_ADDRESS_SYSTEM_PROPERTY_KEY, getUDPAddress()));
    }
}
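The excerpt boils down to an order of precedence: the environment variable first, then the system property, otherwise the built-in default. The sketch below restates that logic in isolation so it can be exercised without the SDK. It is not the SDK's implementation: resolveDaemonAddress and isValidAddress are hypothetical, and the sketch assumes that a value failing validation is skipped, in the same way a false return from setUDPAndTCPAddress falls through to the next branch. The host:port check follows the "ip_address:port" form stated in the SDK's Javadoc.

```java
import java.util.function.Function;

public class DaemonAddressResolver {

    static final String ENV_KEY = "AWS_XRAY_DAEMON_ADDRESS";
    static final String PROPERTY_KEY = "com.amazonaws.xray.emitters.daemonAddress";
    static final String DEFAULT_ADDRESS = "127.0.0.1:2000";

    /** Hypothetical, simplified check for the documented host:port form. */
    static boolean isValidAddress(String value) {
        if (value == null) {
            return false;
        }
        String[] parts = value.trim().split(":");
        if (parts.length != 2 || parts[0].isEmpty()) {
            return false;
        }
        try {
            int port = Integer.parseInt(parts[1]);
            return port > 0 && port <= 65535;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Environment variable wins, then the system property, then the default. */
    static String resolveDaemonAddress(Function<String, String> env, Function<String, String> props) {
        String fromEnv = env.apply(ENV_KEY);
        if (isValidAddress(fromEnv)) {
            return fromEnv.trim();
        }
        String fromProperty = props.apply(PROPERTY_KEY);
        if (isValidAddress(fromProperty)) {
            return fromProperty.trim();
        }
        return DEFAULT_ADDRESS;
    }

    public static void main(String[] args) {
        System.out.println(resolveDaemonAddress(System::getenv, System::getProperty));
    }
}
```

Taking the lookups as functions keeps the precedence testable with plain maps instead of real environment variables.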
I wondered whether I had set the environment variable incorrectly, so I added a log statement at service start-up that reads the variable, and found that the JVM can indeed see the value:
Code:
System.out.println("System.getenv(\"AWS_XRAY_DAEMON_ADDRESS\") = " + System.getenv("AWS_XRAY_DAEMON_ADDRESS"));
Output:
System.getenv("AWS_XRAY_DAEMON_ADDRESS") = aws-xray-daemon.default
As far as I can tell, this is exactly the code the AWS SDK should run, and yet it never seems to be executed; or if it is, it doesn't produce the same result as my log statement.
I am unable to reproduce this issue locally, where the SDK picks up the host I've given in my local environment variables. Using breakpoints, I have also confirmed that the SDK code pasted above is reached when running locally.
Any ideas?
Gradle snippet:
ext {
    ...
    springCloudVersion = "Greenwich.RELEASE"
    awsCoreVersion = '1.11.739'
    awsXrayVersion = '2.4.0'
    ...
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:${springCloudVersion}"
        mavenBom "com.amazonaws:aws-java-sdk-bom:${awsCoreVersion}"
        mavenBom "com.amazonaws:aws-xray-recorder-sdk-bom:${awsXrayVersion}"
    }
}
dependencies {
    ...
    implementation "com.amazonaws:aws-java-sdk-core"
    implementation "com.amazonaws:aws-xray-recorder-sdk-core"
    implementation "com.amazonaws:aws-xray-recorder-sdk-aws-sdk"
    implementation "com.amazonaws:aws-xray-recorder-sdk-spring"
    implementation "com.amazonaws:aws-xray-recorder-sdk-apache-http"
    implementation "com.amazonaws:aws-xray-recorder-sdk-sql-postgres"
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.springframework.boot:spring-boot-starter'
    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
    implementation 'org.springframework.boot:spring-boot-starter-security'
    ...
}
Other info:
Other attempts:
- I have tried setting the environment variable via the Dockerfile. This had the same outcome.
It turns out that the blog post I linked was not a good one: its example gives only the host and omits the port:
env:
  - name: AWS_XRAY_DAEMON_ADDRESS
    value: xray-service.default
Changing the environment variable to include the port fixed the issue:
env:
  - name: AWS_XRAY_DAEMON_ADDRESS
    value: xray-service.default:2000
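Since a host-only value led to a silent fall-back to 127.0.0.1:2000 rather than a clear error, a small start-up guard can surface this kind of misconfiguration immediately. A minimal sketch; DaemonAddressGuard and requireHostAndPort are hypothetical names, not part of the XRay SDK, and the check only enforces the host:port shape rather than the SDK's full address syntax.

```java
import java.net.InetSocketAddress;

public class DaemonAddressGuard {

    /**
     * Hypothetical start-up guard: parses AWS_XRAY_DAEMON_ADDRESS as host:port and
     * fails fast instead of letting the tracer quietly use 127.0.0.1:2000.
     */
    static InetSocketAddress requireHostAndPort(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("AWS_XRAY_DAEMON_ADDRESS is not set");
        }
        String trimmed = value.trim();
        int colon = trimmed.lastIndexOf(':');
        // Reject values with no port (e.g. "xray-service.default"), an empty host, or an empty port.
        if (colon <= 0 || colon == trimmed.length() - 1) {
            throw new IllegalStateException("AWS_XRAY_DAEMON_ADDRESS must be host:port, got '" + trimmed + "'");
        }
        String host = trimmed.substring(0, colon);
        int port;
        try {
            port = Integer.parseInt(trimmed.substring(colon + 1));
        } catch (NumberFormatException e) {
            throw new IllegalStateException("Invalid port in '" + trimmed + "'", e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalStateException("Port out of range in '" + trimmed + "'");
        }
        // createUnresolved skips the DNS lookup, so the check works even if the daemon is not up yet.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress address = requireHostAndPort(System.getenv("AWS_XRAY_DAEMON_ADDRESS"));
        System.out.println("XRay daemon address: " + address.getHostString() + ":" + address.getPort());
    }
}
```

Called early in start-up, this would have rejected xray-service.default with a descriptive message and accepted xray-service.default:2000.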