I've successfully used Zipkin with Hadoop HTrace in Hadoop 2.6.0 on 32-bit Ubuntu 14.04. Now I want to use it with Hadoop 2.7.3, but I can't even enable HTrace tracing with this Hadoop version. The HTrace setup for 2.6.0 is different from the one for 2.7.3, as can be seen here (2.6.0) and here (2.7.3).
In 2.6.0 I would get this line in the NameNode log file:
INFO org.apache.hadoop.tracing.SpanReceiverHost: SpanReceiver org.htrace.impl.ZipkinSpanReceiver was loaded successfully.
I have nothing like that in the 2.7.3 NameNode log file.
Since I had no success with Zipkin, I tried the LocalFileSpanReceiver as described in the online tutorial:
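One way to compare the two versions is to scan each NameNode log for SpanReceiver messages. The following is only a minimal sketch under assumptions: the class name SpanReceiverLogCheck and the helper findSpanReceiverLines are hypothetical, and matching on the substring "SpanReceiver" is a guess based on the 2.6.0 line above, since 2.7.3 may word its messages differently. It uses only the Java 7 standard library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop or HTrace.
class SpanReceiverLogCheck {

    // Returns every line that mentions "SpanReceiver" (assumed marker text).
    static List<String> findSpanReceiverLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.contains("SpanReceiver")) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SpanReceiverLogCheck <namenode-log-file>");
            System.exit(2);
        }
        // Read the whole log; fine for a quick check, not for very large files.
        List<String> lines =
                Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> hits = findSpanReceiverLines(lines);
        for (String hit : hits) {
            System.out.println(hit);
        }
        // Exit status 1 signals that no SpanReceiver message was found.
        System.exit(hits.isEmpty() ? 1 : 0);
    }
}
```

Run it once against the 2.6.0 log and once against the 2.7.3 log; an exit status of 1 means no matching line was found in that file.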
<property>
<name>hadoop.htrace.sampler</name>
<value>AlwaysSampler</value>
</property>
<property>
<name>hadoop.htrace.spanreceiver.classes</name>
<value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
</property>
<property>
<name>hadoop.htrace.local-file-span-receiver.path</name>
<value>/var/log/hadoop/htrace.out</value>
</property>
The /var/log/hadoop/ directory exists with 777 permissions, but nothing is written there.
The TracingFsShell example compiles and runs with the following modification:
SpanReceiverHost.get(new HdfsConfiguration(),"");
This method signature is the one found in the Hadoop source code, in hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java, although the online tutorial uses a different one. (Source diff)
The environment is the same for both Hadoop versions, with Java 1.7. Hadoop is compiled from source because the Ubuntu 14.04 system is 32-bit. Hadoop is deployed in fully distributed mode, using LXC containers.
core-site.xml for Zipkin (Zipkin params here):
<property>
<name>hadoop.htrace.spanreceiver.classes</name>
<value>org.apache.htrace.impl.ZipkinSpanReceiver</value>
</property>
<property>
<name>hadoop.htrace.zipkin.scribe.hostname</name>
<value>10.0.3.100</value>
</property>
<property>
<name>hadoop.htrace.zipkin.scribe.port</name>
<value>9410</value>
</property>
Thanks for trying out HTrace! Sorry that the version issue is such a pain right now.
It is much easier to configure HTrace with the version in Cloudera's CDH5.5 distribution of Hadoop and later. There is a good description of how to do it here: http://blog.cloudera.com/blog/2015/12/new-in-cloudera-labs-apache-htrace-incubating/
If you want to stick with an Apache release of the source code rather than a vendor release, try Hadoop 3.0.0-alpha1: http://hadoop.apache.org/releases.html
The HTrace libraries shipped in Hadoop 2.6 and 2.7 are very old; we never backported HTrace 4.x to those branches. They were stability branches, so new features like tracing were out of scope. There is some functionality there, but not much. I recommend using the newer HTrace 4.x library, which is actively developed. The HTrace 4.x branch also has a stable API, so hopefully breakage will be minimized in the future.