I am currently using Byte Buddy to add some simple logic that counts the total number of method invocations on a per-thread basis.
For advice instrumentation, I have something along the lines of:
@Advice.OnMethodEnter
static void handle() {
    MethodCounter.increment();
}
In MethodCounter#increment, I have a very simple ThreadLocal counter, and the counter itself just increments an integer:
public class MethodCounter {
    // Must be static: increment() is static and cannot read an instance field.
    static final ThreadLocal<Counter> threadCounter = new ThreadLocal<Counter>();

    public static void increment() {
        threadCounter.get().increment();
    }
    // ... and some other logic that ensures that the Counter is initialized for the current thread ...
}
When using JMH to benchmark this new logic, I am noticing about a 30% degradation in performance in a sample workflow (containing tight loops). Most of this seems to be due to the ThreadLocal: if I instead get rid of the ThreadLocal.get() and hardcode increment() to increment a static Counter, there is minimal performance impact.
Is there a more performant way to accomplish this with Byte Buddy while maintaining per-thread isolation?
Once a class is instrumented, Byte Buddy really is out of the picture.
To avoid the expensive thread local, you can try the weak-lock-free library, which keeps per-thread values in a concurrent map keyed by thread and can perform better. In general, however, thread locals are somewhat expensive, especially if you look them up a lot, and there is no good way around that.
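The idea of a concurrent map keyed by thread can be sketched with JDK classes only. This is not the API of the library mentioned above. It is a minimal illustration in which the names MapBackedMethodCounter, Counter, COUNTERS and current() are all hypothetical. It also assumes the set of threads is small and long-lived, because a plain ConcurrentHashMap holds strong references to its Thread keys and never drops entries for finished threads.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: per-thread counters stored in a map keyed by Thread.
final class MapBackedMethodCounter {

    // Hypothetical mutable holder; only the owning thread writes to it.
    static final class Counter {
        int value;
    }

    // Assumption: strong keys are acceptable here. Entries for dead threads
    // are never removed, so this leaks if threads are created and discarded.
    private static final ConcurrentMap<Thread, Counter> COUNTERS =
            new ConcurrentHashMap<>();

    private MapBackedMethodCounter() {
    }

    // Called from the advice in place of the ThreadLocal-based increment.
    static void increment() {
        Thread current = Thread.currentThread();
        Counter counter = COUNTERS.get(current);
        if (counter == null) {
            // First call on this thread: create its counter lazily.
            counter = COUNTERS.computeIfAbsent(current, t -> new Counter());
        }
        counter.value++;
    }

    // Returns the calling thread's count, or 0 if it never incremented.
    static int current() {
        Counter counter = COUNTERS.get(Thread.currentThread());
        return counter == null ? 0 : counter.value;
    }
}
```

Whether this beats ThreadLocal.get() depends on the JVM and the workload, so it is worth running the same JMH benchmark against both variants before committing to either.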