How to debug a large server-side distributed Java application


Here is my problem: I am trying to debug Apache Cassandra and understand the flow of the application, i.e. when the client sends a request, say put(), which methods are called and how the system works internally.

So, here is what I am thinking:

  1. Write a main method in the Cassandra code that calls the entry-point put() method, set breakpoints in Eclipse, and step through, OR
  2. Don't write a main method; simply use a regular client (which accesses the server via TCP) and "debug" by reading the log files and the code, using the log4j loggers already implemented in Cassandra.

So, my question is, what is the ideal way of debugging such a distributed application?


Solution

  • Ideal way? Both, and more.

    You mentioned two objectives: "debug" and "understand the flow of the application". It's very hard to debug before you understand the flow, but understanding may be an end in itself.

    In the real world, when dealing with large distributed systems, one often cannot rely on debuggers, at least initially, not least because some problems only show up when the system is busy or after hours of running. Hence good debug trace, and fine-grained control over that trace, in both the application code and the infrastructure code is essential.
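
    As a minimal sketch of what fine-grained control over trace can look like: the example below uses java.util.logging only so that it runs without extra jars, whereas Cassandra's own log4j setup is configured differently. The logger names (sketch.storage.write, sketch.gossip) and the put() method are hypothetical stand-ins, not Cassandra's real packages or API. The idea is to leave most subsystems at INFO and turn on detailed trace only for the path being studied.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

class TraceControlSketch {
    // Strong references so the configured loggers are not garbage collected.
    // All names here are hypothetical, not Cassandra's real logger names.
    static final Logger BASE = Logger.getLogger("sketch");
    static final Logger WRITE_PATH = Logger.getLogger("sketch.storage.write");
    static final Logger GOSSIP = Logger.getLogger("sketch.gossip");

    static void configure() {
        // Route everything under "sketch" through one console handler
        // that does not filter by level itself.
        BASE.setUseParentHandlers(false);
        if (BASE.getHandlers().length == 0) {
            ConsoleHandler handler = new ConsoleHandler();
            handler.setLevel(Level.ALL);
            BASE.addHandler(handler);
        }
        // Default: only INFO and above for every subsystem...
        BASE.setLevel(Level.INFO);
        // ...but full detail for the write path under investigation.
        WRITE_PATH.setLevel(Level.FINEST);
    }

    // Hypothetical stand-in for a request entry point.
    static void put(String key, String value) {
        // The lambda defers building the message until the level check passes.
        WRITE_PATH.fine(() -> "put() entered, key=" + key + ", valueLength=" + value.length());
        GOSSIP.fine(() -> "gossip detail (not emitted while gossip stays at INFO)");
        WRITE_PATH.info("put() done, key=" + key);
    }

    public static void main(String[] args) {
        configure();
        put("k1", "v1");
    }
}
```

    In a real deployment the same effect usually comes from the logging configuration file rather than from code, so the level for one package can be raised without rebuilding.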

    However, if you have the opportunity to run in a debugger, that can be quite illuminating.

    Before all of that I think you need to:

    a). Study any design documentation that there may be.

    b). Browse the source code in a good IDE, e.g. Eclipse. Just follow the control flow. Hmmm, here's an interesting bit, wonder where it gets called from? A call to that method on a class, what does that do? When does that constructor get called?

    With some of that in your head, following the trace is much easier, and you have a better idea where to put the breakpoints.
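
    When static browsing does not settle "where does this get called from?", one cheap run-time complement is to record the call stack at the point of interest and write it to the trace. The following is only a sketch: callers(), handleRequest() and applyMutation() are hypothetical names made up for illustration, not Cassandra methods.

```java
import java.util.ArrayList;
import java.util.List;

class CallerTraceSketch {
    /**
     * Returns up to maxDepth "Class.method" names from the current call stack,
     * innermost caller first, skipping this helper itself.
     */
    static List<String> callers(int maxDepth) {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        List<String> names = new ArrayList<>();
        // stack[0] is callers() itself, so start at index 1.
        for (int i = 1; i < stack.length && names.size() < maxDepth; i++) {
            names.add(stack[i].getClassName() + "." + stack[i].getMethodName());
        }
        return names;
    }

    // Hypothetical stand-ins for a request path through the server.
    static List<String> handleRequest() {
        return applyMutation();
    }

    static List<String> applyMutation() {
        // In real code this list would go to the debug trace, not be returned.
        return callers(2);
    }

    public static void main(String[] args) {
        System.out.println(handleRequest());
    }
}
```

    Capturing a stack trace is relatively expensive, so a call like this is best kept behind a debug-level check and removed once the call path is understood.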