
Debugging the Apache Heron scheduler


Twitter claims that one of the greatest advantages of Apache Heron over Apache Storm is debuggability, achieved by running each spout/bolt task in its own Heron Instance (a JVM process) instead of bundling multiple tasks into one JVM (as Storm used to do).

This approach really helps with debugging topologies. But my question is: how can one debug core parts of Heron, such as the schedulers or the resource management components? Is there a way to do that other than logging or printing output, which is a really time- and energy-consuming process? Is there a way to use a tool like an IDE (for example IntelliJ) to set breakpoints and debug the whole process of scheduling tasks in Heron?

Thanks in advance.


Solution

  • After struggling with this problem for a long time, I've finally found an answer with the help of the Heron developers (hats off to them). The answer is remote debugging of the JVM processes.

    A troubleshooting section (see Debugging Java topologies at this page) has been added to the Heron documentation that gives the required instructions for remote debugging Heron. It's good, but not what I needed, because it only covers debugging instances (bolts, spouts, ...). I needed to debug core parts such as the scheduler and the launcher.

    To enable full remote debugging for Heron, add the java_opts.append(...) line shown below to execute.py (found in heron/tools/cli/src/python), right after the line that builds java_opts:

    java_opts = ['-D' + opt for opt in java_defines]
    # add this line here
    java_opts.append('-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005')
    

    address: the port the JVM listens on for the debugger; it must match the port configured in your IDE.

    suspend: y means the JVM suspends execution until a client (the debugger or IDE) connects to the server (the JVM process).
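
    Because suspend=y makes a JVM started with this flag wait for a debugger, it can be convenient to switch the flag on only when needed. The following is a minimal sketch of that idea, not part of Heron: the helper names and the HERON_DEBUG_PORT environment variable are hypothetical and chosen only for illustration, and the java_defines value is a placeholder.

```python
import os


def jdwp_option(port=5005, suspend=True):
    """Build the JDWP agent flag from the answer for a given port."""
    return (
        "-agentlib:jdwp=transport=dt_socket,server=y,"
        "suspend={},address={}".format("y" if suspend else "n", port)
    )


def with_debug_opts(java_opts, env=None):
    """Return java_opts plus the JDWP flag when HERON_DEBUG_PORT is set.

    HERON_DEBUG_PORT is a hypothetical variable name used by this sketch;
    Heron itself does not define it.
    """
    env = os.environ if env is None else env
    port = env.get("HERON_DEBUG_PORT")
    if not port:
        # Debugging not requested: leave the options untouched.
        return list(java_opts)
    return list(java_opts) + [jdwp_option(int(port))]


if __name__ == "__main__":
    java_defines = ["example.key=value"]  # placeholder, not a real Heron define
    java_opts = ["-D" + opt for opt in java_defines]
    print(with_debug_opts(java_opts, {"HERON_DEBUG_PORT": "5005"}))
```

    With something like this in place of the hard-coded append, the CLI would behave normally unless the variable is exported before submitting a topology.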

    You can find instructions for setting up remote debugging in IntelliJ in this link.

    Important: don't forget to recompile the source and reinstall the binary packages (see Compile and install heron):

    bazel build --config=ubuntu heron/...
    bazel run --config=ubuntu -- scripts/packages:heron-client-install.sh --user
    bazel run --config=ubuntu -- scripts/packages:heron-api-install.sh --user --maven
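
    Since these three commands have to be repeated after every change to execute.py or the Java sources, they can be wrapped in a small script. This is only a sketch: the function names are made up for illustration, the commands are exactly the ones above, and it assumes bazel is on the PATH and the script is run from the root of the Heron source tree.

```python
import subprocess


def rebuild_commands(config="ubuntu"):
    """Return the build and install commands from the answer as argument lists."""
    flag = "--config=" + config
    return [
        ["bazel", "build", flag, "heron/..."],
        ["bazel", "run", flag, "--",
         "scripts/packages:heron-client-install.sh", "--user"],
        ["bazel", "run", flag, "--",
         "scripts/packages:heron-api-install.sh", "--user", "--maven"],
    ]


def rebuild(config="ubuntu", runner=subprocess.check_call):
    """Run each command in order; check_call raises on the first failure."""
    for cmd in rebuild_commands(config):
        print("running: " + " ".join(cmd))
        runner(cmd)
```

    Calling rebuild() runs the build and both install steps and stops at the first command that fails.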
    

    Now place your breakpoints wherever you want, submit your topology from the terminal, and then start debugging in the IDE; it will stop at the breakpoints. Just remember to put breakpoints on the path of execution; SubmitterMain and SchedulerMain are good candidates.