java apache-spark serializable defaultmutabletreenode

DefaultMutableTreeNode value reset to default when used in Spark mapToPair


The value of a variable of type DefaultMutableTreeNode reverts to its defaults once I use it inside Spark's mapToPair() function. Here is my code:

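import java.io.Serializable;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class CA implements Serializable {
    private final JavaRDD<String> input;
    private final List<IB> bList;

    public boolean FuncWithSpark() {
        /*
        !!!at this point, bList.get(0).getD().getRoot() returns a valid tree node
        */
        // mapToPair returns a JavaPairRDD, not a JavaRDD<Boolean>.
        // The anonymous class reads the field bList, so it captures the enclosing
        // CA instance, which is serialized together with the closure.
        JavaPairRDD<String, List<String>> counters = input.mapToPair(new PairFunction<String, String, List<String>>() {
            @Override
            public Tuple2<String, List<String>> call(String s) throws Exception {
                /*
                !!!at this point, bList.get(0).getD().getRoot() returns an uninitialized tree node with default values
                */
                ...
            }
        });
    }

    public CA(JavaRDD<String> input, List<IB> bList) {
        this.input = input;
        this.bList = bList;
    }
}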

The interfaces IB and ID and the classes CB and CD are defined as follows:

public interface IB {
  ...
}
public interface ID {
  ...
}

public class CB implements IB, Serializable {
    private final ID d;

    public ID getD() {
        return this.d;
    }
}

public class CD implements ID, Serializable {
    private DefaultMutableTreeNode rootNode;

    public DefaultMutableTreeNode getRoot() {
        return this.rootNode;
    }
}

My question is: what happens to the DefaultMutableTreeNode variable inside CA.FuncWithSpark()? Is it caused by the Spark transformation, or by the fact that DefaultMutableTreeNode's member variables are protected and have no accessors? Please point me in a direction for tackling this problem. Thank you in advance for any help!


Solution

  • I am new to Apache Spark and this was my first time using the DefaultMutableTreeNode class, so I can't explain the root cause, but I found a way to make my code work. The DefaultMutableTreeNode documentation notes that "This is not a thread safe class", which makes me suspect that in Spark, passing a variable of a thread-unsafe type from the driver to the executors may fail to carry its values over correctly.

    However, my project still needs a tree-like data structure, so I replaced DefaultMutableTreeNode with a generic tree node implementation that I found on Stack Overflow. Now my code works well.
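
One way to narrow down the cause is to check what survives plain Java serialization, which is what Spark uses to ship a closure (and every object it captures) to the executors. DefaultMutableTreeNode implements Serializable, but its userObject field is transient and is written out only when the payload is itself Serializable. The sketch below is not the original project code: the class name TreeNodeRoundTrip, its helper methods, and the sample payloads are hypothetical, and it only assumes that the tree in the question might hold user objects that are not Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import javax.swing.tree.DefaultMutableTreeNode;

// Hypothetical helper class, not part of the original project.
public class TreeNodeRoundTrip {

    // Serializes a node to a byte array and reads it back using
    // standard Java serialization.
    public static DefaultMutableTreeNode roundTrip(DefaultMutableTreeNode node)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(node);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DefaultMutableTreeNode) in.readObject();
        }
    }

    // Assumed sample data: the root carries a String payload (Serializable),
    // the child carries a plain Object payload (not Serializable).
    public static DefaultMutableTreeNode sampleTree() {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("root");
        root.add(new DefaultMutableTreeNode(new Object()));
        return root;
    }

    public static void main(String[] args) throws Exception {
        DefaultMutableTreeNode copy = roundTrip(sampleTree());
        DefaultMutableTreeNode child = (DefaultMutableTreeNode) copy.getChildAt(0);
        System.out.println("root payload:  " + copy.getUserObject());
        System.out.println("child count:   " + copy.getChildCount());
        System.out.println("child payload: " + child.getUserObject());
    }
}
```

If the child payload comes back as null while the tree structure is intact, a tree whose user objects are not Serializable would look "reset" inside call(). In that case a tree node whose payload type is itself Serializable, such as the generic replacement mentioned above, would be expected to avoid the silent loss.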