giraph

Which is better to use to manage Vertex state: POJO instance variables or Giraph values?


I'm confused about when it's OK to use Vertex instance variables to maintain state rather than proper Giraph values à la getValue(). An interesting example I found in the source demonstrates both: SimpleTriangleClosingVertex, which has both an instance variable (closeMap) and a custom vertex value (IntArrayListWritable). I'm a little surprised that using an instance variable is legitimate, since it could interfere with serialization. My question: are both approaches valid? If so, how do I choose one over the other? Thanks very much.


Solution

  • The Compute classes in Giraph aren’t serialized. Giraph only serializes the value object, which you receive in the compute method through the vertex parameter. You can create as many instance variables as you wish to keep your helper methods simple, since those methods can read the instance variables instead of taking everything as parameters. But always keep the following two things in mind:

    1. Only what is stored in the vertex value at the end of compute is serialized and remembered until compute runs again for that vertex. Anything held only in an instance variable is lost.
    2. Giraph reuses the compute-class objects. It creates a pool of objects and then calls the compute method on the same object for many different vertices. So the first thing to do in the compute method is to initialise your instance variables, including resetting every default value. Otherwise you may get wrong results left over from some other vertex that was computed earlier in the SAME object instance.
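To make the second point concrete, here is a minimal sketch in plain Java that does not use Giraph at all. `SketchVertex`, `NeighborCountCompute` and `ReuseDemo` are hypothetical stand-ins invented for illustration, not Giraph classes. They only simulate the behaviour described above: one compute object called for several vertices, with only the vertex value surviving each call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a vertex: only `value` is kept between calls.
class SketchVertex {
    final int id;
    int value;
    final int[] neighbors;

    SketchVertex(int id, int value, int... neighbors) {
        this.id = id;
        this.value = value;
        this.neighbors = neighbors;
    }
}

// Hypothetical stand-in for a compute class whose instance is reused.
class NeighborCountCompute {
    // Scratch state: convenient for helper methods, but never serialized
    // and shared by every vertex processed with this object.
    private final List<Integer> scratch = new ArrayList<>();
    private final boolean resetOnEntry;

    NeighborCountCompute(boolean resetOnEntry) {
        this.resetOnEntry = resetOnEntry;
    }

    void compute(SketchVertex vertex) {
        if (resetOnEntry) {
            scratch.clear(); // re-initialise before touching any state
        }
        for (int neighbor : vertex.neighbors) {
            scratch.add(neighbor);
        }
        // Persist the result in the vertex value, not in the instance variable.
        vertex.value = scratch.size();
    }
}

class ReuseDemo {
    // Runs one reused compute object over two vertices and returns their values.
    static int[] run(boolean resetOnEntry) {
        NeighborCountCompute compute = new NeighborCountCompute(resetOnEntry);
        SketchVertex a = new SketchVertex(1, 0, 2, 3); // two neighbors
        SketchVertex b = new SketchVertex(2, 0, 1);    // one neighbor
        compute.compute(a);
        compute.compute(b);
        return new int[] {a.value, b.value};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(true)));  // prints [2, 1]
        System.out.println(Arrays.toString(run(false))); // prints [2, 3]
    }
}
```

Without the reset, vertex 2 reports three neighbors because the scratch list still holds the entries left behind by vertex 1. With the reset, each vertex sees only its own data, and the result that has to survive is written into the vertex value.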