Throughout a Giraph graph, I need to maintain an array on a per-vertex basis to store the results of several "health" checks done at the vertex level.
Is it as simple as writing a new input format that will get carried over?
My concern is that the actual data that will feed the graph does not need to know about this array.
You don’t need to read the array from anywhere. If it only holds temporary calculations between supersteps, you need neither to read it from the input nor to write it to the output.
You will need to create a new class that implements Writable. You’ll store the array within this class and take care of its serialization/deserialization between the supersteps. This is done in two methods:
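```java
@Override
public void write(DataOutput dataOutput) throws IOException {
    // ... write each field to dataOutput ...
}

@Override
public void readFields(DataInput dataInput) throws IOException {
    // ... read the same fields back from dataInput, in the same order ...
}
```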
Make sure that you read and write the fields in the same order: they are written sequentially into a buffer, so reading them back in a different order would corrupt the value.
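As a minimal sketch, assuming the health checks are simple pass/fail flags stored in a `boolean[]`, such a value class could look like the one below. `HealthCheckValue` and `RoundTrip` are hypothetical names. The `implements org.apache.hadoop.io.Writable` clause is deliberately left out so the example compiles without Hadoop on the classpath; in a real job you would add it, make the class public and keep the no-arg constructor, since values are typically created by reflection. `RoundTrip.copy` writes a value to a byte buffer and reads it back into a fresh instance, roughly what happens whenever the value gets serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical vertex value holding the results of several health checks.
// In a real Giraph job: public class, "implements org.apache.hadoop.io.Writable",
// and keep the no-arg constructor. Omitted here only to avoid the Hadoop dependency.
class HealthCheckValue {
    private boolean[] results = new boolean[0];

    public HealthCheckValue() { }

    public HealthCheckValue(int numChecks) {
        results = new boolean[numChecks];
    }

    public boolean[] getResults() { return results; }

    public void setResult(int check, boolean passed) { results[check] = passed; }

    // Serialize: the length first, so readFields() knows how many entries follow.
    public void write(DataOutput out) throws IOException {
        out.writeInt(results.length);
        for (boolean result : results) {
            out.writeBoolean(result);
        }
    }

    // Deserialize in exactly the same order as write(): length, then entries.
    // A fresh array is allocated so no entries from a previous value survive.
    public void readFields(DataInput in) throws IOException {
        int length = in.readInt();
        results = new boolean[length];
        for (int i = 0; i < length; i++) {
            results[i] = in.readBoolean();
        }
    }

    @Override
    public String toString() { return Arrays.toString(results); }
}

class RoundTrip {
    // Writes the value to a byte buffer and reads it back into a new instance.
    static HealthCheckValue copy(HealthCheckValue value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        value.write(out);
        out.flush();
        HealthCheckValue restored = new HealthCheckValue();
        restored.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return restored;
    }
}
```

Writing the length first is what lets `readFields()` size the array before reading the entries. If the number of checks were fixed for the whole job you could skip it, but then both methods would have to agree on that constant.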
Afterwards, you just need to specify this class as the vertex value type (the `V` generic type parameter) of your vertex/computation class.
However, if you don’t initialize the vertex value during setup, i.e. when the input file is read, you should do it in the first superstep (superstep 0).
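A minimal sketch of the superstep-0 initialization, under loud assumptions: it does not use Giraph at all and only mimics the control flow of a `compute()` call, so it runs on its own. `SimVertex`, `NUM_CHECKS` and the "check" itself are hypothetical. In a real Giraph computation the superstep number would come from `getSuperstep()`, and the value would be read and replaced with `vertex.getValue()` / `vertex.setValue(...)`.

```java
import java.util.Arrays;

// Stand-alone simulation of "initialize the value in superstep 0"; NOT Giraph API.
// SimVertex, NUM_CHECKS and the check performed below are hypothetical.
class SimVertex {
    static final int NUM_CHECKS = 3; // hypothetical number of health checks

    // Nothing fills this in beforehand: the input data does not know about it.
    boolean[] healthChecks;

    // Mirrors the shape of one compute() call for the given superstep.
    void compute(long superstep) {
        if (superstep == 0) {
            // First superstep: create the array the input never provided.
            healthChecks = new boolean[NUM_CHECKS];
        }
        // Hypothetical health check: superstep s records slot s, while slots remain.
        if (superstep < NUM_CHECKS) {
            healthChecks[(int) superstep] = true;
        }
    }

    @Override
    public String toString() {
        return Arrays.toString(healthChecks);
    }
}
```

Putting the allocation behind `superstep == 0` keeps the input format untouched, which is the point of the question: the graph data never has to mention the array.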
I wrote a blog post about complex data types in Giraph about a year ago. It may help you further, although some things might have changed in the meantime.