Search code examples
c++recursionserializationbooststack-overflow

Boost serialization with recursive data structure ends up in stack overflow


Problem

I'm having trouble using Boost Serialization library with a recursive data structure. More precisely, I want to serialize a matrix which is represented by nodes containing a value, and where each node has access to its neighbors (top,bottom,left,right). In order to access a node, each entry point is stored in a vector (that is the first and the last node of each row and and each column) . Here is the Node class :

class Node
{
private:
    int v;
    Node* left; 
    Node* right;
    Node* top;
    Node* bottom;

public:
    Node() : v(rand() % 100), left(NULL), right(NULL), top(NULL), bottom(NULL)
    {

    }

    //Potential memory leak but that's not the topic
    void setLeft(Node* toSet) { left = toSet; }
    void setRight(Node* toSet) { right = toSet; }
    void setTop(Node* toSet) { top = toSet; }
    void setBottom(Node* toSet) { bottom = toSet; }

    Node* gLeft() { return left; }
    Node* gRight() { return right; }
    Node* gTop() { return top; }
    Node* gBottom() { return bottom; }

    int gValue() { return v; }

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar& v;
        ar& left;
        ar& right;
        ar& top;
        ar& bottom;
    }
};

Here is the Matrix class, with a generateValues() function for this example :

class Matrix
{
private:
    int m, n;
    std::vector<Node*> firstNodesPerRow;
    std::vector<Node*> lastNodesPerRow;
    std::vector<Node*> firstNodesPerCol;
    std::vector<Node*> lastNodesPerCol;
public:
    Matrix(int m, int n) : 
        m(m), n(n),
        firstNodesPerRow(m, NULL), lastNodesPerRow(m,NULL),
        firstNodesPerCol(n, NULL),lastNodesPerCol(n, NULL)
    {
        
    }

    void generateValues()
    {
        for (int i = 0; i < m; i++)
        {
            for (int j = 0; j < n; j++)
            {
                Node* toWrite = new Node();
                if (i > 0)
                {
                    toWrite->setTop(lastNodesPerCol.at(j));
                    lastNodesPerCol.at(j)->setBottom(toWrite);
                    lastNodesPerCol.at(j) = toWrite;
                }
                else
                {
                    firstNodesPerCol.at(j) = toWrite;
                    lastNodesPerCol.at(j) = toWrite;
                }
                if (j > 0)
                {
                    toWrite->setLeft(lastNodesPerRow.at(i));
                    lastNodesPerRow.at(i)->setRight(toWrite);
                    lastNodesPerRow.at(i) = toWrite;
                }
                else
                {
                    firstNodesPerRow.at(i) = toWrite;
                    lastNodesPerRow.at(i) = toWrite;
                }
            }
        }
    }

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar& m;
        ar& n;
        ar& firstNodesPerRow;
        ar& firstNodesPerCol;
        ar& lastNodesPerRow;
        ar& lastNodesPerCol;
    }
};

So what I want to achieve is serializing and deserializing a Matrix. Here is my main function :

#include <cstdlib>
#include <sstream>
#include <vector>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/vector.hpp>

int main(int argc, char* argv[])
{
    int m = 10; int n = 10;
    Matrix toSerialize(m,n);
    toSerialize.generateValues();

    /*
        1) Serialize
    */
    std::ostringstream oss;
    boost::archive::text_oarchive oa(oss);
    oa << toSerialize;

    std::string serialiazedData = oss.str();

    /*
        2) Deserialize
    */
    Matrix result(m,n);
    std::stringstream serializedDataStream(serialiazedData);
    boost::archive::text_iarchive ia(serializedDataStream);
    ia >> result;

    return EXIT_SUCCESS;
}

The problem is the following : given a sufficiently large value m or n, main ends up with a stack-overflow exception. I know that's it's coming from the serialize method of Node, because in order to serialize a node, it needs to serialize the neighbors and so on... I found this topic which seems to be exactly the same question. The answer is interesting as a starting point but I'm having trouble to implement it because it's not sufficiently precise. What I understand is that to solve my problem, I need to :

  1. serialize the nodes in an iterative way, so that when it comes to the neighboors, the objects are already serialized and there is no stack-overflow;
  2. serialize the topology, which is represented in my case through the pointers top,bottom,right,left of Node

I'm having trouble to actually implement this solution, because the only way I can imagine to do point 1 implies to remove the serialization of top,bottom,right,left in the serialize method of Node, but then I can't achieve point 2?

Edit : I made a diagram of a matrix to help reading. Note that there is not necessarily m x n nodes in a m x n matrix.

Matrix example

Edit 2 : The solution I had in mind (not working, ends up with a stack-overflow when deserializing).

//Class Node
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& v;
}

//Class Matrix
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& m;
    ar& n;

    for (int i = 0; i < m; i++)
    {
        Node* current = firstNodesPerRow.at(i);
        while (1)
        {
            if (current == NULL) { break; }
            ar& current;
            current = current->gRight();
        }
    }

    ar& firstNodesPerRow;
    ar& firstNodesPerCol;
    ar& lastNodesPerRow;
    ar& lastNodesPerCol;
}

Solution

The explanation of the solution is given in the post marked as the answer. Here is an implementation of this solution :

// class Node
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& v;
}

// some buffer struct
struct Neighbors
{
    Node* top;
    Node* bottom;
    Node* left;
    Node* right;

    template <typename Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar& top;
        ar& bottom;
        ar& left;
        ar& right;
    }
};

//class Matrix
template<class Archive>
void save(Archive& ar, const unsigned int version) const
{
    std::map<Node*, Neighbors> neighborsPerNode;

    for (int i = 0; i < m; i++)
    {
        Node* current = firstNodesPerRow.at(i);
        while (1)
        {
            if (current == NULL) { break; }
            neighborsPerNode[current] = {
                current->gTop(),
                current->gBottom(),
                current->gLeft(),
                current->gRight(),
            };
            current = current->gRight();
        }
    }

    ar& neighborsPerNode;

    ar& m;
    ar& n;

    ar& firstNodesPerRow;
    ar& firstNodesPerCol;
    ar& lastNodesPerRow;
    ar& lastNodesPerCol;
}
template<class Archive>
void load(Archive& ar, const unsigned int version)
{
    // Warning ALL the nodes are browsed 2 times :
    //  1 - to deserialize neighborsPerNode (below)
    //  2 - in the for loop, to create the links between the nodes
    
    std::map<Node*, Neighbors> neighborsPerNode;
    ar& neighborsPerNode;

    ar& m;
    ar& n;

    ar& firstNodesPerRow;
    ar& firstNodesPerCol;
    ar& lastNodesPerRow;
    ar& lastNodesPerCol;

    for (int i = 0; i < m; i++)
    {
        Node* current = firstNodesPerRow.at(i);
        while (1)
        {
            if (current == NULL) { break; }
            Neighbors myNeighbors = neighborsPerNode.at(current);
            current->setTop(myNeighbors.top);
            current->setBottom(myNeighbors.bottom);
            current->setLeft(myNeighbors.left);
            current->setRight(myNeighbors.right);
            current = current->gRight();    
        }
    }
}
BOOST_SERIALIZATION_SPLIT_MEMBER()

Solution

  • Why not simply serialize all elements in order of the matrix and avoid the function call recursion entirely, e.g.:

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar& m;
        ar& n;
        for (int i = 0; i < m; ++i)
        {
            Node* node = firstNodesPerRow[i];
            for (int j = 0; j < n; ++j)
            {
                ar & node->gValue();
                node = node->gRight();
            }
        }
    }
    

    BTW This works in saving the matrix. I think you need to specialize the serialize function in save and load versions, because for the load version you need to:

    1. load n, m
    2. allocate all nodes and populate the node pointer vectors in matrix
    3. load all values in the same order as during saving
        template<class Archive>
        void save(Archive & ar, const unsigned int version) const
        {
            ...
        }
        template<class Archive>
        void load(Archive & ar, const unsigned int version)
        {
            ...
        }
        BOOST_SERIALIZATION_SPLIT_MEMBER()
    

    In the complicated case where nodes may be missing, the same idea applies. And you still need to split between save/load, adding allocation to load. But it takes a lot more bookkeeping to correctly save & load again.

    E.g., you could first iterate over all nodes as above but creating a map from each Nodes pointer value to an unique increasing ID number. When you save each node's value (as above row by row), also save each direction's node ID. When you load, you first make a empty map: ID -> node pointer. Then restore nodes again row by row, while restoring neighbour pointers as well from map. Of course whenever an ID is first encountered you need to allocate a new node.