Search code examples
algorithmdata-structuresgraph32-bitlarge-data

Searching a graph larger than 4GB on a 32bit machine


I have a graph that is somewhat larger than 4GB, larger than I can address on a 32bit machine.

I want to write a program that counts the number of vertices connected (directly and indirectly) to a specific node in the graph.

How can I do it if I can't load the whole graph into memory/swap at once?

edit: It is a directed graph, and I actually want to count the number of vertices from which I can get (in the right direction) to a specific vertex.


Solution

  • It really depends on how you store your graph. In many cases you don't have to load the whole graph. For example, if your graph is stored as an adjacency matrix, you just need to extract one column/row, and there are many ways to do this efficiently depending on how the matrix is stored (for example fixed-length cells or HDF files). If your graph is stored as a list of nodes with pointers to other nodes, you can sometimes extract just the data for that node. Bottom line, it depends on how you store your graph, but it is doable under many representations. The key point is that the representation has to be saved to disk in such a way that it does not have to be loaded all at once.