Search code examples
graphrustconnected-componentspetgraph

Return all petgraph connected components in a hashtable


I'm using petgraph and I would like to extract connected components.

I want to have a HashMap<u32, Vec<&petgraph::graph::NodeIndex>> with a u32 as the identifier for the connected component and a Vec as a container with references to all nodes in the connected component.

If this is a bad design, don't hesitate to point out a better one; I am quite a Rust beginner.

I tried something like this:

extern crate fnv;
extern crate petgraph;

use petgraph::visit::Dfs;

use fnv::FnvHashMap; // a faster hash for small key
use fnv::FnvHashSet;


// structure definition
pub struct NodeAttr {
    pub name_real: String,
}

impl Default for NodeAttr {
    fn default() -> Self {
        NodeAttr {
            name_real: "default_name_for_testing".to_string(),
        }
    }
}


pub struct EdgesAttr {
    pub eval: f64,
    pub pid: f32,
    pub cov: f32, // minimum coverage
}

impl Default for EdgesAttr {
    fn default() -> Self {
        EdgesAttr {
            eval: 0.0,
            pid: 100.0,
            cov: 100.0,
        }
    }
}

pub fn cc_dfs<'a>(
    myGraph: &petgraph::Graph<NodeAttr, EdgesAttr, petgraph::Undirected>,
) -> FnvHashMap<u32, Vec<&'a petgraph::graph::NodeIndex>> {
    let mut already_visited = FnvHashSet::<&petgraph::graph::NodeIndex>::default();
    let mut map_real_index: FnvHashMap<u32, Vec<&petgraph::graph::NodeIndex>> =
        FnvHashMap::with_capacity_and_hasher(myGraph.node_count(), Default::default());

    let mut cpt = 0;

    for current_node_indice in myGraph.node_indices() {
        let mut current_vec: Vec<&petgraph::graph::NodeIndex> = Vec::new();
        if already_visited.contains(&current_node_indice) {
            continue;
        }
        let mut dfs = Dfs::new(&myGraph, current_node_indice);
        while let Some(nx) = dfs.next(&myGraph) {
            // the problem is around here
            // I believe the just assigned nx live only for the while
            //But it should live for the upper for loop. What to do?
            current_vec.push(&nx);
            already_visited.insert(&nx);
        }
        map_real_index.insert(cpt, current_vec);
        cpt = cpt + 1
    }

    return map_real_index;
}

fn main() {}

Cargo.toml:

enter[dependencies]
fnv="*"
petgraph="*" 

With the compiler error:

error[E0597]: `nx` does not live long enough
  --> src/main.rs:59:31
   |
59 |             current_vec.push(&nx);
   |                               ^^ does not live long enough
60 |             already_visited.insert(&nx);
61 |         }
   |         - borrowed value only lives until here
   |
note: borrowed value must be valid for the lifetime 'a as defined on the function body at 40:1...
  --> src/main.rs:40:1
   |
40 | / pub fn cc_dfs<'a>(
41 | |     myGraph: &petgraph::Graph<NodeAttr, EdgesAttr, petgraph::Undirected>,
42 | | ) -> FnvHashMap<u32, Vec<&'a petgraph::graph::NodeIndex>> {
43 | |     let mut already_visited = FnvHashSet::<&petgraph::graph::NodeIndex>::default();
...  |
66 | |     return map_real_index;
67 | | }
   | |_^

error[E0597]: `nx` does not live long enough
  --> src/main.rs:61:9
   |
60 |             already_visited.insert(&nx);
   |                                     -- borrow occurs here
61 |         }
   |         ^ `nx` dropped here while still borrowed
...
67 | }
   | - borrowed value needs to live until here

I cloned the node index in my vector and that worked:

current_vec.push(nx.clone()); // instead of (&nx)
already_visited.insert(nx.clone());`

I believe (maybe wrongly) that working with references will be more effective than copying.


Solution

  • This much smaller piece of code exhibits the same problem (playground):

    let mut v = Vec::new(); // Vec<&'a NodeIndex> ... but what is 'a?
    for n in 0..10 {
        let nx: NodeIndex = NodeIndex::new(n);
        v.push(&nx);
    }
    

    i.e., you're creating a short-lived NodeIndex inside a loop and trying to store a reference to it in a longer-lived Vec.

    In this case the solution is very simple: just move the NodeIndex instead of taking a reference.

        v.push(nx)
    

    In your original code the fix is no different.

    // nit: "indices" is the plural of "index"; there is no singular word "indice"
    for current_node_index in myGraph.node_indices() {
        // actually you don't need to supply a type here, but if you did...
        let mut current_vec: Vec<petgraph::graph::NodeIndex> = Vec::new();
        if already_visited.contains(&current_node_index) {
            continue;
        }
        let mut dfs = Dfs::new(&myGraph, current_node_index);
        while let Some(nx) = dfs.next(&myGraph) {
            current_vec.push(nx);
            //               ^-----v- Look Ma, no &s!
            already_visited.insert(nx);
        }
        map_real_index.insert(cpt, current_vec);
        cpt = cpt + 1
    }
    

    "But," you say, "I don't want to copy an entire NodeIndex! I just want to have a pointer to it! NodeIndex is a big fat hairy struct, right?"

    Well, if that (an owning pointer) is really what you need, Box is what you almost always want. But first look at the definition of NodeIndex and check out the source code, if you want to know how heavyweight these indices really are:

    pub struct NodeIndex<Ix=DefaultIx>(Ix);
    

    A NodeIndex is just an Ix, which is (if you look up DefaultIx) just an alias for u32. On a 64 bit PC, that's actually smaller than the pointer you were trying to store, and in Rust, you don't pay any extra cost for using it -- at runtime, it's really just a u32.

    Conveniently, NodeIndex is Copy (when Ix is Copy), so you don't even need to throw that extra .clone() in; you can just do current_vec.push(nx) followed by already_visited.insert(nx) as I did above. (But even if you do write .clone(), you pay no runtime cost for doing so; it's just unnecessary.)