Search code examples
rigraphcyclegirth

r igraph find all cycles


I have directed igraph and want to fetch all the cycles. girth function works but only returns the smallest cycle. Is there a way in R to fetch all the cycles in a graph of length greater then 3 (no vertex pointing to itself and loops)


Solution

  • It is not directly a function in igraph, but of course you can code it up. To find a cycle, you start at some node, go to some neighboring node and then find a simple path back to the original node. Since you did not provide any sample data, I will illustrate with a simple example.

    Sample data

    ## Sample graph
    library(igraph)
    set.seed(1234)
    g = erdos.renyi.game(7, 0.29, directed=TRUE)
    plot(g, edge.arrow.size=0.5)
    

    Sample Graph

    Finding Cycles

    Let me start with just one node and one neighbor. Node 2 connects to Node 4. So some cycles may look like 2 -> 4 -> (Nodes other than 2 or 4) -> 2. Let's get all of the paths like that.

    v1 = 2
    v2 = 4
    lapply(all_simple_paths(g, v2,v1, mode="out"), function(p) c(v1,p))
    [[1]]
    [1] 2 4 2
    [[2]]
    [1] 2 4 3 5 7 6 2
    [[3]]
    [1] 2 4 7 6 2
    

    We see that there are three cycles starting at 2 with 4 as the second node. (I know that you said length greater than 3. I will come back to that.)

    Now we just need to do that for every node v1 and every neighbor v2 of v1.

    Cycles = NULL
    for(v1 in V(g)) {
        for(v2 in neighbors(g, v1, mode="out")) {
            Cycles = c(Cycles, 
                lapply(all_simple_paths(g, v2,v1, mode="out"), function(p) c(v1,p)))
        }
    }
    

    This gives 17 cycles in the whole graph. There are two issues though that you may need to look at depending on how you want to use this. First, you said that you wanted cycles of length greater than 3, so I assume that you do not want the cycles that look like 2 -> 4 -> 2. These are easy to get rid of.

    LongCycles = Cycles[which(sapply(Cycles, length) > 3)]
    

    LongCycles has 13 cycles having eliminated the 4 short cycles

    2 -> 4 -> 2
    4 -> 2 -> 4
    6 -> 7 -> 6
    7 -> 6 -> 7
    

    But that list points out the other problem. There still are some that you cycles that you might think of as duplicates. For example:

    2 -> 7 -> 6 -> 2
    7 -> 6 -> 2 -> 7
    6 -> 2 -> 7 -> 6
    

    You might want to weed these out. To get just one copy of each cycle, you can always choose the vertex sequence that starts with the smallest vertex number. Thus,

    LongCycles[sapply(LongCycles, min) == sapply(LongCycles, `[`, 1)]
    [[1]]
    [1] 2 4 3 5 7 6 2
    [[2]]
    [1] 2 4 7 6 2
    [[3]]
    [1] 2 7 6 2
    

    This gives just the distinct cycles.


    Addition regarding efficiency and scalability

    I am providing a much more efficient version of the code that I originally provided. However, it is primarily for the purpose of arguing that, except for very simple graphs, you will not be able produce all cycles.

    Here is some more efficient code. It eliminates checking many cases that either cannot produce a cycle or will be eliminated as a redundant cycle. In order to make it easy to run the tests that I want, I made it into a function.

    ## More efficient version
    FindCycles = function(g) {
        Cycles = NULL
        for(v1 in V(g)) {
            if(degree(g, v1, mode="in") == 0) { next }
            GoodNeighbors = neighbors(g, v1, mode="out")
            GoodNeighbors = GoodNeighbors[GoodNeighbors > v1]
            for(v2 in GoodNeighbors) {
                TempCyc = lapply(all_simple_paths(g, v2,v1, mode="out"), function(p) c(v1,p))
                TempCyc = TempCyc[which(sapply(TempCyc, length) > 3)]
              TempCyc = TempCyc[sapply(TempCyc, min) == sapply(TempCyc, `[`, 1)]
              Cycles  = c(Cycles, TempCyc)
            }
        }
        Cycles
    }
    

    However, except for very simple graphs, there is a combinatorial explosion of possible paths and so finding all possible cycles is completely impractical I will illustrate this with graphs much smaller than the one that you mention in the comments.

    First, I will start with some small graphs where the number of edges is approximately twice the number of vertices. Code to generate my examples is below but I want to focus on the number of cycles, so I will just start with the results.

    ## ecount ~ 2 * vcount
    Nodes   Edges   Cycles
    10   21    15
    20   41    18
    30   65    34
    40   87   424
    50  108  3433
    55  117 22956
    

    But you report that your data has approximately 5 times as many edges as vertices. Let's look at some examples like that.

    ## ecount ~ 5 * vcount
    Nodes  Edges    Cycles
    10      48        3511
    12      61       10513
    14      71      145745
    

    With this as the growth of the number of cycles, using 10K nodes with 50K edges seems to be out of the question. BTW, it took several minutes to compute the example with 14 vertices and 71 edges.

    For reproducibility, here is how I generated the above data.

    set.seed(1234)
    g10 = erdos.renyi.game(10, 0.2, directed=TRUE)
    ecount(g10)
    length(FindCycles(g10))
    
    set.seed(1234)
    g20 = erdos.renyi.game(20, 0.095 , directed=TRUE)
    ecount(g20)
    length(FindCycles(g20))
    
    set.seed(1234)
    g30 = erdos.renyi.game(30, 0.056 , directed=TRUE)
    ecount(g30)
    length(FindCycles(g30))
    
    set.seed(1234)
    g40 = erdos.renyi.game(40, 0.042 , directed=TRUE)
    ecount(g40)
    length(FindCycles(g40))
    
    set.seed(1234)
    g50 = erdos.renyi.game(50, 0.038 , directed=TRUE)
    ecount(g50)
    length(FindCycles(g50))
    
    set.seed(1234)
    g55 = erdos.renyi.game(55, 0.035 , directed=TRUE)
    ecount(g55)
    length(FindCycles(g55))
    
    ##########
    set.seed(1234)
    h10 = erdos.renyi.game(10, 0.55, directed=TRUE)
    ecount(h10)
    length(FindCycles(h10))
    
    set.seed(1234)
    h12 = erdos.renyi.game(12, 0.46, directed=TRUE)
    ecount(h12)
    length(FindCycles(h12))
    
    set.seed(1234)
    h14 = erdos.renyi.game(14, 0.39, directed=TRUE)
    ecount(h14)
    length(FindCycles(h14))