
Identifying and summarizing discrete groups of nodes in R


I am working on a network analysis problem related to family/household composition. I have multiple edge tables, each containing id1, id2 and a relationship code stating the type of relationship between the two IDs. These tables are large, each with upwards of 7 million rows. I also have a node table which contains the same IDs and various attributes.

What I want to achieve is a summary table (a cross-tabulation of households by number of adults and number of children) similar to this:

                      Children

             1  2  3  4   total 
            --------------------
          1 | 1  0  1  0    2
            |
 Adults   2 | 3  5  4  1    13  
            |
          3 | 1  2  0  0    3
            |
      total | 5  7  5  1    18 

Essentially I want to be able to identify and count distinct networks in my data.

My data is in the form:

             ID1  ID2   Relationship_Code

              X1   X2    Married 
              X1   X3    Parent/Child
              X1   X4    Parent/Child 
              X5   X6    Married
              X5   X7    Parent/Child 
              X6   X5    Married
               .    .     .
               .    .     .
               .    .     . 

I also have a node table which contains date of birth and other variables from which adult/child status can be identified.

Any tips/hints on how to extract this summary information from the graph data frame would be very helpful and much appreciated.

Thanks


Solution

  • Getting all the way to the final table that you want requires the node table, which you have not shown us, but I can get you most of the way there.

    I think that the key to getting your result is identifying the households. You can do this in igraph using components(). By default it finds weakly connected components, so edge direction is ignored, and each connected component is a household. I will illustrate with a slightly more elaborate version of your example.

    Data:

    Census = read.table(text="ID1  ID2   Relationship_Code
                  X1   X2    Married 
                  X2   X1    Married 
                  X1   X3    Parent/Child
                  X1   X4    Parent/Child 
                  X2   X3    Parent/Child
                  X2   X4    Parent/Child 
                  X5   X6    Married
                  X5   X7    Parent/Child 
                  X6   X7    Parent/Child 
                  X6   X5    Married
                  X8   X9    Married
                  X9   X8    Married",
        header=TRUE)
    

    Now turn it into a graph, find the components and check by plotting.

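    library(igraph)
    # Only the two ID columns are needed to find households.
    EL = as.matrix(Census[,1:2])
    Pop = graph_from_edgelist(EL)
    # Each (weakly) connected component is one household.
    Households = components(Pop)
    # One color per household.
    plot(Pop, vertex.color=rainbow(Households$no, alpha=0.5)[Households$membership])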
    

    [Figure: Household network]
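
    The question mentions several edge tables and a separate node table. Below is a minimal sketch of one way to combine them, assuming the edge tables share the column layout shown in the question. The names EdgesA, EdgesB, NodeTable, Pop2 and the id column are hypothetical. Passing the node table as the vertices argument of graph_from_data_frame() keeps people who appear in no edge, so they end up as one-person households, and directed=FALSE reflects that direction does not matter for household membership.

```r
library(igraph)

# Two hypothetical edge tables with the same columns as in the question.
EdgesA = data.frame(ID1 = c("X1", "X1", "X2"),
                    ID2 = c("X2", "X3", "X1"),
                    Relationship_Code = c("Married", "Parent/Child", "Married"))
EdgesB = data.frame(ID1 = c("X5", "X5"),
                    ID2 = c("X6", "X7"),
                    Relationship_Code = c("Married", "Parent/Child"))

# Hypothetical node table; X9 has no recorded relationships.
NodeTable = data.frame(id = c("X1", "X2", "X3", "X5", "X6", "X7", "X9"))

# Stack the edge tables and build one undirected graph.
# Every ID used in the edges must also appear in the first column of NodeTable.
AllEdges = rbind(EdgesA, EdgesB)
Pop2 = graph_from_data_frame(AllEdges, directed = FALSE, vertices = NodeTable)

# Duplicate pairs (X1-X2 and X2-X1) do not change the components.
Households2 = components(Pop2)
Households2$no      # number of households
Households2$csize   # number of people in each household
```

    Here the graph splits into three households: two of size three and the isolated person X9.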

    You said that you could label the nodes as to whether they represent adults or children. I will assume that we have such a labeling, given here in vertex order (X1 through X9). From that, it is easy to count the number of adults and the number of children in each household, and to make a table of household composition by adults and children.

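    # Labels in vertex order X1, ..., X9 (A = adult, C = child).
    V(Pop)$AdultChild = c('A', 'A', 'C', 'C', 'A', 'A', 'C', 'A', 'A')
    # Number of adults in each household.
    AdultsByHousehold = aggregate(V(Pop)$AdultChild, list(Households$membership), 
        function(p) sum(p=='A'))
    AdultsByHousehold
    #   Group.1 x
    # 1       1 2
    # 2       2 2
    # 3       3 2
    
    # Number of children in each household.
    ChildrenByHousehold = aggregate(V(Pop)$AdultChild, list(Households$membership), 
        function(p) sum(p=='C'))
    ChildrenByHousehold
    #   Group.1 x
    # 1       1 2
    # 2       2 1
    # 3       3 0
    
    # Rows: adults per household; columns: children per household.
    table(AdultsByHousehold$x, ChildrenByHousehold$x)
    #     0 1 2
    #   2 1 1 1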
    

    In my bogus example, all households have two adults, so the table has a single row; its columns count the households with 0, 1 and 2 children.
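
    To get closer to the table in the question, the labels can be derived from the node table and totals added with addmargins(). This is only a sketch under stated assumptions: the node table NodeInfo, its columns id and dob, the cutoff date, and the rule "born on or before the cutoff means adult" are all hypothetical stand-ins for whatever the real node table supports.

```r
library(igraph)

# Small edge list with two households (X1-X4 and X5-X7).
EL2 = rbind(c("X1", "X2"), c("X1", "X3"), c("X1", "X4"),
            c("X5", "X6"), c("X5", "X7"))
G = graph_from_edgelist(EL2, directed = FALSE)

# Hypothetical node table; the column names id and dob are assumptions.
NodeInfo = data.frame(
    id  = c("X1", "X2", "X3", "X4", "X5", "X6", "X7"),
    dob = as.Date(c("1980-03-01", "1982-07-15", "2010-01-20", "2013-11-02",
                    "1975-05-30", "1978-09-09", "2008-12-25")))

# Assumed rule: adult if born on or before 2002-01-01
# (18 years before an assumed reference date of 2020-01-01).
Cutoff = as.Date("2002-01-01")
NodeInfo$AdultChild = ifelse(NodeInfo$dob <= Cutoff, "A", "C")

# Align node-table rows with the vertex order before attaching the labels.
V(G)$AdultChild = NodeInfo$AdultChild[match(V(G)$name, NodeInfo$id)]

# One row per household, one column per label; the factor keeps both
# columns even if a label never occurs.
HH = components(G)$membership
Counts = table(HH, factor(V(G)$AdultChild, levels = c("A", "C")))

# Cross-tabulate adults against children and add row and column totals.
Summary = addmargins(table(Adults = Counts[, "A"], Children = Counts[, "C"]))
Summary
```

    In this small example both households have two adults, one with one child and one with two, so Summary has a single data row plus the Sum margins. Counting with table() on the membership vector avoids calling aggregate() once per label, which may matter with millions of nodes.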