I am working on a network analysis problem related to family/household composition. I have several edge tables, each containing id1, id2 and a relationship code giving the type of relationship between the two ids. These tables are large, upwards of 7 million rows each. I also have a node table which contains the same ids and various attributes.
What I want to achieve is an adjacency matrix which will give summary statistics, something like this:
                    Children
             1    2    3    4   total
          ---------------------------
        1 |  1    0    1    0     2
Adults  2 |  3    5    4    1    13
        3 |  1    2    0    0     3
    total |  5    7    5    1    18
Essentially I want to be able to identify and count distinct networks in my data.
My data is in the form:
ID1 ID2 Relationship_Code
X1 X2 Married
X1 X3 Parent/Child
X1 X4 Parent/Child
X5 X6 Married
X5 X7 Parent/Child
X6 X5 Married
. . .
. . .
. . .
I also have a node table which contains date of birth and other variables from which adult/child status can be identified.
Any tips/hints on how to extract this summary information from the graph data frame would be very helpful and much appreciated.
Thanks
Some of the work that is required to get the final table that you want requires access to the node table which you are not showing us, but I can get you pretty far along in your problem.
I think that the key to getting your result is identifying the households.
You can do this in igraph using components(). The connected components are the households.
I will illustrate with a slightly more elaborate version of your example.
Data:
Census = read.table(text="ID1 ID2 Relationship_Code
X1 X2 Married
X2 X1 Married
X1 X3 Parent/Child
X1 X4 Parent/Child
X2 X3 Parent/Child
X2 X4 Parent/Child
X5 X6 Married
X5 X7 Parent/Child
X6 X7 Parent/Child
X6 X5 Married
X8 X9 Married
X9 X8 Married",
header=TRUE)
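Now turn it into a graph, find the components and check by plotting. The graph is directed, but components() uses weak connectivity by default, so the direction of the edges does not affect the households.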
library(igraph)
EL = as.matrix(Census[,1:2])
Pop = graph_from_edgelist(EL)
Households = components(Pop)
plot(Pop, vertex.color=rainbow(3, alpha=0.5)[Households$membership])
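With real data, the adult/child label would come from the node table rather than being typed in by hand. Here is a minimal sketch of one way to do that, assuming a node table with columns ID and DOB, a reference date, and an age cutoff of 18. All of those names and values are invented for illustration, since the real node table is not shown. graph_from_data_frame accepts the node table through its vertices argument, so the attributes stay matched to the ids.

```r
library(igraph)

# Hypothetical edge and node tables; column names and dates are invented.
Edges <- data.frame(
  ID1 = c("X1", "X1", "X2", "X5"),
  ID2 = c("X2", "X3", "X3", "X6"),
  Relationship_Code = c("Married", "Parent/Child", "Parent/Child", "Married")
)
Nodes <- data.frame(
  ID  = c("X1", "X2", "X3", "X5", "X6"),
  DOB = as.Date(c("1980-05-01", "1982-09-12", "2012-03-30",
                  "1975-01-20", "1977-11-02"))
)

# Assumed rule: adult if aged 18 or over on a chosen reference date.
RefDate <- as.Date("2020-01-01")
Age <- as.numeric(difftime(RefDate, Nodes$DOB, units = "days")) / 365.25
Nodes$AdultChild <- ifelse(Age >= 18, "A", "C")

# The first column of `vertices` is matched against the ids in the edge
# table; the remaining columns become vertex attributes.
Pop2 <- graph_from_data_frame(Edges, directed = FALSE,
                              vertices = Nodes[, c("ID", "AdultChild")])
Households2 <- components(Pop2)

V(Pop2)$AdultChild
Households2$membership
```

If there are several edge tables with the same columns, they could be stacked with rbind before this step. The Relationship_Code column is kept as an edge attribute, which may help if some relationship types should not count as living in the same household.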
You said that you could label the nodes as to whether they represent adults or children. I will assume that we have such a labeling. From that, it is easy to count the number of adults by household and children by household and to make a table of household decomposition by adults and children.
# Labels are given in vertex order, here X1, X2, ..., X9
V(Pop)$AdultChild = c('A', 'A', 'C', 'C', 'A', 'A', 'C', 'A', 'A')
AdultsByHousehold = aggregate(V(Pop)$AdultChild, list(Households$membership),
function(p) sum(p=='A'))
AdultsByHousehold
Group.1 x
1 1 2
2 2 2
3 3 2
ChildrenByHousehold = aggregate(V(Pop)$AdultChild, list(Households$membership),
function(p) sum(p=='C'))
ChildrenByHousehold
Group.1 x
1 1 2
2 2 1
3 3 0
table(AdultsByHousehold$x, ChildrenByHousehold$x)
0 1 2
2 1 1 1
In my bogus example, all households have two adults, so the table has a single row.
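To get closer to the layout in the question, with row and column totals, one option is base R's addmargins. This is only a sketch: the membership vector and the labels are copied by hand from the toy example above instead of being recomputed, and the names Adults, Children and Summary are my own.

```r
# Household membership and adult/child labels for X1..X9, copied from the
# toy example above (assumed values, not taken from real data).
membership <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
AdultChild <- c("A", "A", "C", "C", "A", "A", "C", "A", "A")

# Count adults and children within each household.
Adults   <- as.vector(tapply(AdultChild == "A", membership, sum))
Children <- as.vector(tapply(AdultChild == "C", membership, sum))

# Cross-tabulate households by number of adults and number of children,
# then append row and column totals (labelled "Sum").
Summary <- addmargins(table(Adults = Adults, Children = Children))
print(Summary)
```

Each cell counts households, not people, so the grand total is the number of households. Combinations that never occur (for example, one-adult households here) do not appear unless the counts are first converted to factors with explicit levels.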