I have a database of cities that are divided into clusters for each year. In other words, I applied a modularity-based community detection algorithm to separate databases of cities, one per year. The final database (a mock example) looks like this:
v1 city cluster year
0 "city1" 0 2000
1 "city2" 2 2000
2 "city3" 1 2000
3 "city4" 0 2000
4 "city5" 2 2000
0 "city1" 2 2001
1 "city2" 1 2001
2 "city3" 0 2001
3 "city4" 0 2001
4 "city5" 0 2001
0 "city1" 1 2002
1 "city2" 2 2002
2 "city3" 0 2002
3 "city4" 0 2002
4 "city5" 1 2002
What I would like to do now is count how many times each city ends up in the same cluster as another city, summed over the years. So for the mock example above I should end up with a 5 by 5 symmetric matrix whose rows and columns are cities, where each entry is the number of years in which cities i and j are in the same cluster (regardless of which cluster):
      city1 city2 city3 city4 city5
city1     .     0     0     1     1
city2     0     .     0     0     1
city3     0     0     .     2     1
city4     1     0     2     .     1
city5     1     1     1     1     .
I am working in Python, but a solution in MATLAB or R is fine too.
Thank you
In R, co-occurrence matrices are computed straightforwardly with table and [t]crossprod. We can compute the matrices by year and take the sum, like so:
con <- textConnection('
v1 city cluster year
0 "city1" 0 2000
1 "city2" 2 2000
2 "city3" 1 2000
3 "city4" 0 2000
4 "city5" 2 2000
0 "city1" 2 2001
1 "city2" 1 2001
2 "city3" 0 2001
3 "city4" 0 2001
4 "city5" 0 2001
0 "city1" 1 2002
1 "city2" 2 2002
2 "city3" 0 2002
3 "city4" 0 2002
4 "city5" 1 2002
')
d <- read.table(con, header = TRUE)
close(con)
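## table() gives a city x cluster x year indicator array; tcrossprod() on each
## year's slice counts shared clusters, and Reduce() sums the yearly matrices.
## Note: the 'simplify' argument of apply() requires R >= 4.1.0.
x <- with(d, Reduce(`+`, apply(table(city, cluster, year), 3L, tcrossprod, simplify = FALSE)))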
x
city
city city1 city2 city3 city4 city5
city1 3 0 0 1 1
city2 0 3 0 0 1
city3 0 0 3 2 1
city4 1 0 2 3 1
city5 1 1 1 1 3
There are threes on the diagonal because cities match themselves every year. If you prefer, say, zeros on the diagonal, then you can add:
diag(x) <- 0
If you don't like the redundant annotation with "city", then you can add:
dimnames(x) <- unname(dimnames(x))
And if you want to store the result as a formally symmetric, formally sparse matrix, then you can add:
library(Matrix)
x <- as(x, "CsparseMatrix")
x
5 x 5 sparse Matrix of class "dsCMatrix"
city1 city2 city3 city4 city5
city1 . . . 1 1
city2 . . . . 1
city3 . . . 2 1
city4 1 . 2 . 1
city5 1 1 1 1 .
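Since the question mentions Python, here is a rough pandas sketch of the same idea, not a definitive implementation. It assumes the data sit in a DataFrame with columns city, cluster and year; the helper name cooccurrence is made up for illustration. pd.crosstab plays the role of table, and ind.dot(ind.T) plays the role of tcrossprod.

```python
import pandas as pd

# Mock data from the question, rebuilt as a DataFrame (column names assumed).
cities = ["city1", "city2", "city3", "city4", "city5"]
clusters_by_year = {
    2000: [0, 2, 1, 0, 2],
    2001: [2, 1, 0, 0, 0],
    2002: [1, 2, 0, 0, 1],
}
df = pd.DataFrame(
    [(c, k, y) for y, ks in clusters_by_year.items() for c, k in zip(cities, ks)],
    columns=["city", "cluster", "year"],
)


def cooccurrence(data):
    """Hypothetical helper: sum over years of the same-cluster indicator matrices."""
    total = None
    for _, group in data.groupby("year"):
        # city x cluster indicator matrix for this year
        ind = pd.crosstab(group["city"], group["cluster"])
        # city x city matrix: 1 where two cities share a cluster this year
        m = ind.dot(ind.T)
        total = m if total is None else total.add(m, fill_value=0)
    # Cells can be missing if two cities never appear in the same year
    return total.fillna(0).astype(int)


co = cooccurrence(df)
print(co)
```

As in the R version, the diagonal holds the number of years in which each city appears, because every city shares a cluster with itself. If zeros are preferred there, one option is to copy the values with co.to_numpy().copy(), apply numpy.fill_diagonal to that copy, and rebuild the DataFrame with the same index and columns.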