how to select submatrix in a (adjacency) matrix based on ties, in R

I have a matrix, which represents mobility between various jobs:

 jobdat <- matrix(c(
           295,  20,   0,    0,    0,    5,    7,
           45,   3309, 15,   0,    0,    0,    3,
           23,   221,  2029, 5,    0,    0,    0,
           0,    0,    10,   100,  8,    0,    3,
           0,    0,    0,    0,    109,  4,    4,
           0,    0,    0,    0,    4,    375,  38,
           0,    18,   0,    0,    4,    26,   260), 
           nrow = 7, ncol = 7, byrow = TRUE,
           dimnames = list(c("job 1","job 2","job 3","job 4","job 5","job 6","job 7"),
                c("job 1","job 2","job 3","job 4","job 5","job 6","job 7")))

This is treated as a directed, weighted adjacency matrix in a social network analysis. The direction of the network is from rows to columns: So mobility is defined as going from a job-row to a job-column. The diagonal is meaningful, since it is possible to change to the same job in another firm.

For part of my analysis I want to select a submatrix which consists of job 1, job 5 and job 7:

work.list <- c(1,5,7)
jobpick_wrong <- jobdat[work.list,work.list]

however, this only gives the direct ties between these three jobs. What I need is this:

jobpick_right <- matrix(c(
          295,  20,   0,    5,    7,
          45,   3309, 0,    0,    3,
          0,    0,    109,  4,    4,
          0,    0,    4,    375,  38,
          0,    18,   4,    26,   260),
          nrow = 5, ncol = 5, byrow = TRUE,
          dimnames = list(c("job 1","job 2","job 5","job 6","job 7"),
                    c("job 1","job 2","job 5","job 6","job 7")))

Here, job 2 and 6 are also included, since these two jobs also have direct ties to either job 1, 5 or 7. While job 3 and 4 are excluded, because they do not have any ties to job 1, 5 or 7.

I'm not sure how to go about this. Maybe I have to transform it into an igraph-object in order to get anywhere?

net           <- graph.adjacency(jobdat, mode = "directed", weighted = TRUE)

and then maybe use the ego/neighborhood-function, also from the igraph package? But how I'm really not sure how. Or if this is the best way to go about it.

Thank you for your time,

Emil Begtrup-Bright

Augmented question:

The answer by aichao is perfect for the question asked, although it turns out that another step is needed. When the work.list has been created that include the jobs that has ties to the three "jobs of interest", job 1, 5, 7 in this example. Then, with real data, the amount of clutter makes another step desirable: That only the direct ties to and from the three jobs of interest are kept, while ties between other jobs are set to zero.

The data above does not depict this in a very good way, so I have created a very version of the above to demonstrate this:

jobdat <- matrix(c(
1,   0,   1,   0,   0,   0,   0,
1,   1,   1,   0,   0,   0,   0,
1,   1,   1,   0,   0,   0,   0,
0,   0,   0,   1,   0,   0,   0,
0,   0,   0,   0,   1,   0,   0,
0,   0,   0,   0,   0,   1,   0,
0,   0,   0,   0,   0,   0,   1
           ), 
           nrow = 7, ncol = 7, byrow = TRUE,
           dimnames = list(c("job 1","job 2","job 3","job 4","job 5","job 6","job 7"),
                c("job 1","job 2","job 3","job 4","job 5","job 6","job 7")))

by using aichaos solution:

work.list <- sort(unique(unlist(lapply(work.list, function(x) which(jobdat[x,] != 0)))))

then we get this:

jobdat[work.list,work.list]
#          job 1 job 2 job 3 job 5 job 7
#    job 1     1     0     1     0     0
#    job 2     1     1     1     0     0
#    job 3     1     1     1     0     0
#    job 5     0     0     0     1     0
#    job 7     0     0     0     0     1

However, the ties between job 2 and job 3 are irrelevant, and only serves to obscure the ties of interest.

jobdat.result <- matrix(c(
1,     0,     1,     0,     0,
1,     1,     0,     0,     0,
1,     0,     1,     0,     0,
0,     0,     0,     1,     0,
0,     0,     0,     0,     1
           ), 
           nrow = 5, ncol = 5, byrow = TRUE,
           dimnames = list(c("job 1","job 2","job 3","job 5","job 7"),
                c("job 1","job 2","job 3","job 5","job 7")))

in job.dat.result, the tie between job 3 and job 2 have been removed, both row-wise and col-wise, but the ties between these two jobs and the three jobs of interest are kept. Ideally, it should be possible to choose wether the diagonal of job 2 and job 3 should also be zero. But most likely, I'll set the diagonal to zero, for all jobs, so this is not required. But would be nice, if nothing else then for me to understand the logic of this at a higher level.

What I am trying to achieve, among other things, is circlegrams like this:

So simplicity in the number of ties is important. The diagram is reproduced like this:

library(circlize)
segmentcircle <- jobdat  
diag(segmentcircle) <- 0
df.c <- get.data.frame(graph.adjacency(segmentcircle,weighted=TRUE))
colour <-  brewer.pal(ncol(segmentcircle),"Set1")
chordDiagram(x = df.c, 
  grid.col = colour, 
  transparency = 0.2,
             directional = 1, symmetric=FALSE,
             direction.type = c("arrows", "diffHeight"), diffHeight  = -0.065,
             link.arr.type = "big.arrow", 
             # self.link=1
             link.sort = TRUE, link.largest.ontop = TRUE,
             link.border="black",
             # link.lwd = 2, 
             # link.lty = 2
             )

Solution

Assuming your directed graph is from rows to columns, what you can do is to augment your work.list with those columns that are connected (with element !=0) to each row in the work.list. You can do this by:

work.list <- sort(unique(unlist(lapply(work.list, function(x) which(jobdat[x,] != 0)))))

Use unique to keep only the unique columns assembled and sort so that these columns are sorted by their indices. Then:

jobdat[work.list,work.list]
##      job 1 job 2 job 5 job 6 job 7
##job 1   295    20     0     5     7
##job 2    45  3309     0     0     3
##job 5     0     0   109     4     4
##job 6     0     0     4   375    38
##job 7     0    18     4    26   260

If instead, your directed graph is from columns to rows:

work.list <- sort(unique(unlist(lapply(work.list, function(x) which(jobdat[,x] != 0)))))

Updated for augmented question:

With the new jobdat:

jobdat <- matrix(c(
  1,   0,   1,   0,   0,   0,   0,
  1,   1,   1,   0,   0,   0,   0,
  1,   1,   1,   0,   0,   0,   0,
  0,   0,   0,   1,   0,   0,   0,
  0,   0,   0,   0,   1,   0,   0,
  0,   0,   0,   0,   0,   1,   0,
  0,   0,   0,   0,   0,   0,   1
), 
nrow = 7, ncol = 7, byrow = TRUE,
dimnames = list(c("job 1","job 2","job 3","job 4","job 5","job 6","job 7"),
                c("job 1","job 2","job 3","job 4","job 5","job 6","job 7")))

and the list of relevant jobs in work.list:

work.list <- c(1,5,7)

Compute the augmented work list aug.work.list as the collection of jobs that goes directly to the relevant jobs in the work.list. This will include jobs 2 and 3. Note that we use which(jobdat[,x] != 0) instead of which(jobdat[x,] != 0) here to identify the job (either relevant or irrelevant) that connects to the relevant job x in the work.list.

aug.work.list <- sort(unique(unlist(lapply(work.list, function(x) which(jobdat[,x] != 0)))))
##[1] 1 2 3 5 7

This results in:

jobdat.result <- jobdat[aug.work.list, aug.work.list]
##      job 1 job 2 job 3 job 5 job 7
##job 1     1     0     1     0     0
##job 2     1     1     1     0     0
##job 3     1     1     1     0     0
##job 5     0     0     0     1     0
##job 7     0     0     0     0     1

Now, to remove the connections between irrelevant jobs, first find the indices for these irrelevant jobs in jobdat.result, which are indices of elements in aug.work.list that are not in work.list

irrelevant.job.indices <- which(!(aug.work.list %in% work.list))
##[1] 2 3

Note that these are not job numbers for the irrelevant jobs but the (row and column) indices in jobdat.result corresponding to the irrelevant job numbers. In this case, they just happen to correspond to the job numbers themselves.

Removing the connections require setting the off-diagonals for the sub-matrix of jobdat.result indexed by irrelevant.job.indices to 0. To do this:

## first, keep diagonal values for irrelevant.job.indices
dvals <- diag(jobdat.result)[irrelevant.job.indices]
## set sub-matrix to zero (this will also set diagnal elements to zero)
jobdat.result[irrelevant.job.indices,irrelevant.job.indices] <- 0
## replace diagonal elements
diag(jobdat.result)[irrelevant.job.indices] <- dvals

The result is:

jobdat.result
##      job 1 job 2 job 3 job 5 job 7
##job 1     1     0     1     0     0
##job 2     1     1     0     0     0
##job 3     1     0     1     0     0
##job 5     0     0     0     1     0
##job 7     0     0     0     0     1