Tags: r, matrix, data-manipulation, data-conversion, ranking

Expand a matrix of rankings (1 ~ 4) to a bigger binary matrix


I have a matrix which I want to convert to one with binary output (0 vs 1). The matrix to be converted contains four rows of rankings (1 to 4):

mat1.data <- c(4,   3,  3,  3,  3,  2,  2,  1,  1,  1,
               3,   4,  2,  4,  2,  3,  1,  3,  3,  2,
               2,   2,  4,  1,  1,  1,  4,  4,  2,  4,
               1,   1,  1,  2,  4,  4,  3,  2,  4,  3)
mat1 <- matrix(mat1.data,nrow=4,ncol=10,byrow=TRUE)
mat1
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    4    3    3    3    3    2    2    1    1     1
[2,]    3    4    2    4    2    3    1    3    3     2
[3,]    2    2    4    1    1    1    4    4    2     4
[4,]    1    1    1    2    4    4    3    2    4     3

For each row in the input matrix, I want to create four binary rows, one for each rank value (1-4). Each binary row is 1 at the positions where its rank occurs in the input row and 0 elsewhere, so each row of the original matrix produces 10*4=40 entries in the output matrix.

For example, for the first row of the input matrix...

4   3   3   3   3   2   2   1   1   1

...the output should be:

0   0   0   0   0   0   0   1   1   1 # Rank 1 in input
0   0   0   0   0   1   1   0   0   0 # Rank 2 in input
0   1   1   1   1   0   0   0   0   0 # Rank 3 in input
1   0   0   0   0   0   0   0   0   0 # Rank 4 in input

Continuing this process, the expected output for all four rows of rankings should look like this:

0   0   0   0   0   0   0   1   1   1 #first row of rankings starts
0   0   0   0   0   1   1   0   0   0
0   1   1   1   1   0   0   0   0   0
1   0   0   0   0   0   0   0   0   0 #first row of rankings ends
0   0   0   0   0   0   1   0   0   0 #second row of rankings starts
0   0   1   0   1   0   0   0   0   1
1   0   0   0   0   1   0   1   1   0
0   1   0   1   0   0   0   0   0   0 #second row of rankings ends
0   0   0   1   1   1   0   0   0   0 #third row of rankings starts
1   1   0   0   0   0   0   0   1   0
0   0   0   0   0   0   0   0   0   0
0   0   1   0   0   0   1   1   0   1 #third row of rankings ends
1   1   1   0   0   0   0   0   0   0 #fourth row of rankings starts
0   0   0   1   0   0   0   1   0   0
0   0   0   0   0   0   1   0   0   1
0   0   0   0   1   1   0   0   1   0 #fourth row of rankings ends

How do I achieve this? I have a larger dataset, so a more efficient method is preferred, but any help will be greatly appreciated!


Solution

    matrix(sapply(mat1, \(i) replace(numeric(4), i, 1)), ncol = ncol(mat1))
    #      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    # [1,]    0    0    0    0    0    0    0    1    1     1
    # [2,]    0    0    0    0    0    1    1    0    0     0
    # [3,]    0    1    1    1    1    0    0    0    0     0
    # [4,]    1    0    0    0    0    0    0    0    0     0
    # [5,]    0    0    0    0    0    0    1    0    0     0
    # [6,]    0    0    1    0    1    0    0    0    0     1
    # [7,]    1    0    0    0    0    1    0    1    1     0
    # [8,]    0    1    0    1    0    0    0    0    0     0
    # [9,]    0    0    0    1    1    1    0    0    0     0
    #[10,]    1    1    0    0    0    0    0    0    1     0
    #[11,]    0    0    0    0    0    0    0    0    0     0
    #[12,]    0    0    1    0    0    0    1    1    0     1
    #[13,]    1    1    1    0    0    0    0    0    0     0
    #[14,]    0    0    0    1    0    0    0    1    0     0
    #[15,]    0    0    0    0    0    0    1    0    0     1
    #[16,]    0    0    0    0    1    1    0    0    1     0
    

    It takes 2 steps, and piping syntax may look clearer:

    sapply(mat1, \(i) replace(numeric(4), i, 1)) |>  ## each value to binary vector
      matrix(ncol = ncol(mat1))  ## reshape
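
To make the two steps concrete, the sketch below inspects the intermediate result before reshaping. It rebuilds mat1 from the question; the names onehot and out are illustrative rather than part of the original answer, and function(i) stands in for \(i) so the snippet should also run on R versions older than 4.1.

```r
# Rebuild the input matrix from the question.
mat1 <- matrix(c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
                 3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
                 2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
                 1, 1, 1, 2, 4, 4, 3, 2, 4, 3),
               nrow = 4, ncol = 10, byrow = TRUE)

# Step 1: sapply visits the elements in column-major order, so each
# column of onehot is the indicator vector for one element of mat1.
onehot <- sapply(mat1, function(i) replace(numeric(4), i, 1))
dim(onehot)   # 4 40
onehot[, 1]   # 0 0 0 1, since mat1[1, 1] is 4

# Step 2: refold the 160 values into 10 columns, so that each block of
# four output rows corresponds to one input row.
out <- matrix(onehot, ncol = ncol(mat1))
dim(out)      # 16 10
```

The reshape lines up because matrix() also fills column by column: the four indicator vectors for input column 1 are stacked into output column 1, and so on.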
    

    Actually, the anonymous function \(i) is not needed. I can pass replace, along with its other arguments, to sapply directly.

    matrix(sapply(mat1, replace, x = numeric(4), values = 1), ncol = ncol(mat1))
    
    sapply(mat1, replace, x = numeric(4), values = 1) |> matrix(ncol = ncol(mat1))
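
The direct form relies on R's argument matching: x and values are supplied by name, so the element that sapply passes along fills the only remaining parameter of replace, which is list (the index to overwrite). A small check, as a sketch with illustrative names a and b:

```r
# replace() has the signature replace(x, list, values).
# With x and values named, the lone positional argument is matched to `list`.
single <- replace(3, x = numeric(4), values = 1)
single   # 0 0 1 0

mat1 <- matrix(c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
                 3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
                 2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
                 1, 1, 1, 2, 4, 4, 3, 2, 4, 3),
               nrow = 4, ncol = 10, byrow = TRUE)

# Both spellings should produce the same 4 x 40 intermediate matrix.
a <- sapply(mat1, function(i) replace(numeric(4), i, 1))
b <- sapply(mat1, replace, x = numeric(4), values = 1)
all(a == b)   # TRUE
```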
    

    Misc

    user20650 and I discussed this briefly in the comments, and here is a "vectorized" approach using outer:

    matrix(+outer(1:4, c(mat1), "=="), ncol = ncol(mat1))
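
For readers unfamiliar with the idiom: outer(1:4, v, "==") compares every rank with every element of v and returns a 4 x length(v) logical matrix, and the unary + coerces TRUE/FALSE to 1/0. A tiny sketch on a hypothetical three-element vector v (not taken from the question):

```r
v <- c(4, 3, 1)              # illustrative input
hit <- outer(1:4, v, "==")   # 4 x 3 logical; column j flags the rank held by v[j]
+hit
#      [,1] [,2] [,3]
# [1,]    0    0    1
# [2,]    0    0    0
# [3,]    0    1    0
# [4,]    1    0    0
storage.mode(+hit)           # "integer"
```

One consequence worth knowing: this result is stored as integer, while the sapply and numeric(4) versions produce doubles, so comparing the two with == or all.equal() is safer than identical().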
    

    Henrik's answer is a more memory-efficient "vectorized" approach, but it over-complicates the index computation. Here is something simpler:

    out <- matrix(0, nrow(mat1) * 4, ncol(mat1))
    pos1 <- seq(0, length(mat1) - 1) * 4 + c(mat1)
    out[pos1] <- 1
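
The index arithmetic is easier to trust after one worked case. In column-major order, the k-th element of mat1 (counting from 0) owns the four consecutive output cells 4k + 1 through 4k + 4, so its 1 belongs at linear index 4k + rank. The sketch below wraps this in a hypothetical helper, expand_ranks, whose name and n_ranks parameter are assumptions made for illustration, and checks it against the sapply result:

```r
# Hypothetical helper; assumes every entry of m is an integer rank in 1..n_ranks.
expand_ranks <- function(m, n_ranks = 4) {
  out <- matrix(0, nrow(m) * n_ranks, ncol(m))
  # Element k (0-based, column-major) owns cells n_ranks*k + 1 .. n_ranks*k + n_ranks.
  pos <- (seq_along(m) - 1) * n_ranks + c(m)
  out[pos] <- 1
  out
}

mat1 <- matrix(c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
                 3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
                 2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
                 1, 1, 1, 2, 4, 4, 3, 2, 4, 3),
               nrow = 4, ncol = 10, byrow = TRUE)

res <- expand_ranks(mat1)
# mat1[1, 1] is 4 and k = 0, so the first 1 sits at linear index 0 * 4 + 4 = 4.
which(res == 1)[1]   # 4

ref <- matrix(sapply(mat1, function(i) replace(numeric(4), i, 1)),
              ncol = ncol(mat1))
all(res == ref)      # TRUE
```

Because only one vector of positions is computed and a single assignment is made, this avoids the per-element function calls of sapply and the 4 x 40 logical intermediate of outer.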
    

    All methods so far create a dense output matrix. This is fine here: exactly one of every four output entries is 1, so 25% of the elements are nonzero, which is too dense to count as sparse in the usual sense. If a sparse matrix is wanted anyway, it is also straightforward to build:

    ## in fact, this is what Henrik aims to compute
    ij <- arrayInd(pos1, c(4 * nrow(mat1), ncol(mat1)))
    ## sparse matrix
    Matrix::sparseMatrix(i = ij[, 1], j = ij[, 2], x = rep(1, length(mat1)),
                         dims = c(4 * nrow(mat1), ncol(mat1)))  ## fix the shape even if trailing rows are empty
    #16 x 10 sparse Matrix of class "dgCMatrix"
    #                         
    # [1,] . . . . . . . 1 1 1
    # [2,] . . . . . 1 1 . . .
    # [3,] . 1 1 1 1 . . . . .
    # [4,] 1 . . . . . . . . .
    # [5,] . . . . . . 1 . . .
    # [6,] . . 1 . 1 . . . . 1
    # [7,] 1 . . . . 1 . 1 1 .
    # [8,] . 1 . 1 . . . . . .
    # [9,] . . . 1 1 1 . . . .
    #[10,] 1 1 . . . . . . 1 .
    #[11,] . . . . . . . . . .
    #[12,] . . 1 . . . 1 1 . 1
    #[13,] 1 1 1 . . . . . . .
    #[14,] . . . 1 . . . 1 . .
    #[15,] . . . . . . 1 . . 1
    #[16,] . . . . 1 1 . . 1 .
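
As a sanity check, the sparse result can be compared with the dense one. This sketch assumes the Matrix package is installed (it is one of R's recommended packages, so it is usually available); the names shape, dense, and sp are illustrative.

```r
library(Matrix)

mat1 <- matrix(c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
                 3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
                 2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
                 1, 1, 1, 2, 4, 4, 3, 2, 4, 3),
               nrow = 4, ncol = 10, byrow = TRUE)

pos1  <- seq(0, length(mat1) - 1) * 4 + c(mat1)   # linear positions of the 1s
shape <- c(4L * nrow(mat1), ncol(mat1))           # 16 x 10

# Dense version, as in the index-based approach.
dense <- matrix(0, shape[1], shape[2])
dense[pos1] <- 1

# Sparse version: convert linear positions to (row, column) pairs.
ij <- arrayInd(pos1, shape)
sp <- sparseMatrix(i = ij[, 1], j = ij[, 2], x = rep(1, length(pos1)),
                   dims = shape)

all(as.matrix(sp) == dense)   # TRUE
nnzero(sp)                    # 40, one nonzero per input element
```

With 40 nonzeros in 160 cells the density is the 25% mentioned above, so the sparse form mainly pays off when there are many more than four rank levels.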