
Converting a vector into a matrix (in R)


I want to create an "n x 8" matrix from an "n x 1" vector.

-- Question: Why do I want to do this?

-- Answer: In order to matrix-multiply it against an "8 x 8" Markov chain transition probability matrix and get back an "n x 8" matrix of the predicted states.

-- Solution: I have solved this in Attempt 3 below, but I want to know whether there is a better way to do it (one that avoids two transpose calls).


R code

Create a dummy "n x 1" vector (here n = 2):

> temp_vector <- c("state 4", "state 7")
> temp_vector
[1] "state 4" "state 7"

Expected Output:

NA NA NA TRUE NA NA NA NA
NA NA NA NA NA NA TRUE NA

Attempt 1: Convert to matrix:

> temp_matrix <- matrix(temp_vector, 
                ncol = 8, # there are 8 states
                nrow = length(temp_vector) # one row per element of the vector (2 here)
                )
> temp_matrix
     [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]     
[1,] "state 4" "state 4" "state 4" "state 4" "state 4" "state 4" "state 4" "state 4"
[2,] "state 7" "state 7" "state 7" "state 7" "state 7" "state 7" "state 7" "state 7"

Attempt 1 FAIL: This is not what I want. I need a matrix with ONE entry per row, not EIGHT.


Attempt 2: Compare the matrix with stateSpace (defined here, and also used as the column names), to give a matrix made up of TRUE/FALSE:

> stateSpace <- c("state 1", "state 2", "state 3", "state 4", "state 5", "state 6", "state 7", "state 8")
> colnames(temp_matrix) <- stateSpace

> temp_matrix == stateSpace
     state 1 state 2 state 3 state 4 state 5 state 6 state 7 state 8
[1,]   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE
[2,]   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE

Attempt 2 FAIL: expected each row to have one TRUE and the rest FALSE

Reason (I THINK): R stores a matrix column by column, so stateSpace is recycled down the columns rather than across each row.
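
A minimal sketch that checks the column-wise guess above, reusing temp_vector and stateSpace from this post. The names flat, manual and auto are introduced here purely for illustration.

```r
temp_vector <- c("state 4", "state 7")
stateSpace  <- paste("state", 1:8)
temp_matrix <- matrix(temp_vector, nrow = length(temp_vector), ncol = 8)

# Column-major storage: the two rows alternate in the underlying vector
flat <- as.vector(temp_matrix)

# stateSpace (length 8) is recycled twice along that same order,
# so "state 4" is never lined up against "state 4"
manual <- flat == rep(stateSpace, 2)

# The direct matrix comparison, flattened the same way
auto <- as.vector(temp_matrix == stateSpace)

identical(manual, auto)
```

If the two results are identical, the all-FALSE output of Attempt 2 comes from recycling along the columns, not from a row-by-row comparison.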


Looking into Attempt 2 further, on an element by element level this works:

> temp_matrix[1,1] == colnames(temp_matrix)[1]
state 1 
  FALSE 
> temp_matrix[1,2] == colnames(temp_matrix)[2]
state 2 
  FALSE 
> temp_matrix[1,3] == colnames(temp_matrix)[3]
state 3 
  FALSE 
> temp_matrix[1,4] == colnames(temp_matrix)[4]
state 4 
   TRUE 

Looking into Attempt 2 further, on a row by row level this works:

> temp_matrix[1,] == colnames(temp_matrix)[]
state 1 state 2 state 3 state 4 state 5 state 6 state 7 state 8 
  FALSE   FALSE   FALSE    TRUE   FALSE   FALSE   FALSE   FALSE 

> temp_matrix[2,] == colnames(temp_matrix)[]
state 1 state 2 state 3 state 4 state 5 state 6 state 7 state 8 
  FALSE   FALSE   FALSE   FALSE   FALSE   FALSE    TRUE   FALSE 

Attempt 3: Using the column-wise comparison behaviour noted above, transpose the matrix before comparing and transpose the result back:

> t(stateSpace == t(temp_matrix))
     state 1 state 2 state 3 state 4 state 5 state 6 state 7 state 8
[1,]   FALSE   FALSE   FALSE    TRUE   FALSE   FALSE   FALSE   FALSE
[2,]   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE    TRUE   FALSE

Attempt 3 SUCCESS: This works, but I created this Stack Overflow post to see whether there is a better way to do it (one that avoids two transpose calls).


Other options: I also tried dcast, reshape and spread; sadly, none of them worked either.

I tried reshape():

reshape(temp_vector, direction = "wide")
> Error in data[, timevar] : incorrect number of dimensions

I tried spread():

library(tidyr)
spread(temp_vector, key = numbers, value = value)
> Error in UseMethod("spread_") : 
  no applicable method for 'spread_' applied to an object of class "factor"
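
Both reshape() and spread() operate on data frames, which is why they reject a bare vector. As a hedged aside, one base R function that does cross-tabulate a plain vector is table(). The sketch below is only an illustration of that idea; the names states_factor, counts and indicator are made up for this example.

```r
temp_vector <- c("state 4", "state 7")
stateSpace  <- paste("state", 1:8)

# Fixing the factor levels keeps all 8 states as columns, even unobserved ones
states_factor <- factor(temp_vector, levels = stateSpace)

# Cross-tabulate row position against state: a single count of 1 per row
counts <- table(row = seq_along(temp_vector), state = states_factor)

# Drop the "table" class and convert the counts to TRUE/FALSE
indicator <- unclass(counts) > 0
indicator
```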

Solution

  • Try this:

    > v <- c("state 4", "state 7")
    > states <- c("state 1", "state 2", "state 3", "state 4",
    +             "state 5", "state 6", "state 7", "state 8")
    > m <- matrix(states, byrow = TRUE, nrow = 2, ncol = 8)
    > m
    #      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]
    # [1,] "state 1" "state 2" "state 3" "state 4" "state 5" "state 6" "state 7" "state 8"
    # [2,] "state 1" "state 2" "state 3" "state 4" "state 5" "state 6" "state 7" "state 8"
    > v == m
    #       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
    # [1,] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
    # [2,] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
    

    In R, a matrix is essentially a vector with a dim attribute. When m is created above, the matrix function "recycles" its argument states because it needs 16 elements to fill a 2 x 8 matrix. In other words, the following two function calls produce the same result:

    > matrix(states, byrow = TRUE, nrow = 2, ncol = 8)
    > matrix(rep(states, 2), byrow = TRUE, nrow = 2, ncol = 8)
    

    Similarly, when v and m are compared for equality, v is recycled 8 times to produce a vector of length 16. In other words, the following two equality comparisons produce the same results:

    > v == m
    > rep(v, 8) == m
    

    You can think of the above two comparisons as happening between two vectors, where the matrix m is converted back into a vector by stacking the columns. You can use as.vector to see the vector that m corresponds to:

    > as.vector(m)
    #  [1] "state 1" "state 1" "state 2" "state 2" "state 3" "state 3" "state 4" "state 4" "state 5"
    # [10] "state 5" "state 6" "state 6" "state 7" "state 7" "state 8" "state 8"
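
As a possible alternative that avoids both the transposes and the helper matrix m, outer() can compare every element of the vector against every state in one call. The sketch below is one way this might look, not the method from the answer above. It also shows the multiplication the question is aiming for; the transition matrix P is a hypothetical placeholder invented for this example (each state moves to the next with probability 1), and the names indicator and predicted are likewise illustrative.

```r
temp_vector <- c("state 4", "state 7")
stateSpace  <- paste("state", 1:8)

# outer() compares every element of temp_vector with every state,
# giving an n x 8 logical matrix with one TRUE per row
indicator <- outer(temp_vector, stateSpace, "==")
colnames(indicator) <- stateSpace

# Hypothetical transition matrix (an assumption for illustration only):
# each state moves to the next one with probability 1; "state 8" stays put
P <- matrix(0, nrow = 8, ncol = 8, dimnames = list(stateSpace, stateSpace))
P[cbind(1:8, c(2:8, 8))] <- 1

# Multiplying by 1 turns TRUE/FALSE into 1/0 so that %*% can be applied
predicted <- (indicator * 1) %*% P
predicted
```

With a real transition matrix in place of P, each row of predicted would hold the next-step state probabilities for the corresponding element of temp_vector.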