Search code examples
rmatrixdatatablemarkov-chains

How to expand transition table to include all possible combinations, even those unused combinations?


I'm trying out the table() function in R for running transition matrices, and so far I'm finding it easy to use and very quick compared to other options I've been playing with such as for-loops and lapply().

However I'd like to expand the output table to include all possible combinations even if they don't manifest in the data, populating those "unused" fields with 0's. Is there a quick/easy way to do this?

When I run the following example data frame through the table() function, both as shown below:

  ID Period Balance Flags
1 10      1       5   X00
2 10      2      10   X01
3 10      3      15   X00
4 11      1       0   X01
5 11      2       2   X02
6 11      3       4   X02
7 15      1       3   X02
8 15      2       6   X01
9 15      3       2   X00

dataTest <- 
  data.frame(
    ID = c(10,10,10,11,11,11,15,15,15),
    Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Balance = c(5, 10, 15, 0, 2, 4, 3, 6, 2),
    Flags = c("X00","X01","X00","X01","X02","X02","X02","X01","X00")
  )

table(dataTest[dataTest$Period == 3,]$Flags, dataTest[dataTest$Period == 1,]$Flags)

I get the following results (which is correct):

      X00 X01 X02
  X00   1   0   1
  X02   0   1   0

However I would like to expand those results to include the unused combinations, so it looks like this:

      X00 X01 X02
  X00   1   0   1
  X01   0   0   0
  X02   0   1   0

This is a transition table, showing along the columns the state the elements started in (in Period = 1 per this example user input) and showing along the rows the state the elements ended in (in Period = 3 per this example user input).


Solution

  • library(tidyverse)
    
    dataTest <-
      data.frame(
        ID = c(10, 10, 10, 11, 11, 11, 15, 15, 15),
        Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
        Balance = c(5, 10, 15, 0, 2, 4, 3, 6, 2),
        Flags = c("X00", "X01", "X00", "X01", "X02", "X02", "X02", "X01", "X00")
      )
    
    table(
      dataTest[dataTest$Period == 3, ]$Flags,
      dataTest[dataTest$Period == 1, ]$Flags
    ) %>%
      as.data.frame() %>%
      mutate(
        Var1 = Var1 %>% factor(levels = unique(dataTest$Flags)),
        Var2 = Var2 %>% factor(levels = unique(dataTest$Flags)),
      ) %>%
      complete(Var1, Var2, fill = list(Freq = 0)) %>%
      pivot_wider(names_from = Var2, values_from = Freq)
    #> # A tibble: 3 × 4
    #>   Var1    X00   X01   X02
    #>   <fct> <int> <int> <int>
    #> 1 X00       1     0     1
    #> 2 X01       0     0     0
    #> 3 X02       0     1     0
    

    Created on 2022-03-11 by the reprex package (v2.0.0)