Search code examples
rcontingency

Efficient way to create a Winner/Looser Contingency-Table


I'm relatively new to R and need some help.

I want to generate a 2x2 contingency table and always compare two consecutive days as follow: you start with the first row and take the median of the values, in my example in the row 2019-02-11 it would be the value 15 from x4. Now all values higher than the median are "winners" and and values below are "losers".

In the second step you do the same for the row 2019-02-12 and if x1 for example is in both periods a "winner" it should count in a field "winner/winner" in the contingency table. Same for "loser/loser", "winner/loser", "loser/winner", and so on for all consecutive dates.

I have a data frame with around 200 columns and therefore looking for a effective way to do this.

My code looks like this:

            set.seed(123)
            d <- data.frame(Time = rep(seq.Date( Sys.Date(), length=30, by="day" )),
            x1 = rep(sample(10:30, 10), 3),
            x2 = rep(sample(10:30, 10), 3),
            x3 = rep(sample(10:30, 10), 3),
            x4 = rep(sample(10:30, 10), 3),
            x5 = rep(sample(10:30, 10), 3))

Thanks a lot.


Solution

  • With a bit of arithmetic I think we can do this quite efficiently.

    First we find the winners and losers and assign them a 0 or a 1 accordingly. Next we can do columnwise differencing to find out wether two consecutive days were lose/win (1) or win lose (-1). As both win/win and lose/lose will result in a difference of zero, we'l also have to check what the first value was. The rest is just recoding and assembling.

    d <- structure(list(Time=structure(17942:17947, class="Date"),
    x1=c(NA, NA, 17L, 29L, 27L, 10L), x2=c(30L, 19L, 22L, 20L, 11L,
    24L), x3=c(NA, 23L, 22L, 27L, 21L, 26L), x4=c(30L, 28L, 23L,
    24L, 10L, 17L), x5=c(12L, 18L, 17L, 16L, 30L, 26L)),
    row.names=c(NA, 6L), class="data.frame")
    
    x <- t(apply(d[,-1], 1, function(x) x > median(x, na.rm=TRUE)))
    nr <- nrow(x)
    dx <- diff(x)
    
    lw <- (dx == 1)*1
    wl <- (dx == -1)*2
    dd <- (dx == 0)
    ww <- (dd & x[-nr,] == 1)*3
    ll <- (dd & x[-nr,] == 0)*4
    
    tab <- c("lose/win", "win/lose", "win/win", "lose/lose")[lw + wl + ww + ll]
    
    d0 <- d
    d0[-1,-1] <- tab
    d0
    
    #         Time       x1        x2       x3        x4        x5
    # 1 2019-02-15     <NA>        30     <NA>        30        12
    # 2 2019-02-16     <NA> lose/lose     <NA>  lose/win lose/lose
    # 3 2019-02-17     <NA> lose/lose win/lose   win/win lose/lose
    # 4 2019-02-18 lose/win lose/lose lose/win  win/lose lose/lose
    # 5 2019-02-19  win/win lose/lose win/lose lose/lose  lose/win
    # 6 2019-02-20 win/lose lose/lose lose/win lose/lose   win/win