
Rank/Order/Cluster/Sort Data Frame by Column Pattern


I have a data frame called df1; the code that builds it is at the end of this question.

I wish to re-order the columns of df1 so that they are grouped by column pattern. Any column that has a 1 in every row goes to the far left. Next come the columns that have a 1 for row a and row b but a 0 for row c, and so on, as in df2 (also built in the code at the end of this question).

If there are ties (and there are many in my dataset), their relative order does not matter, as long as the columns are grouped by their pattern. For example, in df2, swapping columns ex2 and ex5, or [edit after comments below] swapping columns ex3 and ex6, would also be an acceptable solution for me.

I first tried ranking by column sum, but that doesn't work: all columns whose sum is 2 tie with each other, so columns with different patterns end up interleaved. For example, I get columns shaped like ex3, ex6, ex3, ex3, ex6 in one "cluster", when I want all the ex3-like columns together and all the ex6-like columns together: ex3, ex3, ex3, ex6, ex6.

I thought of iterating through each column and matching it against the patterns of 1s and 0s I want, but I'm lost on how to match against a whole column rather than a single value in a column.

Code is below:

ex1 <- c(1,0,0)
ex2 <- c(1,1,1)
ex3 <- c(1,0,1)
ex4 <- c(0,1,0)
ex5 <- c(1,1,1)
ex6 <- c(0,1,1)
ex7 <- c(0,0,1)
ex8 <- c(1,1,0)

df1 <- data.frame(ex1,ex2,ex3, ex4, ex5, ex6, ex7, ex8)
rownames(df1) <- c("a", "b", "c")

df2 <- data.frame(ex2, ex5, ex8, ex6, ex3, ex1, ex4, ex7)
rownames(df2) <- c("a", "b", "c")

Solution

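# Collapse each column into a pattern string (e.g. ex8 becomes "110"),
# then order the columns by where that string appears in `levels`.
df1[, order(
  factor(
    apply(df1, 2, function(x) paste0(x, collapse = "")),
    levels = c("111", "110", "011", "101", "100", "010", "001", "000")
  )
)]
##   ex2 ex5 ex8 ex6 ex3 ex1 ex4 ex7
## a   1   1   1   0   1   1   0   0
## b   1   1   1   1   0   0   1   0
## c   1   1   0   1   1   0   0   1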
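
The answer above relies on a hand-written `levels` vector, which grows quickly as rows are added. A possible variation, offered only as a sketch and not as part of the original answer, is to sort on two keys instead: the column sum first, then the pattern string to break ties, so that identical patterns still end up adjacent. The names `pattern`, `ord` and `result` are introduced here for illustration, and the sketch assumes every column contains only 0s and 1s.

```r
# Same toy data as in the question
df1 <- data.frame(
  ex1 = c(1, 0, 0), ex2 = c(1, 1, 1), ex3 = c(1, 0, 1), ex4 = c(0, 1, 0),
  ex5 = c(1, 1, 1), ex6 = c(0, 1, 1), ex7 = c(0, 0, 1), ex8 = c(1, 1, 0),
  row.names = c("a", "b", "c")
)

# One pattern string per column, e.g. ex8 -> "110"
pattern <- vapply(df1, function(x) paste0(x, collapse = ""), character(1))

# Primary key: number of 1s in the column; secondary key: the pattern string.
# Both keys are sorted in decreasing order, so fuller columns come first and
# columns with the same pattern are always placed next to each other.
ord <- order(colSums(df1), pattern, decreasing = TRUE)

result <- df1[, ord]
result
```

On the toy data the patterns should come out in the order 111, 111, 110, 101, 011, 100, 010, 001. Compared with df2, ex3 and ex6 are swapped, which the question says is acceptable. If a specific order among patterns with the same sum is required, the explicit `levels` approach in the answer gives full control.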