
How to find rows with most values filled in a matrix?


Given a matrix (mat1) like this:

    mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""), nrow = 4, ncol = 5)

How would I go about finding, say, the top 3 rows with the most non-empty string values? For example, in mat1, row 1 has 3 values, row 2 has 2 values, row 3 has 4 values, and row 4 has 4 values.
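Those per-row counts can be checked in base R: comparing the character matrix with "" gives a logical matrix that is TRUE for every non-empty cell, and rowSums() adds up the TRUEs in each row. This is a small illustrative sketch; the name filled is just a local variable chosen here.

```r
mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""),
               nrow = 4, ncol = 5)

# TRUE wherever a cell holds a non-empty string
filled <- mat1 != ""

# Number of non-empty cells in each row
rowSums(filled)
# [1] 3 2 4 4
```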

Is there a way to tabulate this in a frequency table of some sort, or at least return a vector of the top rows?


Solution

  • We can write a function that converts the matrix to 'long' format, drops the blank elements, and counts the remaining (non-blank) entries for each row name:

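    f1 <- function(mat, n) {
      # Label rows 1..nrow(mat); without dimnames, as.data.frame.table()
      # would label them A, B, C, ...
      rownames(mat) <- seq_len(nrow(mat))
      # Long format: one row per cell; Var1 = row label, Freq = cell value
      long <- as.data.frame.table(mat)
      # Keep the non-blank cells, count them per row, sort descending, take the top n
      head(sort(table(subset(long, Freq != "")$Var1), decreasing = TRUE), n)
    }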
    
    f1(mat1, 3)
    #  3 4 1 
    #  4 4 3 
    

The output is a one-dimensional table (it prints like a named vector): the names are the row indices and the values are the number of non-blank entries in each row. The n argument sets how many of the top rows are returned.
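If only a plain vector of row indices is needed, a different technique (not the one used above) skips the reshaping step: count the non-empty cells with rowSums() and rank the rows with order(). The helper name top_rows is hypothetical, chosen for this sketch. Because order() uses a stable sort, tied rows keep their original order, which here matches the ordering f1 produces.

```r
# Hypothetical helper: indices of the n rows with the most non-empty strings
top_rows <- function(mat, n) {
  counts <- rowSums(mat != "")                # non-empty cells per row
  head(order(counts, decreasing = TRUE), n)   # row indices, most filled first
}

mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""),
               nrow = 4, ncol = 5)

top_rows(mat1, 3)
# [1] 3 4 1
```

Unlike f1, this returns the indices without their counts; rowSums(mat1 != "") can be kept alongside it if the counts are also wanted.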