Search code examples
runiquerows

Get Count for each unique value in each Row in R


Given the following dataframe (read in from a csv file that I have no control over the contents)

          C1 C2 C3 C4 C5 C6 C7 C8 C9
        1  a  a  a  c  d  b  c  c  a
        2  a  a  b  b  c  b  b  a  b
        3  a  c  c  d  d  b  c  b  b
        4  c  a  d  c  d  d  c  d  d

I need to get a count of each unique value in each row. If a single value appears 5 or more times, then that row should be determined to be invalid (possibly by adding an additional valid/invalid column). I do not care which value appears 5 or more times, just that any value does. In the above, rows 2 & 4 would be invalid. Unfortunately, I have no idea how many rows of data will be in the csv file.


Solution

  • If you just want to know whether the row contains more than four of any entry you can do:

    df$invalid <- apply(df, 1, function(x) max(table(x)) > 4)
    #>   C1 C2 C3 C4 C5 C6 C7 C8 C9 invalid
    #> 1  a  a  a  c  d  b  c  c  a   FALSE
    #> 2  a  a  b  b  c  b  b  a  b    TRUE
    #> 3  a  c  c  d  d  b  c  b  b   FALSE
    #> 4  c  a  d  c  d  d  c  d  d    TRUE
    

    If you actually want the counts for each row too you could then do:

    cbind(df, t(apply(df[-10], 1,function(x) table(factor(x, levels = letters[1:4])))))
    #>   C1 C2 C3 C4 C5 C6 C7 C8 C9 invalid a b c d
    #> 1  a  a  a  c  d  b  c  c  a   FALSE 4 1 3 1
    #> 2  a  a  b  b  c  b  b  a  b    TRUE 3 5 1 0
    #> 3  a  c  c  d  d  b  c  b  b   FALSE 1 3 3 2
    #> 4  c  a  d  c  d  d  c  d  d    TRUE 1 0 3 5