Given the following dataframe (read in from a csv file that I have no control over the contents)
C1 C2 C3 C4 C5 C6 C7 C8 C9
1 a a a c d b c c a
2 a a b b c b b a b
3 a c c d d b c b b
4 c a d c d d c d d
I need to get a count of each unique value in each row. If a single value appears 5 or more times, then that row should be determined to be invalid (possibly by adding an additional valid/invalid column). I do not care which value appears 5 or more times, just that any value does. In the above, rows 2 & 4 would be invalid. Unfortunately, I have no idea how many rows of data will be in the csv file.
If you just want to know whether the row contains more than four of any entry you can do:
df$invalid <- apply(df, 1, function(x) max(table(x)) > 4)
#> C1 C2 C3 C4 C5 C6 C7 C8 C9 invalid
#> 1 a a a c d b c c a FALSE
#> 2 a a b b c b b a b TRUE
#> 3 a c c d d b c b b FALSE
#> 4 c a d c d d c d d TRUE
If you actually want the counts for each row too you could then do:
cbind(df, t(apply(df[-10], 1,function(x) table(factor(x, levels = letters[1:4])))))
#> C1 C2 C3 C4 C5 C6 C7 C8 C9 invalid a b c d
#> 1 a a a c d b c c a FALSE 4 1 3 1
#> 2 a a b b c b b a b TRUE 3 5 1 0
#> 3 a c c d d b c b b FALSE 1 3 3 2
#> 4 c a d c d d c d d TRUE 1 0 3 5