R - Compute Mismatch By Group

I am wondering how I could count mismatching cases by group.

Suppose this is my data:

sek = rbind(c(1, 'a', 'a', 'a'), 
        c(1, 'a', 'a', 'a'), 
        c(2, 'b', 'b', 'b'), 
        c(2, 'c', 'b', 'b'))

colnames(sek) <- c('Group', paste('t', 1:3, sep = ''))

The data look like this:

     Group t1  t2  t3 
[1,] "1"   "a" "a" "a"
[2,] "1"   "a" "a" "a"
[3,] "2"   "b" "b" "b"
[4,] "2"   "c" "b" "b"

I would like to get something like:

Group 1 : 0 
Group 2 : 1

It would be nice to use the stringdist package to compute this.

Something like:

seqdistgroupStr = function(x) stringdistmatrix(x, method = 'hamming')

sek %>% 
  as.data.frame() %>% 
  group_by(Group) %>% 
  seqdistgroupStr()

But this does not work.

Any ideas?

Quick update: how would you handle weights? For example, how could I pass a value (1, 2, 3, ...) as the cost of a mismatch between two characters, so that a mismatch between b and c costs 2 while a mismatch between a and c costs 1, and so on?

Solution

The code below will give you the number of mismatches by group, where a mismatch is defined as one less than the number of unique values in each column t1, t2, etc. for each level of Group. I think you would need to bring in a string distance measure only if you need more than a binary measure of mismatch, but a binary measure suffices for the example you gave. Also, if all you want is the number of distinct rows in each group, then @Alex's solution is more concise.

library(dplyr)
library(reshape2)

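# Reshape to long format, count distinct values per Group/column,
# then sum the per-column mismatches within each Group
sek %>% as.data.frame() %>%
  melt(id.vars = "Group") %>%
  group_by(Group, variable) %>%
  summarise(mismatch = length(unique(value)) - 1) %>%
  group_by(Group) %>%
  summarise(mismatch = sum(mismatch))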

  Group mismatch
1     1        0
2     2        1

Here's a shorter dplyr method to count individual mismatches. It doesn't require reshaping, but it requires other data gymnastics:

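sek %>% as.data.frame() %>%
  group_by(Group) %>%
  # summarise_each() and funs() are deprecated; across() (dplyr >= 1.0) replaces them
  summarise(across(everything(), ~ length(unique(.x)) - 1)) %>%
  mutate(mismatch = rowSums(.[-1])) %>%
  select(-matches("^t[1-3]$"))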
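On the weights question from the update, here is a rough base R sketch rather than a definitive answer. It assumes a symmetric cost matrix (the object `cost` and its values are made up for illustration) and defines the weighted mismatch of a column within a group as the sum of costs over all pairs of distinct values seen in that column. The helper names `column_cost` and `weighted_mismatch` are hypothetical. With unit costs and at most two distinct values per column this gives the same result as the counts above; with three or more distinct values it counts every pair, so it will be larger than "unique values minus one".

```r
# Example data from the question
sek <- rbind(c(1, 'a', 'a', 'a'),
             c(1, 'a', 'a', 'a'),
             c(2, 'b', 'b', 'b'),
             c(2, 'c', 'b', 'b'))
colnames(sek) <- c('Group', paste0('t', 1:3))

# Hypothetical symmetric cost matrix (the values are illustrative assumptions)
states <- c('a', 'b', 'c')
cost <- matrix(0, 3, 3, dimnames = list(states, states))
cost['a', 'b'] <- cost['b', 'a'] <- 1
cost['a', 'c'] <- cost['c', 'a'] <- 1
cost['b', 'c'] <- cost['c', 'b'] <- 2

# Weighted mismatch for one column within one group:
# sum of costs over all pairs of distinct values in that column
column_cost <- function(values, cost) {
  u <- unique(values)
  if (length(u) < 2) return(0)      # all values agree, no mismatch
  pairs <- combn(u, 2)              # 2 x n matrix, one pair per column
  sum(cost[cbind(pairs[1, ], pairs[2, ])])
}

# Apply column_cost to every non-Group column, separately for each group
weighted_mismatch <- function(m, cost) {
  cols <- setdiff(colnames(m), 'Group')
  sapply(split(seq_len(nrow(m)), m[, 'Group']), function(rows) {
    sum(sapply(cols, function(j) column_cost(m[rows, j], cost)))
  })
}

result <- weighted_mismatch(sek, cost)
print(result)
```

For the example data, group 1 has no mismatch and group 2 has a single b/c mismatch in t1, so group 2 picks up the b/c cost of 2. If you need a different aggregation (for example, cost relative to the most frequent value in each column), only `column_cost` would need to change.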