
From contingency tables to data.frame in R


My starting point is several character vectors containing part-of-speech (POS) tags that I extracted from texts. For example:

c("NNS", "VBP", "JJ",  "CC",  "DT")
c("NNS", "PRP", "JJ",  "RB",  "VB")

I use table() or ftable() to count the occurrences of each tag. For the first vector, table() gives:

 CC  DT  JJ NNS VBP 
  1   1   1   1   1

The ultimate goal is to have a data.frame looking like this:

  NNS VBP PRP JJ CC RB DT VB
1   1   1   0  1  1  0  1  0
2   1   0   1  1  0  1  0  1

plyr::rbind.fill seems like a reasonable choice here, but it requires data.frame objects as input. However, calling as.data.frame.matrix(table(POS_vector)), where POS_vector is one of the vectors above, raises an error:

Error in seq_len(ncols) : 
argument must be coercible to non-negative integer

Calling as.data.frame.matrix(ftable(POS_vector)) does produce a data.frame, but the column names are lost:

V1 V2 V3 V4 V5 ...
1  1  1  1  1
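Both symptoms come down to the shape of the object. table() on a single vector returns a one-dimensional table, while as.data.frame.matrix() expects both a row and a column dimension, hence the seq_len() error. ftable() does return a two-dimensional matrix, but it keeps the tag names in a col.vars attribute rather than in dimnames, which is why the columns come out as V1, V2, and so on. Below is a minimal base-R sketch of one workaround. It assumes you want one row per vector and gives every vector the same factor levels, so each set of counts has the same names (absent tags count as 0). The helper count_tags() and the names all_tags and pos_df are hypothetical.

```r
dat <- list(c("NNS", "VBP", "JJ",  "CC",  "DT"),
            c("NNS", "PRP", "JJ",  "RB",  "VB"))

# table() on one vector is one-dimensional, so as.data.frame.matrix(),
# which needs a row and a column dimension, cannot handle it.
tab <- table(dat[[1]])
length(dim(tab))  # 1

# Hypothetical helper: count the tags of one vector against a shared set
# of levels, so every result has the same names (missing tags count as 0).
count_tags <- function(pos, levels) {
  tab <- table(factor(pos, levels = levels))
  # t() turns the named count vector into a 1 x k matrix whose column
  # names are the tags; as.data.frame() keeps those names as columns.
  as.data.frame(t(setNames(as.integer(tab), names(tab))))
}

all_tags <- unique(unlist(dat))  # tags in order of first appearance
rows <- lapply(dat, count_tags, levels = all_tags)

# Every one-row data.frame has the same columns, so plain rbind() suffices.
pos_df <- do.call(rbind, rows)
pos_df
```

If you keep plyr::rbind.fill on one-row data.frames that were not aligned to shared levels, keep in mind that it fills missing columns with NA rather than 0, so you would need a follow-up step such as pos_df[is.na(pos_df)] <- 0.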

Any help is highly appreciated.


Solution

  • In base R, you can try:

    table(rev(stack(setNames(dat, seq_along(dat)))))
    

    You can also use mtabulate from "qdapTools":

    library(qdapTools)
    mtabulate(dat)
    #   CC DT JJ NNS PRP RB VB VBP
    # 1  1  1  1   1   0  0  0   1
    # 2  0  0  1   1   1  1  1   0
    

    In both snippets, dat is the list of example vectors, as defined in @Heroka's answer:

    dat <- list(c("NNS", "VBP", "JJ",  "CC",  "DT"),
                c("NNS", "PRP", "JJ",  "RB",  "VB"))
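The base R one-liner returns a two-dimensional table (ind x values), not a data.frame. Because that table has both dimensions, as.data.frame.matrix() works on it, unlike the one-dimensional table from the question. Here is a sketch, assuming dat as defined above. The vector wanted is hypothetical and simply copies the column order from the desired output in the question.

```r
dat <- list(c("NNS", "VBP", "JJ",  "CC",  "DT"),
            c("NNS", "PRP", "JJ",  "RB",  "VB"))

# stack() needs a named list, so setNames() labels the elements "1", "2".
# stack() returns the columns `values` (the tags) and `ind` (the source);
# rev() swaps them so that `ind` becomes the row variable of the table.
tab2d <- table(rev(stack(setNames(dat, seq_along(dat)))))

# The table has two dimensions, so as.data.frame.matrix() keeps the tag
# names as columns and the vector labels as row names.
pos_df <- as.data.frame.matrix(tab2d)

# Optional: reorder the columns to match the layout shown in the question.
wanted <- c("NNS", "VBP", "PRP", "JJ", "CC", "RB", "DT", "VB")
pos_df <- pos_df[, wanted]
pos_df
#   NNS VBP PRP JJ CC RB DT VB
# 1   1   1   0  1  1  0  1  0
# 2   1   0   1  1  0  1  0  1
```

Without the reordering step, the columns follow the alphabetical order that table() uses for the tag levels, matching the mtabulate() output.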