
transpose row to column in R using qdap


I have been using the wfm function from the "qdap" package to transpose text row values into columns, and I ran into a problem when the data contains numbers along with text. For example, if the row value is "abcdef" the transpose works fine, but if the value is "ab1000" the numbers get truncated. Can anyone suggest how to work around this?

Approach tried so far:

input <- read.table(header = FALSE, text = "101 ab0003
             101 pp6500
             102 sm2456")
colnames(input) <- c("id","channel")

library(qdap)
output <- t(with(input, wfm(channel, id)))
output <- as.data.frame(output)

expected_output <- read.table(header = FALSE, text = "1 1 0
                          0 0 1")

colnames(expected_output) <- c("ab0003","pp6500", "sm2456")

Solution

  • I think wfm may not be the right tool for this job. You don't really have sentences that you want to split into words, so you're using a function with a lot of unnecessary overhead. What you really want is to tabulate the values you have by another grouping variable.

    Here are two approaches. One using qdapTools's mtabulate, another using base R's table:

    library(qdapTools)
    mtabulate(with(input, split(channel, id)))
    
    ##     ab0003 pp6500 sm2456
    ## 101      1      1      0
    ## 102      0      0      1
    
    t(with(input, table(channel, id)))
    
    ##      channel
    ## id    ab0003 pp6500 sm2456
    ##   101      1      1      0
    ##   102      0      0      1
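
    Since the question converts its result with as.data.frame, note that calling as.data.frame on a table object returns long format (one row per id/channel pair) rather than the wide layout above. A minimal base R sketch, assuming the same toy data as in the question (rebuilt here with data.frame so the snippet is self-contained; the names counts and wide are only illustrative):

    ```r
    # Toy data mirroring the question's example
    input <- data.frame(
      id      = c(101, 101, 102),
      channel = c("ab0003", "pp6500", "sm2456"),
      stringsAsFactors = FALSE
    )

    # Cross-tabulate id (rows) against channel (columns)
    counts <- table(input$id, input$channel)

    # as.data.frame.matrix() keeps the wide layout:
    # ids become row names, channels become columns
    wide <- as.data.frame.matrix(counts)
    wide
    ```

    Putting id first in the table call gives ids as rows directly, so no t() is needed.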
    

    It's possible your MWE doesn't reflect the complexity of your data; if so, that brings us back to the original problem. wfm uses the tm package as a backend for some of its manipulations, so we need to supply something to the ldots (...). I re-read the documentation and it is a bit confusing (I have added this info in the dev version), but we want to pass removeNumbers=FALSE to TermDocumentMatrix, as seen here:

    output <- t(with(input, wfm(channel, id, removeNumbers=FALSE)))
    as.data.frame(output)
    
    ##     ab0003 pp6500 sm2456
    ## 101      1      1      0
    ## 102      0      0      1