I have been using the wfm function in the qdap package to transpose text row values into columns, and I ran into a problem when the data contains numbers along with text. For example, if the row value is "abcdef" the transpose works fine, but if the value is "ab1000" the numbers get truncated. Can anyone suggest a way to work around this?
Approach tried so far:
```r
input <- read.table(header = FALSE, text = "101 ab0003
101 pp6500
102 sm2456")
colnames(input) <- c("id", "channel")

library(qdap)
# One row per id, one column per term: digits are dropped here ("ab0003" -> "ab")
output <- t(with(input, wfm(channel, id)))
output <- as.data.frame(output)

# What I want: one column per full channel value, digits included
expected_output <- read.table(header = FALSE, text = "1 1 0
0 0 1")
colnames(expected_output) <- c("ab0003", "pp6500", "sm2456")
```
I think maybe wfm isn't the right tool for this job. It seems you don't really have sentences that you want to split into words, so you're using a function with a lot of unnecessary overhead. What you really want is to tabulate the values you have by another grouping variable.
Here are two approaches, one using qdapTools's mtabulate and another using base R's table:
```r
library(qdapTools)
mtabulate(with(input, split(channel, id)))
##     ab0003 pp6500 sm2456
## 101      1      1      0
## 102      0      0      1
```
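mtabulate works on a list of vectors (here, the channel values split by id) and counts each vector's values over a common set of column labels. To make that step concrete without the qdapTools dependency, here is a minimal base R sketch of the same idea. tabulate_list is a hypothetical helper written for illustration; it is not part of qdapTools and is not how mtabulate is implemented.

```r
# Same toy data as the question, built directly instead of via read.table
input <- data.frame(id = c(101, 101, 102),
                    channel = c("ab0003", "pp6500", "sm2456"),
                    stringsAsFactors = FALSE)

# One list element per id, holding that id's channel values
groups <- split(input$channel, input$id)

# Hypothetical helper: count every list element over the same sorted set of values
tabulate_list <- function(lst) {
  lvls <- sort(unique(unlist(lst)))
  # vapply gives one column per list element; t() turns those into rows
  counts <- t(vapply(lst,
                     function(x) as.vector(table(factor(x, levels = lvls))),
                     integer(length(lvls))))
  colnames(counts) <- lvls
  as.data.frame(counts)
}

result <- tabulate_list(groups)
result
```

Because every element is tabulated against the same levels, ids that lack a value still get an explicit 0 in that column, which is what makes the result rectangular.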
```r
t(with(input, table(channel, id)))
##      channel
## id    ab0003 pp6500 sm2456
##   101      1      1      0
##   102      0      0      1
```
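The table route returns a table object rather than a data frame. If you need a plain data frame shaped like expected_output, one possible sketch is to put id first in the table call (so no transpose is needed) and convert with base R's as.data.frame.matrix, which keeps the wide layout instead of the long one that as.data.frame gives for a table.

```r
# Same toy data as the question, built directly instead of via read.table
input <- data.frame(id = c(101, 101, 102),
                    channel = c("ab0003", "pp6500", "sm2456"),
                    stringsAsFactors = FALSE)

# Rows = id, columns = channel, cells = counts
counts <- with(input, table(id, channel))

# Drop the table class: one integer column per channel, row names taken from id
output <- as.data.frame.matrix(counts)
output
```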
It's possible your MWE doesn't reflect the full complexity of your data; if so, that brings us back to the original problem. wfm uses the tm package as a backend for some of its manipulations, so we need to supply an extra argument through the dots (...). I re-read the documentation and this is a bit confusing (I have added this information to the dev version), but we want to pass removeNumbers = FALSE on to TermDocumentMatrix, as shown here:
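```r
# removeNumbers = FALSE goes through wfm's ... to tm's TermDocumentMatrix, so digits are kept
output <- t(with(input, wfm(channel, id, removeNumbers = FALSE)))
as.data.frame(output)
##     ab0003 pp6500 sm2456
## 101      1      1      0
## 102      0      0      1
```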
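The fix relies on R's dots mechanism: any argument the outer function doesn't name itself is forwarded through ... to an inner function. The sketch below shows that pattern in isolation. Both clean_terms and count_terms are hypothetical stand-ins (for the tm backend and for wfm respectively), not qdap or tm code; clean_terms merely imitates number removal with a digit-stripping regular expression.

```r
# Hypothetical inner step standing in for the tm backend: strips digits by default
clean_terms <- function(x, removeNumbers = TRUE) {
  if (removeNumbers) gsub("[[:digit:]]+", "", x) else x
}

# Hypothetical outer function standing in for wfm: forwards extra arguments via ...
count_terms <- function(text, grouping, ...) {
  terms <- clean_terms(text, ...)
  table(grouping, terms)
}

channel <- c("ab0003", "pp6500", "sm2456")
id <- c(101, 101, 102)

default_counts <- count_terms(channel, id)                       # digits stripped
kept_counts <- count_terms(channel, id, removeNumbers = FALSE)   # digits kept
colnames(default_counts)
colnames(kept_counts)
```

With the default, "ab0003" collapses to "ab", which is the truncation described in the question; passing removeNumbers = FALSE through the outer call keeps the full values as column names.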