Consider this funny example:
library(tibble)    # tibble()
library(magrittr)  # %>%
library(quanteda)  # corpus(), tokens(), dfm()
mytib <- tibble(text = c('i can see clearly now',
                         'the rain is gone'),
                myweight = c(1.7, 0.005))
# A tibble: 2 x 2
  text                  myweight
  <chr>                    <dbl>
1 i can see clearly now    1.7
2 the rain is gone         0.005
I know how to create a dfm weighted by the docvar myweight. I proceed as follows:
dftest <- mytib %>%
  corpus() %>%
  tokens() %>%
  dfm()
dftest * mytib$myweight
Document-feature matrix of: 2 documents, 9 features (50.0% sparse).
2 x 9 sparse Matrix of class "dfm"
       features
docs      i can see clearly now   the  rain    is  gone
  text1 1.7 1.7 1.7     1.7 1.7 0     0     0     0
  text2 0   0   0       0   0   0.005 0.005 0.005 0.005
However, the issue is that I can use neither topfeatures nor colSums on the result.
How can I sum the values in every column then?
> dftest*mytib$myweight %>% Matrix::colSums(.)
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
Thanks!
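Sometimes the %>% operator harms rather than helps. In your call, %>% binds more tightly than *, so only mytib$myweight (a plain vector) is passed to colSums, which is why it complains about dimensions. This works: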
colSums(dftest * mytib$myweight)
##     i     can     see clearly     now     the    rain      is    gone
## 1.700   1.700   1.700   1.700   1.700   0.005   0.005   0.005   0.005
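If you want to see the grouping for yourself, here is a minimal sketch that parses the expression from the question without evaluating it, so no packages or objects need to exist. The variable names top_level and piped are just for illustration.

```r
# Parse the expression from the question without evaluating it.
# quote() only builds the call tree, so dftest and mytib need not exist.
e <- quote(dftest * mytib$myweight %>% Matrix::colSums(.))

top_level <- as.character(e[[1]])  # function of the outermost call
piped     <- deparse(e[[3]])       # right-hand operand of that call

print(top_level)  # the outermost operation is the multiplication
print(piped)      # the pipe sits entirely inside its right-hand operand
```

If you prefer to keep the pipe, wrapping the product in parentheses restores the intended grouping: (dftest * mytib$myweight) %>% colSums().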
Also consider using dfm_weight(x, weights = ...) if you have a vector of weights for each feature. The operation above works the way you want only because R recycles the weight vector down the columns (matrices are filled in column-major order) and you have exactly one weight per document, so each row ends up multiplied by its own weight. You should understand why it works before relying on it.
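To make the recycling point concrete, here is a small sketch using a plain base R matrix as a stand-in for the dfm. The matrix m and the feature weights fw are made-up illustration data, not part of the question, and sweep() is shown only as one base R way to multiply column-wise.

```r
# Stand-in for the dfm: a base R matrix of ones (hypothetical data).
m <- matrix(1, nrow = 2, ncol = 3,
            dimnames = list(c("text1", "text2"), c("i", "can", "the")))
w <- c(1.7, 0.005)  # one weight per document (row)

# Elements are stored column by column, so w is recycled down each column:
# row 1 always meets w[1] and row 2 always meets w[2].
by_doc <- m * w

# One weight per feature (column) does NOT line up under the same rule.
fw <- c(10, 20, 30)
wrong <- m * fw                # recycled down the columns, weights get scrambled
right <- sweep(m, 2, fw, `*`)  # explicit column-wise multiplication

print(by_doc)
print(wrong)
print(right)
```

The row-wise case only works because the weight vector has the same length as the number of rows; per-feature weights need an explicit column-wise operation such as the one above, or dfm_weight() on a real dfm.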