I am using quanteda and want to conditionally assign docvars().
Consider the following MWE:
library(dplyr)
library(quanteda)
library(quanteda.corpora)
testcorp <- corpus(data_corpus_movies)
I now want to assign a dummy docvar neg_sent_lg_id2, which should be 1 for all documents where the Sentiment is neg and where the id2 is > 10000.
Importantly, I don't want to subset the corpus. I want to set the docvar only for the matching documents and keep the entire corpus.
I have used docvars(testcorp, field = "neg_sent_lg_id2") <- 0 to assign 0 to all documents and would now like to do something like the following. These lines are pseudo R code and do not work, but they convey the idea.
corpus_subset(testcorp, Sentiment == "neg") %>% # filter on "Sentiment"
corpus_subset(testcorp, id2 > 10000) %>% # filter on "id2"
docvars(testcorp, field = "neg_sent_lg_id2") <- 1 # selectively assign docvar
You can use ifelse for this:
library(dplyr)
library(quanteda)
library(quanteda.corpora)
testcorp <- corpus(data_corpus_movies)
docvars(testcorp, field = "neg_sent_lg_id2") <-
  ifelse(docvars(testcorp, field = "Sentiment") == "neg" & docvars(testcorp, field = "id2") > 10000,
         1, 0)
It's not a pretty syntax but it works:
head(docvars(testcorp))
#>                 Sentiment   id1   id2 neg_sent_lg_id2
#> neg_cv000_29416       neg cv000 29416               1
#> neg_cv001_19502       neg cv001 19502               1
#> neg_cv002_17424       neg cv002 17424               1
#> neg_cv003_12683       neg cv003 12683               1
#> neg_cv004_12641       neg cv004 12641               1
#> neg_cv005_29357       neg cv005 29357               1
table(docvars(testcorp, field = "neg_sent_lg_id2"))
#>
#>    0    1
#> 1005  995
Created on 2019-10-15 by the reprex package (v0.3.0)
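If the ifelse call feels clunky, a possible alternative is to build the logical condition first and convert it with as.integer(), since TRUE/FALSE map to 1/0. The sketch below is only an illustration: it uses a tiny made-up corpus (toy, with invented texts and id2 values) as a stand-in for data_corpus_movies so that it runs without quanteda.corpora, and it assumes corpus() accepts a data.frame through its docvars argument, as it does in current quanteda releases.

```r
library(quanteda)

# Hypothetical toy corpus standing in for data_corpus_movies;
# the texts and docvar values are invented for illustration only.
toy <- corpus(
  c(d1 = "bad film", d2 = "awful plot", d3 = "great cast", d4 = "fine ending"),
  docvars = data.frame(
    Sentiment = c("neg", "neg", "pos", "pos"),
    id2       = c(29416, 9000, 15000, 8000)
  )
)

# One logical value per document: TRUE only where both conditions hold
is_match <- docvars(toy, "Sentiment") == "neg" & docvars(toy, "id2") > 10000

# as.integer() turns TRUE/FALSE into 1/0; the corpus keeps all documents
docvars(toy, "neg_sent_lg_id2") <- as.integer(is_match)

docvars(toy, "neg_sent_lg_id2")
#> [1] 1 0 0 0
```

Only d1 is both neg and above 10000, so it is the only document flagged, and ndoc(toy) is still 4. One difference to keep in mind: both this and ifelse return NA where the condition itself is NA, so missing docvars would need separate handling.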