r quanteda

Conditionally assign docvars()


I am using quanteda and want to conditionally assign docvars().

Consider the following MWE:

library(dplyr)
library(quanteda)
library(quanteda.corpora)

testcorp <- corpus(data_corpus_movies)

I now want to assign a dummy docvar neg_sent_lg_id2, which should be 1 for all documents where Sentiment is "neg" and id2 is greater than 10000.

Importantly, I don't want to subset the corpus. I want to set the docvar for only the matching documents while keeping the entire corpus.

I have used docvars(testcorp, field = "neg_sent_lg_id2") <- 0 to set the docvar to 0 for every document, and would now like to do something like the following. These lines are pseudo-R code and do not run, but they convey the idea.

corpus_subset(testcorp, Sentiment == "neg") %>% # filter on "Sentiment"
    corpus_subset(testcorp, id2 > 10000) %>% # filter on "id2"
    docvars(testcorp, field = "neg_sent_lg_id2") <- 1 # selectively assign docvar

Solution

  • You can use ifelse for this:

    library(dplyr)
    library(quanteda)
    library(quanteda.corpora)
    
    testcorp <- corpus(data_corpus_movies)
    
    # Flag documents that are negative AND have id2 above 10000; all others get 0
    docvars(testcorp, field = "neg_sent_lg_id2") <-
      ifelse(docvars(testcorp, field = "Sentiment") == "neg" & docvars(testcorp, field = "id2") > 10000,
             1, 0)
    

    The syntax isn't pretty, but it works:

    head(docvars(testcorp))
    #>                 Sentiment   id1   id2 neg_sent_lg_id2
    #> neg_cv000_29416       neg cv000 29416               1
    #> neg_cv001_19502       neg cv001 19502               1
    #> neg_cv002_17424       neg cv002 17424               1
    #> neg_cv003_12683       neg cv003 12683               1
    #> neg_cv004_12641       neg cv004 12641               1
    #> neg_cv005_29357       neg cv005 29357               1
    table(docvars(testcorp, field = "neg_sent_lg_id2"))
    #> 
    #>    0    1 
    #> 1005  995
    

    Created on 2019-10-15 by the reprex package (v0.3.0)
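
A slightly more compact variant, offered as a sketch and not as part of the original answer: because the condition is already a logical vector with one element per document, it can be converted to 0/1 with as.integer() instead of ifelse(). The example below uses a hypothetical four-document toy corpus (toycorp, with made-up texts and docvar values) so that it runs without quanteda.corpora. It also checks that no documents are dropped, which was the requirement in the question.

```r
library(quanteda)

# Hypothetical toy docvars standing in for those of data_corpus_movies
toy_vars <- data.frame(
  Sentiment = c("neg", "neg", "pos", "pos"),
  id2 = c(29416, 5000, 12000, 800),
  stringsAsFactors = FALSE
)

# Made-up texts; the names of the vector become the document names
toycorp <- corpus(
  c(d1 = "a dull film", d2 = "a weak plot",
    d3 = "a fine cast", d4 = "a great score"),
  docvars = toy_vars
)

# One logical value per document: TRUE only when both conditions hold
is_target <- docvars(toycorp, "Sentiment") == "neg" &
  docvars(toycorp, "id2") > 10000

# TRUE/FALSE become 1/0; the assignment touches every document,
# so the corpus itself is never subset
docvars(toycorp, "neg_sent_lg_id2") <- as.integer(is_target)

ndoc(toycorp)                        # number of documents is unchanged
docvars(toycorp, "neg_sent_lg_id2")  # the new dummy variable
```

Both versions give NA for a document where the condition itself is NA (for example, a missing id2 on a "neg" document), so check for missing values first if that can occur in your data.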