corpus_subset specifies the documents that should be kept, but what about specifying the documents to drop? Assume, for example, that I want to drop documents in which the term "terrorism" appears, but only if the document dates from before 2001.
dfm_terror <- dfm(data_corpus_inaugural, select = "terrorism", valuetype = c("fixed"))
docvars(data_corpus_inaugural, "Terrorism") <- dfm_terror
documents_to_remove <- corpus_subset(data_corpus_inaugural, Terrorism >= 1 & Year < 2001)
corpus_subset keeps the documents that match your subset condition, as you correctly describe. So Terrorism >= 1 & Year < 2001 will return the document below.
            Year President FirstName Terrorism
1981-Reagan 1981    Reagan    Ronald         1
To get the reverse, just negate the whole subset condition with !. This selects all the documents except the one listed above.
corpus_subset(data_corpus_inaugural, !(Terrorism >= 1 & Year < 2001))
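The same negation logic can be checked without quanteda. The sketch below uses base R's subset() on a small made-up data frame, since corpus_subset is documented as working like subset(). The doc_id values and counts are hypothetical and only stand in for the real document variables.

```r
# Toy document variables; these values are illustrative, not taken from the real corpus.
docs <- data.frame(
  doc_id    = c("1981-A", "1997-B", "2005-C", "2009-D"),
  Year      = c(1981, 1997, 2005, 2009),
  Terrorism = c(1, 0, 2, 0),
  stringsAsFactors = FALSE
)

# Rows that match the condition, i.e. the documents we want to drop.
to_drop <- subset(docs, Terrorism >= 1 & Year < 2001)

# Negating the whole condition keeps every other row.
kept <- subset(docs, !(Terrorism >= 1 & Year < 2001))

# By De Morgan's law, the negation can also be written without the outer "!".
kept_alt <- subset(docs, Terrorism < 1 | Year >= 2001)

print(to_drop$doc_id)
print(kept$doc_id)
```

Note the parentheses around the condition: !Terrorism >= 1 & Year < 2001 without them would negate only the first term, because ! binds more tightly than &.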