Package development: How can I import data from a package, transform it, and re-export it as a data set?

Using the roxygen2 framework, how can I import a data set from another package, alter it, and re-export it as a data set within my own package?

In my experience, exporting a data set is a manual process: you save the .rda file (usually with the save function). I'd like to make this more dynamic, so that when the other package updates its data set and people update that dependency, my package's data set updates accordingly.
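For reference, the manual process described above can be sketched roughly as follows. This is only an illustration: the script location (data-raw/stop_words2.R) and the choice of xz compression are assumptions, and it presumes tidytext is installed and the working directory is the package root.

```r
# Sketch of a regeneration script, e.g. kept at data-raw/stop_words2.R
# (hypothetical path). Assumes tidytext is installed and that the working
# directory is the root of the package being developed.

# Drop the SMART lexicon from tidytext's stop_words
stop_words2 <- tidytext::stop_words[tidytext::stop_words[["lexicon"]] != "SMART", ]

# Write the object into data/, which is where data() looks for data sets
dir.create("data", showWarnings = FALSE)
save(stop_words2, file = file.path("data", "stop_words2.rda"), compress = "xz")
```

The script has to be re-run by hand (and the package rebuilt) whenever tidytext changes its data, which is the step the question is trying to avoid.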

So, for example, let's say I want to import the stop_words data set from tidytext, remove the SMART lexicon, and re-export the result as stop_words2. Is there a way to do this? I'll know a solution works when data(package = 'MyPackage') lists the re-exported data set.

My attempt, which does not work (the data is accessible, but data(package = 'MyPackage') does not list it):

#' Various lexicons for English stop words
#'
#' English stop words from three lexicons, as a data frame.
#' The onix sets are pulled from the tm package. Note
#' that words with non-ASCII characters have been removed.  This
#' is a reimport from the \pkg{tidytext} package's \code{stop_words}
#' data set but with the SMART lexicon filtered out.
#'
#' @format A data frame with 578 rows and 2 variables:
#' \describe{
#'  \item{word}{An English word}
#'  \item{lexicon}{The source of the stop word. Either "onix" or "snowball"}
#'  }
#' @usage data(stop_words2)
#' @export
stop_words2 <- tidytext::stop_words[tidytext::stop_words[['lexicon']] != 'SMART', ]

Solution

I don't think this is possible, because data() only searches a package's data/ subdirectory, and that is not where a re-export puts a data object. An object created in R/ and exported lives in the package namespace instead.
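To see what data() actually lists for a package, you can inspect its return value. Here is a minimal sketch using only base R; is_listed_dataset is a hypothetical helper written for this illustration, and the MyPackage calls are left commented out because that package is only the placeholder name from the question.

```r
# Hypothetical helper: TRUE if `name` appears in the data() index of `pkg`.
is_listed_dataset <- function(name, pkg) {
  # data(package = ) returns a "packageIQR" object; its $results matrix has
  # the columns Package, LibPath, Item and Title.
  listed <- data(package = pkg)$results[, "Item"]
  name %in% listed
}

# mtcars ships as a data set in the datasets package, so it is listed
is_listed_dataset("mtcars", "datasets")

# For the package in the question (placeholder name), an object that is only
# exported from the namespace would show up in the exports but not in the
# data() index:
# is_listed_dataset("stop_words2", "MyPackage")
# "stop_words2" %in% getNamespaceExports("MyPackage")
```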

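If you give up that objective, you can still access the new data object as if it were a "lazy loaded" data set, for example as MyPackage::stop_words2. To be clear, this will not work with data(stop_words2, package = "MyPackage"). Also note that the object is computed when MyPackage is built or installed, so it reflects the tidytext version present at that time.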

#' Various lexicons for English stop words
#'
#' English stop words from three lexicons, as a data frame. The onix sets are
#' pulled from the tm package. Note that words with non-ASCII characters have
#' been removed.  This is a reimport from the \pkg{tidytext} package's
#' \code{stop_words} data set but with the SMART lexicon filtered out.
#' @inherit tidytext::stop_words title description source references
#' @export
stop_words2 <- tidytext::stop_words[tidytext::stop_words[["lexicon"]] != "SMART", ]

Note the use of roxygen2's @inherit tag to recycle components of the original documentation.

Consider using the stopwords package, which has the SMART words and much more.