r topic-modeling

data set ‘NYTimes’ not found


I'm working with the topicmodels package:

library(topicmodels)
library(tm)

I tried to load the NYTimes dataset, but:

data(NYTimes)

returns the error:

Warning message:
In data(NYTimes) : data set ‘NYTimes’ not found

I took this code from a textbook on R.


Solution

  • If you do a Google search with the terms "CRAN" data(NYTimes), you should quickly find that the "RTextTools" package has a dataset by that name.

    A bit of further searching yields this information at CRAN:

    Package ‘RTextTools’ was removed from the CRAN repository.
    
    Formerly available versions can be obtained from the archive.
    
    Archived on 2019-03-05 as depends on archived package 'maxent' by the same non-maintainer.
    

    So go to the Package Archive for RTextTools, download the package, check whether it needs to be compiled (it doesn't), and install it with install.packages() with the repos argument set to NULL. See ?install.packages for further details. That turns out not to work, because RTextTools depends on pkg:maxent, and attempts to install pkg:maxent fail during compilation.

    The other option is to download the archived package, unpack it, navigate to the data/ directory inside the expanded package directory, and then decompress the NYTimes.csv.gz file there to get the plain .csv file.

    Edward's suggestion is also feasible: you can go directly to https://github.com/cran/RTextTools/blob/master/data/NYTimes.csv.gz, then download and unzip the file without needing to install the package.
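
The download route can be sketched in a few lines of R. This is only a sketch under stated assumptions. The raw-file URL is my guess, derived from the GitHub link above by replacing blob/ with raw/. The helper names read_gz_csv and fetch_nytimes are made up for illustration. The semicolon separator is assumed because data() reads .csv files in a package's data/ directory with sep = ";" (see ?data). If all the columns end up fused into one, try sep = "," instead.

```r
# Hypothetical helper: read a gzip-compressed CSV into a data frame.
# sep = ";" is an assumption based on how data() reads package .csv files.
read_gz_csv <- function(path, sep = ";") {
  read.csv(gzfile(path), sep = sep, stringsAsFactors = FALSE)
}

# Hypothetical helper: download the compressed file to a temporary
# location and read it. The default URL is an assumption (blob/ -> raw/).
fetch_nytimes <- function(
    url = "https://github.com/cran/RTextTools/raw/master/data/NYTimes.csv.gz") {
  dest <- file.path(tempdir(), "NYTimes.csv.gz")
  # mode = "wb" keeps the binary .gz file intact on Windows
  download.file(url, destfile = dest, mode = "wb")
  read_gz_csv(dest)
}

# Only hit the network in an interactive session, so that sourcing this
# file offline does not fail.
if (interactive()) {
  NYTimes <- fetch_nytimes()
  str(NYTimes)
}
```

After this, NYTimes is an ordinary data frame in the workspace, so the textbook code that follows data(NYTimes) should be able to continue from there without RTextTools installed.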