Cannot modify docnames of corpus when using get()


I am trying to modify the docnames of several corpus objects in a for loop. Usually, I use get() to access a given object by name while moving through the loop. This does not seem to work inside the docnames() function of the quanteda package. I always get this error (the object name in the message depends on your input, which in my case is listofcorpora):

Error in get(listofcorpora[i]) <- `*vtmp*` : 
  could not find function "get<-"

Please find below a minimal example with just two corpora. Originally, I have many more.

library(quanteda)
#> Package version: 2.0.0
#> Parallel computing: 2 of 8 threads used.
#> See https://quanteda.io for tutorials and examples.
#> 
#> Attaching package: 'quanteda'
#> The following object is masked from 'package:utils':
#> 
#>     View
library(stringr)

corp_2015_qtr1 <- corpus( c("The first document of the first corpus.",
                           "The second document of the first corpus" ) )
corp_2015_qtr2 <- corpus( c("The first document of the second corpus.",
                           "The second document of the second corpus" ) )

listofcorpora <- objects( pattern = "corp_\\d+" )

for ( i in seq_along( listofcorpora ) ) {
  current_year <- as.integer( str_extract( listofcorpora[ i ], "\\d+" ) )
  current_qtr <- as.integer( str_extract( listofcorpora[ i ], "(?<=qtr)\\d" ) )
  current_docname <- str_c( current_year, 
                           "_qtr_", 
                           current_qtr, "_",
                           formatC( seq_len( ndoc( get( listofcorpora[ i ] ) ) ),
                                    width = 5, flag = "0" ) )
  docnames( get( listofcorpora[ i ] ) ) <- current_docname

}
#> Error in get(listofcorpora[i]) <- `*vtmp*`: could not find function "get<-"

Created on 2020-04-15 by the reprex package (v0.3.0)

The same error is raised whenever I use docvars() in the same fashion.

Thanks!


Solution

  • I don't know where you got the RData files from, but it generally makes more sense to save objects with saveRDS() and load them with object <- readRDS(), so that you control the object name or can load the files directly into a list.
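
    As a minimal sketch of the saveRDS()/readRDS() round trip: the character vector below is only a stand-in for a corpus, and the temporary file and the names texts, restored and from_files are assumptions made for illustration.

    ```r
    # Stand-in for a corpus (assumption: any R object is saved the same way).
    texts <- c("The first document.", "The second document.")

    # saveRDS() writes one object to a file without storing its name.
    path <- tempfile(fileext = ".rds")
    saveRDS(texts, path)

    # The name is chosen when the file is read back.
    restored <- readRDS(path)

    # Several files can be read directly into a named list.
    paths <- c(first = path, second = path)
    from_files <- lapply(paths, readRDS)
    ```

    Because the list is built at load time, no separate objects have to be collected with get() afterwards.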

    In your case, I would collect your objects in a list (as suggested by @phiver in the comments):

    corpora_l <- lapply(listofcorpora, get)
    names(corpora_l) <- listofcorpora
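
    A base R alternative, offered as a sketch and not part of the suggestion from the comments: mget() fetches several objects by name and returns a list that is already named. The stand-in objects and the names object_names and stand_ins are assumptions for illustration.

    ```r
    # Stand-in objects; plain character vectors instead of corpora.
    corp_2015_qtr1 <- c("a", "b")
    corp_2015_qtr2 <- c("c", "d")

    object_names <- c("corp_2015_qtr1", "corp_2015_qtr2")

    # mget() looks up each name and uses it as the list element name.
    stand_ins <- mget(object_names)
    ```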
    

    To get a cleaner environment you can now delete the superfluous objects:

    # remove unnecessary objects
    rm(list = c(listofcorpora, "listofcorpora"))
    

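    Working with this list seems easier in my opinion and, more importantly, docnames<- works on list elements accessed with [[ ]], because R can assign the modified element back into the list, whereas no get<- function exists: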

    for (i in seq_along(corpora_l)) {
      current_name <- names(corpora_l)[i]
      current_year <- as.integer( str_extract( current_name, "\\d+" ) )  
      current_qtr <- as.integer( str_extract( current_name, "(?<=qtr)\\d" ) )
      current_docname <- str_c( current_year, 
                                "_qtr_", 
                                current_qtr, "_",
                                formatC( seq_len( ndoc( corpora_l[[i]] ) ),
                                         width = 5, flag = "0" ) )
      docnames( corpora_l[[i]] ) <- current_docname
    }
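
    To see the difference in isolation, here is a sketch that uses base R's names<- in place of docnames<- (an assumption, made so that it runs without quanteda). A nested replacement such as f(g(x)) <- value only works if a replacement function `g<-` exists; `[[<-` does, `get<-` does not. The last part shows a copy-modify-assign() workaround for the case where the objects must stay separate; it is an alternative, not what the loop above does.

    ```r
    x <- c(10, 20)
    nm <- "x"

    # Fails: R would need a function `get<-` to write the result back.
    msg <- tryCatch({
      names(get(nm)) <- c("a", "b")
      "no error"
    }, error = function(e) conditionMessage(e))

    # Works: `[[<-` exists, so the modified element is written back into the list.
    l <- list(x = c(10, 20))
    names(l[[nm]]) <- c("a", "b")

    # Workaround without a list: copy the object, modify the copy, assign it back.
    tmp <- get(nm)
    names(tmp) <- c("a", "b")
    assign(nm, tmp)
    ```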
    

    Also, I don't know what your plan is regarding the docnames, but the year and quarter look more like a document variable to me. So you could change the last line in the loop to:

    docvars(corpora_l[[i]], field = "quarter") <- str_c(current_year, 
                                                        "_qtr_", 
                                                        current_qtr)
    

    Sorry I butchered your style there. I'm not used to the amount of spaces you leave in your code.