
How to reconnect to the PCorpus in the R tm package?


I create a PCorpus, which, as far as I understand, is stored on disk, with the following code:

pc = PCorpus(vs, readerControl = list(language = "pl"), dbControl = list(dbName = "pcorpus", dbType = "DB1"))

How may I reconnect to that database later?


Solution

  • As far as I'm aware, you can't, at least not as a ready-made corpus. The 'database' is actually a filehash database, which you can reconnect to and load as follows:

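    library(filehash)  # dbInit() and dbLoad() come from the filehash package
    db <- dbInit("pcorpus", type = "DB1")  # same dbName and dbType as in dbControl
    pc <- dbLoad(db)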
    

    but this loads each document as its own object rather than restoring the corpus. Instead, save the documents to disk explicitly with writeCorpus and rebuild the corpus with a call to PCorpus each time. The PCorpus object just provides a way of keeping a corpus's documents on disk rather than in memory.
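
A rough sketch of that save-and-reload round trip, in case it helps. The sample texts, the `out_dir` folder and the `pcorpus_reloaded` database name are made up for illustration (the question does not show what `vs` contains), and it assumes both tm and filehash are installed:

```r
library(tm)

# Hypothetical stand-in for `vs` from the question
vs <- VectorSource(c("pierwszy dokument", "drugi dokument"))
pc <- PCorpus(vs,
              readerControl = list(language = "pl"),
              dbControl = list(dbName = file.path(tempdir(), "pcorpus"),
                               dbType = "DB1"))

# Save every document as a plain text file (one file per document)
out_dir <- file.path(tempdir(), "pcorpus_txt")  # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)
writeCorpus(pc, path = out_dir)

# In a later session: rebuild the corpus from the saved files
pc2 <- PCorpus(DirSource(out_dir, encoding = "UTF-8"),
               readerControl = list(language = "pl"),
               dbControl = list(dbName = file.path(tempdir(), "pcorpus_reloaded"),
                                dbType = "DB1"))
```

Note that writeCorpus only writes the document text, so any metadata you attached to the documents would have to be saved and restored separately.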