Tags: r, text-mining, quanteda

Create custom dictionary from character vector


I am trying to look for specific words in a corpus using dfm_lookup().

I am really struggling with the dictionary that dfm_lookup() requires.

I created a character vector named "words" which contains all the words that should go into the dictionary.

dictionary() needs a list, so I convert the character vector to a list before passing it to dictionary():

dict <- dictionary(list(words))

But then I get

Error in validate_dictionary(object) :
  Dictionary elements must be named: digital digital-tv digitalis ...

What do I have to change in the list() call to get valid input for dictionary()?

Is there a simpler way to look for specific words in a dfm? It was really easy with the tm package.


Solution

  • I believe the elements of the list passed to quanteda's dictionary() must be named; the names become the dictionary keys. Here is an example that uses each word as its own key:

    library(quanteda)
    
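    words <- c("cat", "dog", "bird")
    
    # Turn the vector into a list and name each element after its word
    word.list <- as.list(words)
    names(word.list) <- words
    
    # Printing the dictionary shows one key per word
    dictionary(word.list)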
    Dictionary object with 3 key entries.
    - [cat]:
      - cat
    - [dog]:
      - dog
    - [bird]:
      - bird
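
If one count per document is enough, or if you only want to keep the matching columns, the following is a minimal sketch of two alternatives. It assumes a recent quanteda version in which a dfm is built from tokens(); the key name "animals", the sample texts, and the object names are made up for illustration.

```r
library(quanteda)

words <- c("cat", "dog", "bird")

# Hypothetical sample texts, only here to make the sketch self-contained
txt <- c(d1 = "The cat chased the dog.",
         d2 = "A bird sang while the cat slept.")
dfmat <- dfm(tokens(txt, remove_punct = TRUE))

# Option 1: put the whole vector under a single named key.
# dfm_lookup() then returns one column ("animals") holding the
# combined count of all dictionary words in each document.
dict <- dictionary(list(animals = words))
by_key <- dfm_lookup(dfmat, dictionary = dict)

# Option 2: skip the dictionary and keep only the columns whose
# feature names match the words.
by_word <- dfm_select(dfmat, pattern = words)

print(by_key)
print(by_word)
```

Option 1 answers the original error directly: list(words) produces an unnamed element, while list(animals = words) gives it a name. Option 2 is probably closest to looking up specific terms in tm, because it keeps one column per word.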