Tags: r, dictionary, quanteda

R: how to see the list of words in a dictionary


I want to see the words included in a dictionary. Here is my dictionary:

Name               Type                             Value
dict_lg            list [2] (quanteda::dictionary2) List of length 2
   NEGATIVE        character [2867]                 'à côrnes' 'à court de personnel'
   POSITIVE        list [1] (quanteda::dictionary2) List of length 1 
      VÉRITÉ* (1)) character [0]

I would like to see the words included in each list (NEGATIVE, POSITIVE). If I do:

dict_lg <- dictionary(file = "frlsd/frlsd.cat", encoding = "UTF-8")
dict_lg$NEGATIVE

it prints the list of negative words, but if I do:

dict_lg$POSITIVE

I get:

Dictionary object with 1 key entry.
- [VÉRITÉ* (1))]:

or if I do:

dict_lg[["POSITIVE"]][["VÉRITÉ* (1))"]]

I get:

character(0)

How can I see the list of positive words? The original dictionary file is available here: https://www.poltext.org/fr/donnees-et-analyses/lexicoder


Solution

  • You can examine the list structure of the dictionary like so:

    # \(i) i is the identity function; the \(x) lambda and the native pipe |> need R >= 4.1
    rapply(dict_lg, f = \(i) i, how = 'list') |> str()
    

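    ... which shows that the structure is malformed (it is unclear whether this happened when the .cat file was generated or upon import): the entry VÉRITÉ* (1)) was parsed as a nested key under POSITIVE with no terms, and the 1283 actual positive terms ended up in an unnamed element next to it: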

    List of 2
     $ NEGATIVE:List of 1
      ..$ : chr [1:2867] "à côrnes" "à court de personnel " "à l'étroit" "à peine*" ...
     $ POSITIVE:List of 2
      ..$ VÉRITÉ* (1)):List of 1
      .. ..$ : chr(0) 
      ..$             : chr [1:1283] "à l'épreuve*" "à la mode" "abondamment" "abondance" ...
    

    ... however, you can still pull all terms from the list item 'POSITIVE' like this:

    rapply(dict_lg, f = \(i) i, how = 'list')$POSITIVE
    

    Edit: to convert the dictionary into a data frame of terms and sentiments, e.g. to keep only the terms with negative sentiment:

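    library(dplyr)
    
    # how = 'unlist' flattens everything into one named character vector;
    # each name starts with its top-level key (e.g. 'NEGATIVE1'), which the
    # regex below extracts as the sentiment
    rapply(dict_lg, f = \(i) i, how = 'unlist') %>%
      data.frame(term = .,
                 sentiment = gsub('(POSITIVE|NEGATIVE).*', '\\1', names(.))
                 ) %>%
      filter(sentiment == 'NEGATIVE')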
    
                               term sentiment
    NEGATIVE1              à côrnes  NEGATIVE
    NEGATIVE2 à court de personnel   NEGATIVE
    NEGATIVE3            à l'étroit  NEGATIVE
    NEGATIVE4              à peine*  NEGATIVE
    NEGATIVE5                abais*  NEGATIVE
    NEGATIVE6              abandon*  NEGATIVE
    ## truncated
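Because the POSITIVE element still contains the empty nested key, rapply(..., how = 'list')$POSITIVE returns a nested list rather than a plain vector of terms. Below is a minimal base-R sketch of two things: flattening those terms into one character vector, and building the term/sentiment table without dplyr. It assumes a small hypothetical list, dict_mock, that mimics the structure printed by str() above. It is not the real Lexicoder data. With the real dictionary you would start from dict_lg, assuming the identity rapply() call behaves as shown in the answer.

```r
# dict_mock is a HYPOTHETICAL stand-in for dict_lg, mirroring the str() output:
# POSITIVE holds an empty named sub-list plus an unnamed vector of terms.
dict_mock <- list(
  NEGATIVE = list(c("à côrnes", "à court de personnel ", "à l'étroit")),
  POSITIVE = list(
    "VÉRITÉ* (1))" = list(character(0)),
    c("à l'épreuve*", "à la mode", "abondamment")
  )
)

# Same identity rapply() as in the answer, yielding a plain nested list
plain <- rapply(dict_mock, f = function(i) i, how = "list")

# 1) Flat character vector of positive terms: unlist() flattens every
#    level, and the empty character(0) leaf contributes nothing
positive_terms <- unname(unlist(plain$POSITIVE))
print(positive_terms)

# 2) Term/sentiment data frame in base R (no dplyr): the names produced
#    by unlist() start with the top-level key, e.g. "NEGATIVE1"
all_terms <- unlist(plain)
terms_df <- data.frame(
  term = unname(all_terms),
  sentiment = sub("^(POSITIVE|NEGATIVE).*", "\\1", names(all_terms))
)
negative_df <- subset(terms_df, sentiment == "NEGATIVE")
print(negative_df)
```

Using unlist() rather than picking the unnamed element by position means the sketch does not depend on where the stray VÉRITÉ* (1)) key ends up inside POSITIVE.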