r dictionary hunspell

How do I add vocabulary to the hunspell package?


I have a list of words that I want to spell-check with hunspell. Some of them are specialized terms that hunspell does not know and must not correct. (The list is not fixed and is too long to add by hand.)

What method could I use to solve this?

I have already tried to find and upgrade the dictionary.

Here is the list of words:

    keywords <- c(
      "Millimeter", "OMT", "Chooz",
      "DCTPC", "JEM", "EUSO",
      "EUSO", "EUSO", "PDM",
      "FPGA", "Chooz", "Cepheids",
      "Circumstellar", "Tokamak", "ASIC",
      "TiSAFT", "CoRoT", "Unes",
      "Radioastronomy", "Coronagraphy", "Fiber",
      "Ultrastable", "Puslsar", "Magnetohydrodynamic",
      "KSZ", "Gaussianity", "Raman",
      "Gravimetry", "Casimir", "transfert",
      "TES", "MEMS", "CMB",
      "CMB", "TES", "Blazar",
      "modeling", "DFB", "linewidth",
      "Asteroseismology", "ExPRES", "NDA",
      "rephasing", "Nulling", "Gyroscop",
      "Atmopsheric", "fibers", "Spectroscopie",
      "d'absorption", "Calculs", "Aluminum",
      "Transneptunian", "Planetology", "Ultrastable"
    )

Some are genuinely misspelled, like transfert or d'absorption, but others are specialized terms or acronyms. Here is the code:

    # dict_lang is the dictionary object in use (its definition is not shown)
    bad_matrix <- hunspell(keywords, dict = dict_lang)
    bad_index <- lengths(bad_matrix) != 0

Solution

  • Use dictionary() with the add_words parameter:

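    library("hunspell")
    # Domain-specific terms that should be accepted as correct
    keywords <- c("Millimeter", "OMT", "Chooz")
    words <- c("OMT", "wiskey")
    # Check against the stock en_US dictionary
    correct_pkg <- hunspell_check(words)
    # Check against en_US extended with the custom terms
    correct_custom <- hunspell_check(words, dict = dictionary("en_US", add_words = keywords))
    correct_pkg
    correct_custom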
    

    Output

    > correct_pkg
    [1] FALSE FALSE
    
    > correct_custom
    [1]  TRUE FALSE
    

    Notice that in the second case "OMT" is accepted as a word, while the misspelled "wiskey" is still rejected.
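
The same idea can be applied to the indexing step from the question. The following is only a minimal sketch: the `known_terms` whitelist and the shortened `keywords` vector are illustrative assumptions, not the asker's full data, and `en_US` is assumed to be the dictionary in use.

```r
library(hunspell)

# Hypothetical whitelist of domain terms to accept (illustrative only).
known_terms <- c("OMT", "Chooz", "DCTPC", "FPGA", "Tokamak")

# A small subset of the words from the question.
keywords <- c("Millimeter", "OMT", "Chooz", "FPGA", "transfert", "Atmopsheric")

# Extend the en_US dictionary with the whitelist.
custom_dict <- dictionary("en_US", add_words = known_terms)

# hunspell() returns one character vector of unrecognized words per input element.
bad_words <- hunspell(keywords, dict = custom_dict)

# TRUE where a keyword contains at least one unrecognized word.
bad_index <- lengths(bad_words) != 0

# Candidate corrections, requested only for the flagged keywords.
suggestions <- hunspell_suggest(keywords[bad_index], dict = custom_dict)
```

With this approach the whitelist still has to come from somewhere, for example a glossary file read with `readLines()`. It does not by itself solve the case where the special terms are unknown in advance.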