r, shiny, text-mining

Checking if a word exists in an English dictionary in R


I'm performing text analysis on multiple resumes to generate a word cloud in R, using the wordcloud package along with the tm package for preprocessing the corpus of documents.

The problems I'm facing are:

  1. Checking whether a word in the corpus has a meaning, i.e. whether it belongs to an English dictionary.

  2. How to mine / process multiple resumes together.

  3. Checking for tech terms like r, java, eclipse, etc.

Appreciate the help.


Solution

  • I've faced similar issues before, so here are solutions to your problems:

    1. The qdapDictionaries package is a collection of dictionaries and word lists for use with the 'qdap' package. Its GradyAugmented dataset is a character vector of English words that you can match terms against.

    library(qdapDictionaries)
    
    # Custom function: TRUE for every term found in the word list
    # (GradyAugmented here, or any other word list from the package)
    is.word <- function(x) x %in% GradyAugmented
    
    # Use this function to filter words; df = data frame of terms built from the corpus
    df <- df[is.word(df$terms), , drop = FALSE]
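
The `%in%` lookup above is case-sensitive, so a term such as "Developer" would be rejected if the word list is stored in lower case. Here is a minimal sketch of a case-insensitive variant. It assumes a small hypothetical `word_list` standing in for `GradyAugmented` and a made-up `df`, so that it runs without any packages:

```r
# Hypothetical stand-in for qdapDictionaries::GradyAugmented
word_list <- c("developer", "experience", "project", "team")

# Lower-case the terms before matching, because %in% is case-sensitive
is.word <- function(x) tolower(x) %in% word_list

# Hypothetical term-frequency table, as it might come out of a corpus
df <- data.frame(terms = c("Developer", "xyzzy", "team", "prjct"),
                 freq  = c(5, 1, 3, 2),
                 stringsAsFactors = FALSE)

# Split into dictionary words and everything else;
# drop = FALSE keeps the result a data frame
known   <- df[is.word(df$terms), , drop = FALSE]
unknown <- df[!is.word(df$terms), , drop = FALSE]
```

Inspecting `unknown` is a quick way to spot misspellings or domain terms that the dictionary does not cover.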
    

    2. Use VCorpus(DirSource(...)) from the tm package to create your corpus from a directory containing all the resumes.

    library(tm)
    # DirSource picks up every file in the directory
    resumeDir <- "path/all_resumes/"
    myCorpus <- VCorpus(DirSource(resumeDir))
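
The filtering steps in this answer work on a data frame `df` with a `terms` column, which has to be built from the corpus first. Here is a minimal sketch of one way to do that with tm. The two sample texts and the `freq` column name are assumptions for illustration; a real corpus would come from `DirSource` as shown above:

```r
library(tm)

# Hypothetical stand-in texts; a real corpus would be read from the resume directory
texts <- c("Experienced Java developer using Eclipse",
           "Data analyst skilled in R and Java")
corpus <- VCorpus(VectorSource(texts))

# Basic cleanup: lower-case everything and strip punctuation
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)

# wordLengths = c(1, Inf) keeps one-letter terms such as "r";
# the default minimum word length of 3 would drop them
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))

# Total frequency of each term across all documents
freq <- rowSums(as.matrix(tdm))
df <- data.frame(terms = names(freq), freq = as.integer(freq),
                 stringsAsFactors = FALSE)
```

The `wordLengths` setting matters for the tech-term check, since short names such as "r" would otherwise never reach the term list.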
    

    3. Create a custom dictionary file, such as my_dict.csv, containing the tech terms.

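    # Read the custom dictionary
    tech_dict <- read.csv("path/to/my_dict.csv", stringsAsFactors = FALSE)
    # Tech-term check; assumes the terms are in the first column of the CSV.
    # read.csv returns a data frame, so match against that column, not the whole object
    is.tech <- function(x) x %in% tech_dict[[1]]
    # Filter
    tech_df <- df[is.tech(df$terms), , drop = FALSE]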
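
Note that `read.csv` returns a data frame, so the terms have to be pulled out of a column before matching with `%in%`. The following self-contained sketch shows this. It assumes an inline CSV in place of my_dict.csv, a hypothetical column named `term`, and a made-up `df`:

```r
# Inline stand-in for my_dict.csv; the column name "term" is hypothetical
tech_dict <- read.csv(text = "term\nr\njava\neclipse\npython",
                      stringsAsFactors = FALSE)

# Compare lower-cased terms against the lower-cased dictionary column
is.tech <- function(x) tolower(x) %in% tolower(tech_dict$term)

# Hypothetical term-frequency table
df <- data.frame(terms = c("Java", "team", "r", "eclipse"),
                 freq  = c(4, 3, 2, 1),
                 stringsAsFactors = FALSE)

# Keep only the rows whose term appears in the tech dictionary
tech_df <- df[is.tech(df$terms), , drop = FALSE]
```

Lower-casing both sides means "Java" and "java" are treated as the same term, which is usually what you want for a word cloud.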
    

    Hope this helps.