Tags: r, nlp, sentiment-analysis, stemming, lemmatization

Removing words from lemmatisation dictionary/updating lemma dictionary in textstem


I am using the textstem package to lemmatise words in some responses. However, there is one word ("spotting") that I do not want to be reduced to "spot"; I want it to remain as "spotting". How can I do this? Do I need to make a custom dictionary? Currently I am doing:

    lemmatize_strings(df, dictionary = lexicon::hash_lemmas)

Solution

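  • You can create your own dictionary by removing the token "spotting" from lexicon::hash_lemmas. Because "spotting" is then no longer in the dictionary, it is not looked up and stays as it is:

    library(textstem)
    library(data.table)  # loads data.table's [ ] method used to subset hash_lemmas

    # hash_lemmas is a data.table, so inside [ ] you can refer to the column as token
    # instead of hash_lemmas$token
    my_lex <- lexicon::hash_lemmas[token != "spotting", ]
    
    df_lemmatized <- lemmatize_strings(df, dictionary = my_lex)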
    

    Alternatively, you can subset the lexicon inline instead of saving it as a separate object:

    df_lemmatized <- lemmatize_strings(df, dictionary = lexicon::hash_lemmas[token != "spotting", ])
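If more than one word needs protecting, the same idea should extend with `%in%`. The sketch below is illustrative only: the example data frame, its `response` column and the `keep_words` vector are hypothetical names, not part of the original question. Since `lemmatize_strings()` takes a character vector, it is applied to the text column rather than the whole data frame. The sketch also shows a second option for "updating" the dictionary: instead of deleting the rows, it keeps them but maps each protected token to itself, which should give the same result.

```r
library(textstem)
library(data.table)

# Hypothetical example data: replace with your own data frame and column name
df <- data.frame(
  response = c("I was spotting birds", "The dogs were running"),
  stringsAsFactors = FALSE
)

# Hypothetical list of words that should be left exactly as written
keep_words <- c("spotting", "running")

# Option 1: drop the rows for these tokens so they are never looked up
my_lex <- lexicon::hash_lemmas[!token %in% keep_words]

# Option 2: keep the rows but map each protected token to itself.
# copy() avoids modifying lexicon::hash_lemmas by reference through :=
my_lex2 <- copy(lexicon::hash_lemmas)
my_lex2[token %in% keep_words, lemma := token]

# lemmatize_strings() expects a character vector, so apply it to the column
df$lemmatized  <- lemmatize_strings(df$response, dictionary = my_lex)
df$lemmatized2 <- lemmatize_strings(df$response, dictionary = my_lex2)
print(df)
```

Option 1 is the smaller change. Option 2 can be handy if you later want to point a word at a different lemma instead of leaving it untouched.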