tf-idf text2vec

How can I create a tf-idf matrix with character n-gram features?


How can I use the text2vec package to create a tf-idf matrix with character n-gram features?


Solution

  • How about:

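    library(text2vec)
    data("movie_review")
    # Lowercase each review, then split it into single characters
    it = itoken(movie_review$review, preprocessor = tolower, tokenizer = char_tokenizer)
    # Keep only character 3-grams; characters within an n-gram are joined by "_"
    v = create_vocabulary(it, ngram = c(3L, 3L), sep_ngram = "_")
    # Document-term matrix of raw n-gram counts
    dtm = create_dtm(it, vectorizer = vocab_vectorizer(v))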
    

    PS: In future, please provide a reproducible example that shows what you tried.
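
The snippet above stops at a matrix of raw n-gram counts, so one more step is needed for tf-idf weights. Here is a minimal sketch of that step, assuming a text2vec release that provides the `TfIdf` model and `fit_transform`. The three-document corpus `docs` is made up for illustration, and `TfIdf` is left at its defaults (see `?TfIdf` for the weighting and normalization options).

```r
library(text2vec)

# Hypothetical toy corpus, used only to keep the example small
docs = c("the cat sat", "the dog sat", "a bird flew")

# Lowercase, then tokenize into single characters
it = itoken(docs, preprocessor = tolower, tokenizer = char_tokenizer,
            progressbar = FALSE)

# Vocabulary of character 3-grams only
v = create_vocabulary(it, ngram = c(3L, 3L), sep_ngram = "_")

# Raw count document-term matrix
dtm = create_dtm(it, vocab_vectorizer(v))

# Fit the idf weights on the counts and reweight in one call
tfidf = TfIdf$new()
dtm_tfidf = fit_transform(dtm, tfidf)

dim(dtm_tfidf)
```

The result has the same shape as `dtm`: one row per document and one column per character 3-gram. To weight new documents consistently, build their count matrix with the same vectorizer and call `transform(new_dtm, tfidf)` with the already fitted model.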