Running the following script produces the results shown below:
library(tm)      # VectorSource(), VCorpus()
library(corpus)  # term_stats()
a <- c("Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven't found it yet, keep looking. Don't settle. As with all matters of the heart, you'll know when you find it. - Steve Jobs")
a_source <- VectorSource(a)
a_corpus <- VCorpus(a_source)
term_stats(a_corpus)
  term count support
1 .        5       1
2 to       5       1
3 is       4       1
4 you      4       1
5 ,        3       1
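Support is the number of documents in which the term occurs; count is the total number of occurrences across all documents. tf-idf needs both kinds of quantity: occurrence counts for the term-frequency part and support for the inverse-document-frequency part.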
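To make the difference between count and support concrete, here is a minimal base R sketch. The three toy documents, the whitespace tokenisation, and the log(N / support) form of idf are all assumptions chosen for illustration; term_stats() uses its own, more careful tokenizer.

```r
# Toy documents (hypothetical, not the quote used in this answer)
docs <- c("to be or not to be", "to do is to be", "do be do")

# Naive whitespace tokenisation, for illustration only
tokens <- strsplit(docs, " ", fixed = TRUE)

# count: total occurrences of each term across all documents
count <- table(unlist(tokens))

# support: number of documents that contain the term at least once
support <- table(unlist(lapply(tokens, unique)))

terms <- names(count)
stats <- data.frame(
  term    = terms,
  count   = as.integer(count[terms]),
  support = as.integer(support[terms])
)

# One common idf variant (an assumption): log(number of docs / support)
stats$idf <- log(length(docs) / stats$support)
stats
```

Here "to" has count 4 but support 2, while "be" has count 4 and support 3, so "be" gets idf 0 because it appears in every document.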
library(tm)
library(corpus)  # provides term_stats()
txt <- c("Your work is going to fill a large part of your life,
and the only way to be truly satisfied is to do what you
believe is great work.
And the only way to do great work is to love what you do.
If you haven't found it yet, keep looking. Don't settle.
As with all matters of the heart, you'll know when you find it.
- Steve Jobs")
term_stats(VCorpus(VectorSource(txt)))[1:5,]
term count support
.        5       1
to       5       1
is       4       1
# Split txt into 4 documents
txt_df <- data.frame( txt = c(
"Your work is going to fill a large part of your life,
and the only way to be truly satisfied is to do what you
believe is great work." ,
"And the only way to do great work is to love what you do." ,
"If you haven't found it yet, keep looking. Don't settle." ,
"As with all matters of the heart, you'll know when you find it. -
Steve Jobs"))
term_stats(VCorpus(VectorSource(txt_df$txt)))[1:6,]
term count support
.        5       4
you      4       4
,        3       3
the      3       3
to       5       2
is       4       2
By default, rows are sorted by descending support, with ties broken by descending count.
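If you would rather rank terms by count, the result can be reordered like any other data frame. The sketch below does not call term_stats(); it rebuilds the six rows printed above by hand (so it runs without any packages) and re-sorts them with base R's order().

```r
# Rows copied by hand from the four-document output shown above
stats <- data.frame(
  term    = c(".", "you", ",", "the", "to", "is"),
  count   = c(5, 4, 3, 3, 5, 4),
  support = c(4, 4, 3, 3, 2, 2)
)

# Sort by descending count, breaking ties by descending support
by_count <- stats[order(-stats$count, -stats$support), ]
by_count
```

With this ordering "to" moves up to second place, right after ".", even though it appears in only two of the four documents.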