I would like to understand the practical differences between the following cases:
fcm(objectname)  # generate a feature co-occurrence matrix
to calculate the absolute frequencies, and finally plot it with the function textplot_network().
I don't know how to plot the correlated word pairs with the quanteda package.
My idea (which may not be an efficient way) is to compute
textstat_collocations(),
transform the result into a tibble object, and plot it with the functions of the widyr package.
My open questions are:
How can I split the collocation column into two separate columns (e.g. item1 and item2),
keep the lambda column alongside them, and assign the result to a tibble object?
> head(sotu_collocations,1)
  collocation count count_nested length   lambda        z
1  smart city   229            0      2 9.846542 51.78172
Like this? Remove the select() command if you prefer to keep all of the columns.
library("quanteda")
## Package version: 2.1.2
library("magrittr")  # provides the %>% pipe used below, in case quanteda does not export it
colls <- textstat_collocations(data_corpus_inaugural[1:5], size = 2)
head(colls)
##   collocation count count_nested length   lambda        z
## 1      of the    98            0      2 1.494207 11.89704
## 2    has been     9            0      2 5.691667 11.61596
## 3      i have    15            0      2 3.754144 11.51091
## 4      may be    14            0      2 4.072366 11.43632
## 5   have been    10            0      2 4.679873 10.94315
## 6     we have     9            0      2 4.458284 10.35023
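as.data.frame(colls) %>%
  # split "word1 word2" into two columns at the space
  tidyr::separate("collocation", into = c("word1", "word2"), sep = " ") %>%
  # keep only the word pair and its lambda score
  dplyr::select(word1, word2, lambda) %>%
  tibble::as_tibble()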
## # A tibble: 678 x 3
##    word1   word2   lambda
##    <chr>   <chr>    <dbl>
##  1 of      the       1.49
##  2 has     been      5.69
##  3 i       have      3.75
##  4 may     be        4.07
##  5 have    been      4.68
##  6 we      have      4.46
##  7 foreign nations   6.32
##  8 it      is        3.50
##  9 my      country   4.49
## 10 united  states    7.22
## # … with 668 more rows
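
If you would rather not depend on tidyr and dplyr, the same split can be sketched in base R. This is only an illustration: colls_df is a hypothetical stand-in built from three rows copied by hand from the output above, and the item1/item2 names follow the question rather than anything quanteda produces. It assumes every collocation has exactly two words separated by one space (size = 2).

```r
# Hypothetical stand-in for as.data.frame(colls); three rows copied from the output above
colls_df <- data.frame(
  collocation = c("of the", "has been", "united states"),
  lambda = c(1.494207, 5.691667, 7.22),
  stringsAsFactors = FALSE
)

# Split each collocation at the space; rbind the pieces into a two-column character matrix
parts <- do.call(rbind, strsplit(colls_df$collocation, " ", fixed = TRUE))

# Assemble the item1 / item2 / lambda table
pairs <- data.frame(
  item1 = parts[, 1],
  item2 = parts[, 2],
  lambda = colls_df$lambda,
  stringsAsFactors = FALSE
)

print(pairs)
```

With the real data you would replace colls_df with as.data.frame(colls). Collocations longer than two words would need a different split, since rbind() would recycle the shorter pieces.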