After fitting an LDA topic model in R, some words have the same beta value. They are therefore listed together when plotting the results, which leads to overlapping and sometimes unreadable plots.
Is there a way to limit the number of words displayed per topic? In my dummy data set, some words have the same beta values. I would like to tell R to display only 3 words per topic, or any other number I specify.
Currently the code I am using to plot the results looks like this:
top_terms %>% # take the top terms
group_by(topic) %>%
mutate(top_term = term[which.max(beta)]) %>%
mutate(term = reorder(term, beta)) %>%
head(3) %>% # I tried this but that only works for the first topic
ggplot(aes(term, beta, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~ top_term, scales = "free") +
labs(x = NULL, y = "Beta") + # no x label, change y label
coord_flip() # turn bars sideways
I tried to solve the issue with head(3), which worked, but only for the first topic.
What I need is something similar that doesn't ignore all the other topics.
Best regards. Stay safe, stay healthy.
Note: top_terms is a tibble.
Sample data:
topic term beta
(int) (chr) (dbl)
1 book 0,9876
1 page 0,9765
1 chapter 0,9654
1 author 0,9654
2 sports 0,8765
2 soccer 0,8654
2 champions 0,8543
2 victory 0,8543
3 music 0,9543
3 song 0,8678
3 artist 0,7231
3 concert 0,7231
4 movie 0,9846
4 cinema 0,9647
4 cast 0,8878
4 story 0,8878
dput of sample data:
top_terms <- structure(list(topic = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L), term = c("book", "page", "chapter",
"author", "sports", "soccer", "champions", "victory", "music",
"song", "artist", "concert", "movie", "cinema", "cast", "story"
), beta = c(0.9876, 0.9765, 0.9654, 0.9654, 0.8765, 0.8654, 0.8543,
0.8543, 0.9543, 0.8678, 0.7231, 0.7231, 0.9846, 0.9647, 0.8878,
0.8878)), row.names = c(NA, -16L), class = "data.frame")
Using slice_head after a group_by on the grouping field, instead of head, will do the job here:
library(dplyr)
library(ggplot2)

top_terms %>% # take the top terms
  group_by(topic) %>%
  mutate(top_term = term[which.max(beta)]) %>% # label each topic by its highest-beta term
  mutate(term = reorder(term, beta)) %>%
  group_by(top_term) %>%
  slice_head(n = 3) %>% # keep the first 3 rows per group (assumes rows are sorted by descending beta)
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ top_term, scales = "free") +
  labs(x = NULL, y = "Beta") + # no x label, change y label
  coord_flip() # turn bars sideways
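For illustration, here is a minimal sketch of why head(3) only kept the first topic, plus a possible alternative that does not rely on the rows already being sorted. It assumes dplyr 1.0.0 or later (where slice_max is available); the names first_three and top_three are made up for this example, and the data is the sample from the question.

```r
library(dplyr)

# Sample data from the question (topic, term, beta)
top_terms <- data.frame(
  topic = rep(1:4, each = 4),
  term = c("book", "page", "chapter", "author",
           "sports", "soccer", "champions", "victory",
           "music", "song", "artist", "concert",
           "movie", "cinema", "cast", "story"),
  beta = c(0.9876, 0.9765, 0.9654, 0.9654,
           0.8765, 0.8654, 0.8543, 0.8543,
           0.9543, 0.8678, 0.7231, 0.7231,
           0.9846, 0.9647, 0.8878, 0.8878)
)

# head() is not group-aware: it keeps the first 3 rows of the whole table,
# so only rows from the first topic survive
first_three <- top_terms %>%
  group_by(topic) %>%
  head(3)

# slice_max() works per group and does not depend on row order;
# with_ties = FALSE keeps exactly n rows even when beta values are equal
top_three <- top_terms %>%
  group_by(topic) %>%
  slice_max(beta, n = 3, with_ties = FALSE) %>%
  ungroup()

nrow(first_three)        # rows kept by head()
count(top_three, topic)  # rows kept per topic by slice_max()
```

If this fits your case, top_three could be piped into the same mutate and ggplot steps as above. Note that with_ties = FALSE breaks ties by row order, so which of two equally weighted terms is dropped is somewhat arbitrary.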