I am making a kind of number-line graph with ggplot2 and am running into text labels that overlap each other. I have tried geom_text_repel from the ggrepel package to avoid the overlap, but the plot becomes increasingly messy as more factor levels have adjacent mean scores. Below are a code sample and most of the data used.
Category Dimension1
AcademicWriting -0.7
Brd.Discussions 0.6
Brd.Interviews -2.4
Brd.News 8.3
Brd.Talks 0
BusinessLetters 2.4
ClassLessons 0.2
Commentaries -12.9
Comments -1.2
CreativeWriting 1.4
Documentaries -1.4
F2FConversations -1.8
FBGroups 0.4
FBSt.Updates -1
Ind.Blogs 0.1
Inst.Writing 0.9
NBrd.Talks -0.1
NewsBlogs 0.4
NewsReports 7.1
Pol.Debates -1.4
PopularWriting 0.5
PressEditorials 1.8
SocialLetters 0.6
Speeches 3
StudentWriting -2
TechBlogs 1.7
ThesesPresentations -0.8
Tweets -2.8
And the code:
library(ggplot2)
library(ggrepel)
library(extrafont)
loadfonts(device = "win")
plot_graph <- function(d1, label_below = "", label_above = "")
{
  # Sort by score (descending) and place every point at x = 0
  d1 <- d1[order(-d1[,2]),]
  d1$X <- 0
  # aes() looks columns up in d1, so attach()/detach() is not needed
  plot1 <- ggplot(data=d1, aes(x=X, y=Dimension1, label=Category)) +
    geom_point() +
    geom_text_repel(aes(label=Category), direction = "x", family="Times New Roman", size=4, max.iter = 2e2) +
    theme_bw() +
    theme(axis.text.x = element_text(colour="black"), axis.text.y = element_text(colour="black")) +
    theme(text=element_text(family="Times New Roman"),
          panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank(),
          panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank(),
          axis.title.x=element_blank(), axis.title.y=element_blank(),
          axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
    geom_vline(xintercept = 0, linetype = 1) +
    coord_cartesian(xlim = c(-3, 3)) +
    # Double-headed arrow to the left of the number line, with optional end labels
    geom_segment(aes(x = -2, y = 5+min(Dimension1), xend = -2, yend = max(Dimension1)-5), arrow = arrow(ends = "both"), alpha=0.5, size=0.5) +
    geom_text(aes(x = -2, y = 6+min(Dimension1), label = label_below)) +
    geom_text(aes(x = -2, y = max(Dimension1)-4, label = label_above))
  plot1
}
plot4 <- plot_graph(d1 = d1, label_below = "", label_above = "")
plot4
It results in the following graph:
After looking at multiple similar threads, I am not sure whether there is a solution to this problem. One idea I have is to group the factor levels, i.e. the labels, by their adjacent mean scores. For example, AcademicWriting and FBSt.Updates (1st and 7th factor levels in the example) could be grouped together after rounding their respective mean scores to -1, and then displayed on one horizontal line separated by a comma. However, I cannot think of a way to group them. I would appreciate help with this, or any other way to solve the overlapping problem.
Here is an idea:
cut the Dimension1 column into as many groups as you like, group by the resulting cut variable, paste the Category names together, and calculate the y coordinate as the group mean. I mapped the text and the points to the same color, but that is not necessary.
library(tidyverse)
d1 %>%
  arrange(desc(Dimension1)) %>%
  # Bin the scores into 32 equal-width intervals; all points sit at x = 0
  mutate(cut = cut(Dimension1, 32),
         X = 0) %>%
  group_by(cut) %>%
  # One combined label per bin, placed at the bin's mean score;
  # label2 keeps the label on the first row of each bin only
  mutate(label = paste(Category, collapse = ", "),
         coord = mean(Dimension1),
         label2 = ifelse(duplicated(label), NA, label)) %>%
  ungroup() %>%
  ggplot(aes(x=X, y=Dimension1, label=Category, color = label)) +
  geom_segment(aes(x = -0.25, y = 5 + min(Dimension1), xend = -0.25, yend = max(Dimension1)-5), arrow = arrow(ends = "both"), alpha=0.5, size=0.5) +
  geom_point() +
  geom_text(aes(label=label2, x = X+0.05, y = coord, color = label), family="Times New Roman", size=4, hjust = 0) +
  theme_bw() +
  theme(axis.text.x = element_text(colour="black"),
        axis.text.y = element_text(colour="black")) +
  theme(text=element_text(family="Times New Roman"),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        legend.position="none") +
  geom_vline(xintercept = 0, linetype = 1) +
  coord_cartesian(xlim = c(-0.5, 3))
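If you would rather bin by rounding, as the question suggests, the grouping step on its own might look roughly like the sketch below. It uses only a handful of rows from the question's data, and the names d_small, bin and grouped are made up for illustration. Rounding to the nearest integer is just one possible choice of bin width, not something the data dictates.

```r
library(dplyr)

# A small subset of the question's data, enough to show the grouping
d_small <- data.frame(
  Category   = c("Brd.News", "NewsReports", "FBGroups", "NewsBlogs", "Commentaries"),
  Dimension1 = c(8.3, 7.1, 0.4, 0.4, -12.9)
)

grouped <- d_small %>%
  # Assumption: bins are formed by rounding to the nearest integer
  mutate(bin = round(Dimension1)) %>%
  group_by(bin) %>%
  # One row per bin: comma-separated labels and the mean score as y position
  summarise(label = paste(Category, collapse = ", "),
            coord = mean(Dimension1),
            .groups = "drop")

print(grouped)
```

The resulting table has one row per bin, so it could be passed to geom_text() as its own data argument instead of blanking duplicated labels with NA.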