Hi, I am generating a forest plot with the following code, but the plot does not show the confidence intervals on the boxes. How can I improve this graphical representation?
mydf <- data.frame(
Variables=c('Variables','Neuroticism_2','Neuroticism_3','Neuroticism_4'),
HazardRatio=c(NA,1.109,1.296,1.363),
HazardLower=c(NA,1.041,1.206,1.274),
HazardUpper=c(NA,1.182,1.393,1.458),
Pvalue=c(NA,"0.001","<0.001","<0.001"),
stringsAsFactors=FALSE
)
#png('temp.png', width=8, height=4, units='in', res=400)
rowseq <- seq(nrow(mydf),1)
par(mai=c(1,0,0,0))
plot(mydf$HazardRatio, rowseq, pch=15,
xlim=c(-10,12), ylim=c(0,7),
xlab='', ylab='', yaxt='n', xaxt='n',
bty='n')
axis(1, seq(0,5,by=.5), cex.axis=.5)
segments(1,-1,1,6.25, lty=3)
segments(mydf$HazardLower, rowseq, mydf$HazardUpper, rowseq)
text(-8,6.5, "Variables", cex=.75, font=2, pos=4)
t1h <- ifelse(!is.na(mydf$Variables), mydf$Variables, '')
text(-8,rowseq, t1h, cex=.75, pos=4, font=3)
text(-1,6.5, "Hazard Ratio (95%)", cex=.75, font=2, pos=4)
t3 <- ifelse(!is.na(mydf$HazardRatio), with(mydf, paste(HazardRatio,' (',HazardLower,'-',HazardUpper,')',sep='')), '')
text(3,rowseq, t3, cex=.75, pos=4)
text(7.5,6.5, "P Value", cex=.75, font=2, pos=4)
t4 <- ifelse(!is.na(mydf$Pvalue), mydf$Pvalue, '')
text(7.5,rowseq, t4, cex=.75, pos=4)
#dev.off()
Edit
I also tried the forestplot package, but the confidence intervals still do not appear on the graph, and I would like the layout to match the graph above.
library(forestplot)
test_data <- data.frame(coef=c(1.109,1.296,1.363),
low=c(1.041,1.206,1.274),
high=c(1.182,1.393,1.458),
boxsize=c(0.1, 0.1, 0.1))
row_names <- cbind(c("Variable", "N_Quartile 1", "N_Quartile 2", "N_Quartile 3"),
c("HR", test_data$coef), c("CI -95%", test_data$low), c("CI +95%", test_data$high) )
test_data <- rbind(NA, test_data)
forestplot(labeltext = row_names,
mean = test_data$coef, upper = test_data$high,
lower = test_data$low,
clip =c(0.1, 25),
is.summary=c(TRUE, FALSE, FALSE, FALSE),
boxsize = test_data$boxsize,
zero = 1,colgap = unit(3, "mm"), txt_gp=fpTxtGp(label= gpar(cex = 0.7),
title = gpar(cex = 1) ),
xlog = TRUE,
xlab = "HR (95% CI)",
col = fpColors(lines="black", box="black"),
ci.vertices = TRUE,
xticks = c(0.1, 1, 2.5,5,7.5))
Your intervals are quite narrow, so if you draw them manually with plot it will take a while to find the right settings, and placing the text alongside them is not trivial. Right now your first code is less than halfway there.
My suggestion is to build up the plot step by step using forestplot and identify the problem. For example, if you just plot your data.frame, you see it works: the confidence interval is there, it is just very narrow. That is the problem at hand, and you can address it by increasing the line width with lwd.ci so that the interval is visible:
forestplot(test_data[,1:3],lwd.ci=3)
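To see why the intervals are so hard to spot in the base-graphics attempt, it may help to compare their width with the x-axis range. The following is a minimal sketch using only the numbers from the question; the padding of 0.05 around the data is a hypothetical choice for illustration, not something from the original code.

```r
# Interval bounds taken from the question
low  <- c(1.041, 1.206, 1.274)
high <- c(1.182, 1.393, 1.458)

# Width of the x-axis in the base-graphics attempt: xlim = c(-10, 12)
axis_span <- diff(c(-10, 12))

# Share of the axis width that each interval occupies
ci_fraction <- (high - low) / axis_span
print(round(100 * ci_fraction, 2))   # percent of the axis per interval

# For comparison: an x-range fitted to the data
# (the 0.05 padding is an assumed value for illustration)
tight_span <- diff(range(c(low, high)) + c(-0.05, 0.05))
tight_fraction <- (high - low) / tight_span
print(round(100 * tight_fraction, 1))
```

With the wide axis each interval covers less than 1% of the plotting width, so the square plotting symbol most likely hides it; on an axis fitted to the data the same intervals take up a clearly visible share.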
Now if we add in the text:
forestplot(
labeltext =row_names,
mean = test_data$coef, upper = test_data$high,
lower = test_data$low,
txt_gp=fpTxtGp(cex=0.8),
is.summary=c(TRUE, FALSE, FALSE, FALSE),
boxsize = test_data$boxsize,lwd.ci=3)
The text now takes up a bit too much space. I think one way around this is to use the conventional est [ll - ul] way of presenting the estimate and confidence interval; you can see examples here. One approach, which I try below, is to pad the HR and CI values and paste them into a single string, so that there are just two columns of text:
library(forestplot)
library(stringr)
test_data <- data.frame(coef=c(1.109,1.296,1.363),
low=c(1.041,1.206,1.274),
high=c(1.182,1.393,1.458),
boxsize=c(0.1, 0.1, 0.1))
column1 = c("Variable", "N_Quartile 1", "N_Quartile 2", "N_Quartile 3")
column2 = cbind(c("HR", test_data$coef),
c("CI -95%", test_data$low),
c("CI +95%", test_data$high))
L = max(nchar(column2))
padded_text =apply(column2,1,
function(i)paste(str_pad(i,L),collapse=" "))
test_data <- rbind(NA, test_data)
pdf("test.pdf",width=8,height=4)
forestplot(
labeltext =cbind(column1,padded_text),
mean = test_data$coef, upper = test_data$high,
lower = test_data$low,
txt_gp=fpTxtGp(cex=0.8),align="c",
is.summary=c(TRUE, FALSE, FALSE, FALSE),
boxsize = test_data$boxsize,lwd.ci=3,
graphwidth=unit(100,'mm'))
dev.off()
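The conventional est [ll - ul] format mentioned above can also be built directly with base R's sprintf, without padding. This is only a sketch: the header text "HR [95% CI]" and the names hr_ci and label_matrix are my own choices for illustration, not part of the code above.

```r
test_data <- data.frame(coef = c(1.109, 1.296, 1.363),
                        low  = c(1.041, 1.206, 1.274),
                        high = c(1.182, 1.393, 1.458))

# One label per row in "estimate [lower - upper]" form, header on top
# (the header wording is an assumption, adjust as needed)
hr_ci <- c("HR [95% CI]",
           sprintf("%.3f [%.3f - %.3f]",
                   test_data$coef, test_data$low, test_data$high))

# Two text columns: variable names and the combined estimate/interval
label_matrix <- cbind(c("Variable", "N_Quartile 1",
                        "N_Quartile 2", "N_Quartile 3"),
                      hr_ci)
print(label_matrix)
```

The resulting label_matrix should be usable as labeltext in the forestplot call above, in place of cbind(column1, padded_text).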