Convert histogram to density graph in R


I have produced the following histogram in R:

#subquestion C
total_2016<-qres2016+qres2_2016
total_2017<-qres2017+qres2_2017
total_2018<-qres2018+qres2_2018
total_2019<-qres2019+qres2_2019
total_2020<-qres2020+qres2_2020

year=c("2016","2017","2018","2019","2020")
contribution=c(total_2016,total_2017,total_2018,total_2019,total_2020)
df = data.frame(year,contribution)

library(ggplot2)
library(scales)  # provides the comma label formatter
ggplot(df, aes(year, contribution)) +
  geom_bar(stat = "identity", fill = colors()[128]) +
  ggtitle("Histogram chart showing the total number of articles per year for both diseases") +
  scale_y_continuous(labels = comma)

I want to produce the graph as a density graph too. However, when I replace geom_bar with geom_density, I get the following message:

 Groups with fewer than two data points have been dropped. 

And nothing is plotted. What am I doing wrong?

Edit: I should mention that the qres2016 and the other variables are integers. The way I got them is the following:

library(RISmed)  # provides EUtilsSummary() and QueryCount()
res2016<-EUtilsSummary("Typhoid meningitis",type="esearch",db="pubmed",datetype="pdat",mindate=2016,maxdate=2016,retmax=500)
res2017<-EUtilsSummary("Typhoid meningitis",type="esearch",db="pubmed",datetype="pdat",mindate=2017,maxdate=2017,retmax=500)
res2018<-EUtilsSummary("Typhoid meningitis",type="esearch",db="pubmed",datetype="pdat",mindate=2018,maxdate=2018,retmax=500)
res2019<-EUtilsSummary("Typhoid meningitis",type="esearch",db="pubmed",datetype="pdat",mindate=2019,maxdate=2019,retmax=500)
res2020<-EUtilsSummary("Typhoid meningitis",type="esearch",db="pubmed",datetype="pdat",mindate=2020,maxdate=2020,retmax=500)

qres2016<-QueryCount(res2016) #counting results
qres2017<-QueryCount(res2017) #counting results
qres2018<-QueryCount(res2018) #counting results
qres2019<-QueryCount(res2019) #counting results
qres2020<-QueryCount(res2020) #counting results

a<- "Total number of articles the last five years for M434: "
qrestotal<-qres2016+qres2017+qres2018+qres2019+qres2020
print(paste(a,qrestotal))

#searching pubmed for second disease

    res2_2016<-EUtilsSummary("Sequelae of rickets",type="esearch",db="pubmed",datetype="pdat",mindate=2016,maxdate=2016,retmax=500)
    res2_2017<-EUtilsSummary("Sequelae of rickets",type="esearch",db="pubmed",datetype="pdat",mindate=2017,maxdate=2017,retmax=500)
    res2_2018<-EUtilsSummary("Sequelae of rickets",type="esearch",db="pubmed",datetype="pdat",mindate=2018,maxdate=2018,retmax=500)
    res2_2019<-EUtilsSummary("Sequelae of rickets",type="esearch",db="pubmed",datetype="pdat",mindate=2019,maxdate=2019,retmax=500)
    res2_2020<-EUtilsSummary("Sequelae of rickets",type="esearch",db="pubmed",datetype="pdat",mindate=2020,maxdate=2020,retmax=500)

    qres2_2016<-QueryCount(res2_2016) #counting results
    qres2_2017<-QueryCount(res2_2017) #counting results
    qres2_2018<-QueryCount(res2_2018) #counting results
    qres2_2019<-QueryCount(res2_2019) #counting results
    qres2_2020<-QueryCount(res2_2020) #counting results

What I need to do is plot the number of articles per year as a density graph, but the message I mentioned above comes up.

The plot I am trying to produce looks something like the following:

[Image: example density plot]


Solution

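  • A density plot shows how individual observations are distributed along the x axis, as a smoothed value on the y axis. You have one annual count per year, which doesn't lend itself to a density plot: with year as a discrete x variable, geom_density treats each year as its own group, every group contains a single row, and so all of them are dropped, which is the message you are seeing. Probably the nearest equivalent is a smoothed area plot. To do this fairly, though, you will have to annualize your 2020 data, because the year is not yet complete; otherwise the plot will not be an accurate reflection of the publication rate.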

    I think this is about as close as you're going to get:

    library(ggplot2)

    # year was stored as text; make it numeric so the x axis is continuous
    df$year <- as.numeric(as.character(df$year))
    df$annualized_counts <- df$contribution
    # Scale the partial 2020 count up to a full leap year.
    # Note: this is only correct if the code is run during 2020.
    df$annualized_counts[5] <- df$contribution[5] * 366 / lubridate::yday(lubridate::now())

    # Interpolate a smooth curve through the five annual values
    smoothed <- spline(df$year, df$annualized_counts, n = 100)
    plot_df <- data.frame(year = smoothed$x, counts = smoothed$y)

    ggplot(plot_df, aes(year, counts)) +
      geom_area(colour = "forestgreen", fill = "forestgreen", alpha = 0.4) +
      geom_point(data = df, aes(x = year, y = annualized_counts)) +
      labs(title = "Annualized citation count for typhoid meningitis and rickets",
           y = "Annualized citations")
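
The annualization step above depends on the date the code happens to be run. As a rough sketch of the same arithmetic with the reference date pinned, something like the following could be used. The helper name annualize and the example numbers are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: scale a partial-year count up to a full-year
# estimate, given the date on which the count was taken.
annualize <- function(count, as_of) {
  as_of <- as.Date(as_of)
  year <- as.integer(format(as_of, "%Y"))
  # Day-of-year of 31 December gives the length of that year (365 or 366)
  days_in_year <- as.integer(format(as.Date(paste0(year, "-12-31")), "%j"))
  days_elapsed <- as.integer(format(as_of, "%j"))
  count * days_in_year / days_elapsed
}

# Made-up example: 183 articles counted on 1 July 2020 (day 183 of 366)
estimate_2020 <- annualize(183, "2020-07-01")
print(estimate_2020)
```

Pinning the date keeps the plot reproducible: rerunning the script later gives the same 2020 estimate, provided the count itself was recorded on that date.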
    

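
For contrast, here is a minimal sketch of the data shape geom_density does work on: one row per article rather than one row per year. The counts below are invented for illustration (the real totals come from the PubMed queries), and the bandwidth bw = 0.5 is an arbitrary choice, not a recommendation.

```r
library(ggplot2)

# Made-up yearly totals, for illustration only
counts <- data.frame(year = 2016:2020,
                     contribution = c(120, 135, 150, 160, 90))

# Expand to one row per article, so there are many observations to smooth
articles <- data.frame(year = rep(counts$year, times = counts$contribution))

# year is numeric and there is no grouping variable, so all rows form a
# single group and the density estimate has plenty of points to work with
p <- ggplot(articles, aes(x = year)) +
  geom_density(fill = "forestgreen", alpha = 0.4, bw = 0.5)
print(p)
```

Because every article in a given year shares the same x value, the curve here mostly reflects the chosen bandwidth, which is why the smoothed area plot is probably the more honest picture for annual totals.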