How to plot the availability of a variable by year?


year <- c(2000:2014)
group <- c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
         "B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
         "C","C","C","C","C","C","C","C","C","C","C","C","C","C","C")
value <- sample(1:5, 45, replace=TRUE)

df <- data.frame(year,group,value)
df$value[df$value==1] <- NA

   year group value
1  2000     A    NA
2  2001     A     2
3  2002     A     2
...
11 2010     A     2
12 2011     A     3
13 2012     A     5
14 2013     A    NA
15 2014     A     3
16 2000     B     2
17 2001     B     3
...
26 2010     B    NA
27 2011     B     5
28 2012     B     4
29 2013     B     3
30 2014     B     5
31 2000     C     5
32 2001     C     4
33 2002     C     3
34 2003     C     4
...
44 2013     C     5
45 2014     C     3

Above is the sample data frame for my question. Each group (A, B or C) has one value per year from 2000 to 2014, but in some years the value is missing for some of the groups.

The graph I would like to plot is as below:

x-axis is year

y-axis is group (i.e. A, B and C should be shown as the y-axis labels)

the bar or line represents the availability of the value for each group

If the value is NA, no bar should be drawn at that time point. ggplot2 is preferred if possible.

Can anyone help? Thank you.

I think my description was confusing. I am expecting a graph like the one below, but with year on the x-axis. The bar or line represents the availability of the value for a given group across the years.

In the sample dataframe of group A, we have

2012 A 5
2013 A NA
2014 A 3

Then nothing should be drawn for group A in 2013, and a dot should appear for group A in 2014.

enter image description here


Solution

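  • You can use geom_errorbarh (the horizontal variant of geom_errorbar) with a zero-length range, i.e. xmin and xmax both set to the year. Then subset the data to the rows that have a value, using complete.cases(df) (or !is.na(df$value)).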

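    library(ggplot2)
    
    set.seed(10)  # make the random sample reproducible
    
    year <- c(2000:2014)
    group <- c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
           "B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
           "C","C","C","C","C","C","C","C","C","C","C","C","C","C","C")
    value <- sample(1:5, 45, replace=TRUE)
    
    df <- data.frame(year,group,value)  # year (length 15) is recycled to 45 rows
    df$value[df$value==1] <- NA         # treat 1 as "missing" for the demo
    
    # keep only the rows where a value is available
    no_na_df <- df[complete.cases(df), ]
    
    # xmin == xmax gives a zero-length bar, so only its end caps show up as a tick
    # (ggplot2 >= 3.4.0 prefers linewidth over size for line thickness)
    ggplot(no_na_df, aes(x=year, y = group)) + 
        geom_errorbarh(aes(xmax = year, xmin = year), size = 2)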
    


    Edit: To get a continuous bar, you can use this slightly unappealing method. It is necessary to make a numeric representation of the group data, so that the bars can be given a thickness through ymin and ymax. Thereafter, we can make the scale represent the variable as discrete again.

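    # data.frame() keeps strings as character since R 4.0.0, so convert to a factor
    # first; as.numeric() on a factor returns the level codes 1, 2, 3
    df$group <- factor(df$group)
    df$group_n <- as.numeric(df$group)
    
    no_na_df <- df[complete.cases(df), ]
    
    # one rectangle per available year; adjacent years touch and form a continuous bar
    ggplot(no_na_df, aes(xmin=year-0.5, xmax=year+0.5, y = group_n)) + 
        geom_rect(aes(ymin = group_n-0.1, ymax = group_n+0.1)) +
        scale_y_discrete(limits = levels(df$group))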
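
As a possible alternative to the numeric conversion in the edit above, here is a minimal sketch that draws one tile per available year with geom_tile, which accepts a discrete y directly. This is a swapped-in technique, not the method from the answer. The names avail and p and the tile height of 0.2 are assumptions chosen for illustration, and the data frame is rebuilt with rep() so the snippet runs on its own.

```r
library(ggplot2)

set.seed(10)
df <- data.frame(
  year  = rep(2000:2014, times = 3),
  group = rep(c("A", "B", "C"), each = 15),
  value = sample(1:5, 45, replace = TRUE)
)
df$value[df$value == 1] <- NA

# keep only the rows where a value is available
avail <- df[!is.na(df$value), ]

# width = 1 makes tiles of adjacent years touch, so runs of available
# years look like one continuous bar; height = 0.2 is an arbitrary thickness
p <- ggplot(avail, aes(x = year, y = group)) +
  geom_tile(width = 1, height = 0.2)
p
```

Because y stays discrete, the group labels on the axis come for free and no scale override is needed.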
    

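
Both snippets in this answer subset with complete.cases(df), and the answer mentions !is.na(df$value) as an alternative. The sketch below illustrates when the two agree and when they differ. The note column is a hypothetical addition used only to show the difference, and by_rows and by_value are illustrative names.

```r
set.seed(10)
df <- data.frame(
  year  = rep(2000:2014, times = 3),
  group = rep(c("A", "B", "C"), each = 15),
  value = sample(1:5, 45, replace = TRUE)
)
df$value[df$value == 1] <- NA

# value is the only column with NAs, so both subsets select the same rows
by_rows  <- df[complete.cases(df), ]
by_value <- df[!is.na(df$value), ]
identical(by_rows, by_value)

# hypothetical extra column that is entirely missing:
# complete.cases() now drops every row, while the is.na() test is unaffected
df$note <- NA_character_
nrow(df[complete.cases(df), ])   # 0
nrow(df[!is.na(df$value), ])     # same count as before
```

If the data frame may gain other columns with missing entries, testing the value column directly is probably the safer choice for this plot.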