
I need to create a stacked bar chart in R, but R is not reading my code correctly. Any advice, please?


Sample data: the attributes each respondent looks for in an employer, together with the function each respondent works in.

I have attached an image of my sample data from a survey question. I need to create a stacked bar chart with the attributes people look for in an employer (i.e. Flexibility, Recognition, International Opportunities, etc.) on the x-axis and the frequency of each attribute on the y-axis. Finally, I would like each bar to be split into segments by function.

I am not sure where to begin because of the way the data is formatted. I believe I first need to create a table that counts the responses for each attribute and groups them by function, and I am hoping I can then create the stacked bar chart from that table. Any suggestions or advice would be very helpful.


Solution

  • Here's an approach that should work for you, but first, here is your dataset in a form that others can copy and paste:

    df <- data.frame(
      ID=1:14,
      Flexibility=c(1,0,0,0,0,1,1,0,1,1,0,1,0,1),
      Recognition=c(0,1,1,0,0,0,0,1,0,0,0,1,1,0),
      International_opportunities=c(1,0,0,1,0,0,0,1,0,0,1,0,0,0),
      Autonomy=c(0,0,0,1,0,0,0,0,0,1,1,0,0,0),
      Status=c(0,0,1,0,0,0,0,0,1,0,1,0,0,1),
      Training_Qual=c(1,0,1,0,1,1,1,0,1,1,0,0,1,0),
      Function=c('Technology','Technology','Security','HR','Customer Operations','Technology','Customer Operations','Commercial','Technology','Strategy/Transformation','HR','Technology','HR','Security')
    )
    

    OP: in the future, you can use dput(df) to create text that can be copied and pasted directly into your question so that others can recreate your data.
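    As a quick illustration of the dput() tip, here is a round trip on a small three-row data frame (df_small is a hypothetical subset made up for this example). dput() prints R code that rebuilds the object, and evaluating that text gives back an identical copy:

```r
# Hypothetical three-row subset, used only to illustrate dput()
df_small <- data.frame(
  ID = 1:3,
  Flexibility = c(1, 0, 0),
  Function = c("Technology", "Technology", "Security")
)

# dput() prints R code that recreates the object; this is the text to paste into a question
dput(df_small)

# Evaluating that printed text rebuilds the same data frame
txt <- paste(capture.output(dput(df_small)), collapse = "\n")
df_copy <- eval(parse(text = txt))
identical(df_copy, df_small)
```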

    The first step is to reshape your data into a Tidy Data-friendly format, where each column represents one variable and each row holds one observation. In your dataset, the values of the "Attribute" variable are set up as column names, and the "frequency" values are spread across those columns. There are several ways to gather these columns together; here is one using the dplyr and tidyr packages from the tidyverse:

    library(dplyr)
    library(tidyr)
    library(ggplot2)
    
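    # stack every column except ID and Function into two columns:
    # Attributes (the former column name) and freq (the 0/1 value)
    df <- df %>%
      gather(key='Attributes',value='freq',-c(ID,Function))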
    

    This replaces df with a long dataset of 84 rows (14 respondents × 6 attributes) and four columns: "ID" (unchanged), "Function" (unchanged), "Attributes" (the attribute column names from your original dataset), and "freq" (the 1's and 0's).
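    gather() still works, but it has been superseded since tidyr 1.0.0 and the tidyr documentation recommends pivot_longer() for new code. Below is a minimal sketch of the same reshape with pivot_longer(), assuming the data frame from above (recreated here so the snippet runs on its own; df_long is just an illustrative name). It returns a tibble with the kept columns (ID, Function) first:

```r
library(tidyr)

# Recreate the survey data from above
df <- data.frame(
  ID=1:14,
  Flexibility=c(1,0,0,0,0,1,1,0,1,1,0,1,0,1),
  Recognition=c(0,1,1,0,0,0,0,1,0,0,0,1,1,0),
  International_opportunities=c(1,0,0,1,0,0,0,1,0,0,1,0,0,0),
  Autonomy=c(0,0,0,1,0,0,0,0,0,1,1,0,0,0),
  Status=c(0,0,1,0,0,0,0,0,1,0,1,0,0,1),
  Training_Qual=c(1,0,1,0,1,1,1,0,1,1,0,0,1,0),
  Function=c('Technology','Technology','Security','HR','Customer Operations','Technology','Customer Operations','Commercial','Technology','Strategy/Transformation','HR','Technology','HR','Security')
)

# Stack every column except ID and Function into name/value pairs
df_long <- pivot_longer(
  df,
  cols = -c(ID, Function),
  names_to = "Attributes",
  values_to = "freq"
)

# 14 respondents x 6 attributes = 84 rows, 4 columns
dim(df_long)
head(df_long)
```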

    The Plot

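    You can then create a stacked column chart as follows. Your description didn't make it 100% clear which plot you were looking for, but here is one way of showing the data, with comments in the code explaining the role of each part in the final output. Because every freq value is 0 or 1, geom_col() stacking them simply adds them up, so the height of each bar segment is the number of respondents in that function who chose that attribute: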

    # setup the plot and general aesthetics
    ggplot(df, aes(x=Attributes, y=freq, fill=Function)) +
      
      # the only data geom
      geom_col(position='stack', width=0.8, alpha=0.7) +
      
      # I like these colors, but you can use default if you want
      scale_fill_viridis_d() +
      
      # ensure the bottom of the bars touches the axis
      scale_y_continuous(expand=expansion(mult=c(0,0.05)))+
      
      # theme elements
      theme_bw() +
      theme(
        axis.text.x = element_text(angle=30, hjust=1)
      )
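    If you would rather build the count table you described in the question first, you can summarise the long data with dplyr's count(), using freq as the weight so that only the 1's are counted, and then plot those counts. This is a sketch assuming the same data as above (recreated so it runs on its own; counts and p are illustrative names). The bar heights match the plot above, because summing the 0/1 values per attribute and function is exactly what stacking them does:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Recreate the survey data from above
df <- data.frame(
  ID=1:14,
  Flexibility=c(1,0,0,0,0,1,1,0,1,1,0,1,0,1),
  Recognition=c(0,1,1,0,0,0,0,1,0,0,0,1,1,0),
  International_opportunities=c(1,0,0,1,0,0,0,1,0,0,1,0,0,0),
  Autonomy=c(0,0,0,1,0,0,0,0,0,1,1,0,0,0),
  Status=c(0,0,1,0,0,0,0,0,1,0,1,0,0,1),
  Training_Qual=c(1,0,1,0,1,1,1,0,1,1,0,0,1,0),
  Function=c('Technology','Technology','Security','HR','Customer Operations','Technology','Customer Operations','Commercial','Technology','Strategy/Transformation','HR','Technology','HR','Security')
)

# Long format: one row per respondent-attribute pair
df_long <- pivot_longer(df, cols = -c(ID, Function),
                        names_to = "Attributes", values_to = "freq")

# Count table: how many respondents in each function chose each attribute
# (wt = freq sums the 0/1 values instead of counting rows)
counts <- df_long %>%
  count(Attributes, Function, wt = freq)
counts

# Plot the pre-computed counts; stacking n gives the same bar heights
p <- ggplot(counts, aes(x = Attributes, y = n, fill = Function)) +
  geom_col(position = "stack") +
  labs(y = "Frequency")
p
```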
      
    
