Tags: r, ggplot2, bar-chart, histogram, frequency

Stacked bar plot for two different frequencies


I have a combined frequency table with two conditions on a set of readings.

The data frame is given below:

dput(Merchant_Category_Frequency_with_Target)
structure(list(Var1 = structure(1:31, .Label = c("Airline", "Airports", 
"Alcohol", "Auto", "Books & stationery", "Business Services", 
"Cloth stores", "Contracted services", "Dept stores", "Digital goods", 
"Direct marketing", "Education", "Electronics", "Food", "Fuel", 
"Govt services", "Home furnishing", "Hotels", "Insurance", "Medical", 
"Misc Services", "Music stores", "Professional services & memberships", 
"Quasi cash", "Railways", "Rent Payments", "Restaurants", "Retail", 
"Transportation services", "Utility", "Wallet load"), class = "factor"), 
    Freq.x = c(429L, 1L, 325L, 499L, 239L, 1324L, 5242L, 38L, 
    3881L, 355L, 91L, 1554L, 2200L, 424L, 5588L, 1935L, 264L, 
    1409L, 2384L, 1789L, 971L, 23L, 505L, 5L, 1662L, 4408L, 1820L, 
    3135L, 1297L, 4660L, 1543L), Freq.y = c(16L, NA, 11L, 34L, 
    19L, 56L, 179L, 1L, 141L, 10L, 8L, 100L, 229L, 8L, 142L, 
    40L, 13L, 37L, 142L, 75L, 39L, NA, 18L, NA, 62L, 389L, 33L, 
    148L, 39L, 437L, 194L)), row.names = c(NA, -31L), class = "data.frame")

I want a combined frequency distribution plot for all the readings (Var1) that shows both frequencies: Freq.x should be one colour of the bar, and Freq.y, stacked on top of it, should be another colour.

I tried following various tutorials online, but they didn't seem to work because the variable here is a char and not numeric.

Cheers


Solution

  • First, you need to reshape your data to long format with pivot_longer or gather (both from the tidyr package). I use pivot_longer:

    df <- structure(list(Var1 = structure(1:31, .Label = c("Airline", "Airports", 
    "Alcohol", "Auto", "Books & stationery", "Business Services", 
    "Cloth stores", "Contracted services", "Dept stores", "Digital goods", 
    "Direct marketing", "Education", "Electronics", "Food", "Fuel", 
    "Govt services", "Home furnishing", "Hotels", "Insurance", "Medical", 
    "Misc Services", "Music stores", "Professional services & memberships", 
    "Quasi cash", "Railways", "Rent Payments", "Restaurants", "Retail", 
    "Transportation services", "Utility", "Wallet load"), class = "factor"), 
        Freq.x = c(429L, 1L, 325L, 499L, 239L, 1324L, 5242L, 38L, 
        3881L, 355L, 91L, 1554L, 2200L, 424L, 5588L, 1935L, 264L, 
        1409L, 2384L, 1789L, 971L, 23L, 505L, 5L, 1662L, 4408L, 1820L, 
        3135L, 1297L, 4660L, 1543L), Freq.y = c(16L, NA, 11L, 34L, 
        19L, 56L, 179L, 1L, 141L, 10L, 8L, 100L, 229L, 8L, 142L, 
        40L, 13L, 37L, 142L, 75L, 39L, NA, 18L, NA, 62L, 389L, 33L, 
        148L, 39L, 437L, 194L)), row.names = c(NA, -31L), class = "data.frame")
    
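    library(tidyr)  # provides pivot_longer()

    # Reshape to long format: one row per (Var1, freq) pair, counts in `value`
    data <- df |>
      pivot_longer(cols = c(Freq.x, Freq.y), names_to = "freq")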
    
    > head(data)
    # A tibble: 6 × 3
      Var1     freq   value
      <fct>    <chr>  <int>
    1 Airline  Freq.x   429
    2 Airline  Freq.y    16
    3 Airports Freq.x     1
    4 Airports Freq.y    NA
    5 Alcohol  Freq.x   325
    6 Alcohol  Freq.y    11
    

    Then use the ggplot function:

    library(ggplot2)

    # geom_col() is equivalent to geom_bar(stat = "identity")
    ggplot(data, aes(x = Var1, y = value, fill = freq)) + 
      geom_col() + 
      theme(axis.text.x = element_text(angle = -45, vjust = 0.5, hjust = 0.05))
    

    This is the output: [image: ggplot]
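
Two details of the plot above may be worth adjusting: the rows where Freq.y is NA make ggplot emit a "removed rows" warning, and by default ggplot2 stacks the first level of the fill variable on top, so Freq.x ends up above Freq.y rather than below it as the question asks. The following is a minimal sketch of one way to handle both, not the only approach. It assumes tidyr and ggplot2 are installed, and it uses a three-row subset of the question's data under the hypothetical names df_small and long, purely to keep the example short.

```r
library(tidyr)
library(ggplot2)

# Illustrative subset: the first three categories from the question's data
df_small <- data.frame(
  Var1   = factor(c("Airline", "Airports", "Alcohol")),
  Freq.x = c(429L, 1L, 325L),
  Freq.y = c(16L, NA, 11L)
)

# values_drop_na = TRUE drops the rows whose frequency is missing,
# so ggplot has no NA rows left to remove (and warn about)
long <- pivot_longer(
  df_small,
  cols = c(Freq.x, Freq.y),
  names_to = "freq",
  values_drop_na = TRUE
)

# Make Freq.y the first factor level: the default stacking order
# places the first level at the top of each bar
long$freq <- factor(long$freq, levels = c("Freq.y", "Freq.x"))

p <- ggplot(long, aes(x = Var1, y = value, fill = freq)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5, hjust = 0.05))

print(p)
```

The same two steps should carry over to the full data frame by replacing df_small with df. Because Freq.y is much smaller than Freq.x in most categories, its segment will be thin; position = "dodge" in geom_col() is an alternative if side-by-side bars read better.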