I have a combined frequency table with two frequency columns for a set of categories.
The data frame is reproduced here:
dput(Merchant_Category_Frequency_with_Target)
structure(list(Var1 = structure(1:31, .Label = c("Airline", "Airports",
"Alcohol", "Auto", "Books & stationery", "Business Services",
"Cloth stores", "Contracted services", "Dept stores", "Digital goods",
"Direct marketing", "Education", "Electronics", "Food", "Fuel",
"Govt services", "Home furnishing", "Hotels", "Insurance", "Medical",
"Misc Services", "Music stores", "Professional services & memberships",
"Quasi cash", "Railways", "Rent Payments", "Restaurants", "Retail",
"Transportation services", "Utility", "Wallet load"), class = "factor"),
Freq.x = c(429L, 1L, 325L, 499L, 239L, 1324L, 5242L, 38L,
3881L, 355L, 91L, 1554L, 2200L, 424L, 5588L, 1935L, 264L,
1409L, 2384L, 1789L, 971L, 23L, 505L, 5L, 1662L, 4408L, 1820L,
3135L, 1297L, 4660L, 1543L), Freq.y = c(16L, NA, 11L, 34L,
19L, 56L, 179L, 1L, 141L, 10L, 8L, 100L, 229L, 8L, 142L,
40L, 13L, 37L, 142L, 75L, 39L, NA, 18L, NA, 62L, 389L, 33L,
148L, 39L, 437L, 194L)), row.names = c(NA, -31L), class = "data.frame")
I want a combined stacked bar chart for all the categories (Var1): Freq.x should be one bar colour, and Freq.y should be stacked over it in another colour.
I tried following various tutorials online, but they didn't seem to work because the variable here is categorical (a factor) rather than numeric.
Cheers
First, you need to reshape your data into long format with pivot_longer or gather; I use pivot_longer:
df <- structure(list(Var1 = structure(1:31, .Label = c("Airline", "Airports",
"Alcohol", "Auto", "Books & stationery", "Business Services",
"Cloth stores", "Contracted services", "Dept stores", "Digital goods",
"Direct marketing", "Education", "Electronics", "Food", "Fuel",
"Govt services", "Home furnishing", "Hotels", "Insurance", "Medical",
"Misc Services", "Music stores", "Professional services & memberships",
"Quasi cash", "Railways", "Rent Payments", "Restaurants", "Retail",
"Transportation services", "Utility", "Wallet load"), class = "factor"),
Freq.x = c(429L, 1L, 325L, 499L, 239L, 1324L, 5242L, 38L,
3881L, 355L, 91L, 1554L, 2200L, 424L, 5588L, 1935L, 264L,
1409L, 2384L, 1789L, 971L, 23L, 505L, 5L, 1662L, 4408L, 1820L,
3135L, 1297L, 4660L, 1543L), Freq.y = c(16L, NA, 11L, 34L,
19L, 56L, 179L, 1L, 141L, 10L, 8L, 100L, 229L, 8L, 142L,
40L, 13L, 37L, 142L, 75L, 39L, NA, 18L, NA, 62L, 389L, 33L,
148L, 39L, 437L, 194L)), row.names = c(NA, -31L), class = "data.frame")
library(tidyr)

# One row per (Var1, freq) pair; the counts end up in the default "value" column
data <- df |>
  pivot_longer(cols = c(Freq.x, Freq.y), names_to = "freq")
> head(data)
# A tibble: 6 × 3
  Var1     freq   value
  <fct>    <chr>  <int>
1 Airline  Freq.x   429
2 Airline  Freq.y    16
3 Airports Freq.x     1
4 Airports Freq.y    NA
5 Alcohol  Freq.x   325
6 Alcohol  Freq.y    11
Then plot it with ggplot, mapping the freq column to the fill colour:
library(ggplot2)

ggplot(data, aes(x = Var1, y = value, fill = freq)) +
  geom_bar(stat = "identity") +  # stacked by default; NA rows are dropped with a warning
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5, hjust = 0.05))
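
Freq.y has three NA entries (Airports, Music stores, Quasi cash), which ggplot leaves out of the plot. Below is a minimal sketch of one way to handle that, assuming you would rather treat the missing counts as zero and flip the axes so the long category names stay readable. It uses only the first five rows of the data to stay short, and the names small, long and p are just illustrative.

```r
library(tidyr)
library(ggplot2)

# First five categories of the question's data, for illustration only
small <- data.frame(
  Var1   = factor(c("Airline", "Airports", "Alcohol", "Auto", "Books & stationery")),
  Freq.x = c(429L, 1L, 325L, 499L, 239L),
  Freq.y = c(16L, NA, 11L, 34L, 19L)
)

# Reshape to long format, then treat missing counts as 0 (an assumption,
# not something the original data states)
long <- small |>
  pivot_longer(cols = c(Freq.x, Freq.y), names_to = "freq") |>
  replace_na(list(value = 0L))

# geom_col() is shorthand for geom_bar(stat = "identity");
# coord_flip() puts the categories on the vertical axis
p <- ggplot(long, aes(x = Var1, y = value, fill = freq)) +
  geom_col() +
  coord_flip()

print(p)
```

Whether zero is the right replacement depends on what an NA means in your data; if it means "not recorded" rather than "no occurrences", dropping those rows as in the plot above may be the better choice.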