I have the following data:
Income Level | Percentage |
---|---|
$0 - $1,000 | 10 |
$1,000 - $2,000 | 30 |
$2,000 - $5,000 | 60 |
I want to create a histogram with a density scale, where the total area is 100%. The 60% is spread over a range of 3,000, so I cannot draw that bar at a height of 60%. I also know that the ranges are closed on the left and open on the right, e.g. [0 to 1000) and [2000 to 5000).
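As a rough sketch of the arithmetic behind a density scale (the variable names `breaks`, `share`, `width` and `density` are only illustrative): each bar's height is its share of the total divided by the width of its bin, so that the areas, not the heights, add up to 1.

```r
# Bin edges and the share of the total in each bin, taken from the table above
breaks <- c(0, 1000, 2000, 5000)
share  <- c(10, 30, 60) / 100

width   <- diff(breaks)    # width of each bin: 1000, 1000, 3000
density <- share / width   # bar height = share / bin width

density                    # heights of the three bars
sum(density * width)       # total area of the bars, which should be 1
```

With these numbers the widest bin ends up lower than the middle one, because its 60% is spread over three times the width.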
I have googled a lot and looked through my notes and books, and I cannot find the answer. I am sure it is an easy one, but I am a beginner. Let me share the things I have tried and searched for; it was an opportunity to learn a few things along the way.
I thought I could easily solve this problem with the following code:
# Originally I only loaded ggplot2 and dplyr. While researching on Google, I tried a few solutions that needed the stringr and data.table libraries.
library(ggplot2)
library(dplyr)
library(stringr)
library(data.table)
data <- tibble(income = c(1000,2000,5000), percentage = c(10,30,60))
data %>% ggplot(aes(x = income)) +
geom_histogram()
That did not work because it was counting the values, so I had 3 bars of height 1, and the widths were also incorrect. The bars did not go from 0 to 1000, 1000 to 2000 and 2000 to 5000.
After a few searches, I learned about the y=..density.. aesthetic and put it in my aes() call like this:
data %>% ggplot(aes(x = income, y=..density..)) +
geom_histogram()
The y-axis did not go up to 1 but to 0.0025 (I am not really sure why), and I am still not there.
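One plausible explanation for the small numbers on the y-axis, sketched under two assumptions: that geom_histogram() falls back to its default of 30 bins, and that it derives the bin width as range / (bins - 1). A density height is the fraction of observations in a bin divided by the bin width, so narrow bins on an income scale give very small heights.

```r
x <- c(1000, 2000, 5000)   # the three income values from the data

bins      <- 30                              # default number of bins in geom_histogram()
bin_width <- diff(range(x)) / (bins - 1)     # assumed default width rule, roughly 138

# Each occupied bin holds 1 of the 3 observations
height <- (1 / length(x)) / bin_width

height   # roughly 0.0024, which is why the axis tops out near 0.0025
```

The heights are not meant to sum to 1; it is height times width, summed over the bars, that equals 1.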
I then discovered weight and tried:
data %>% ggplot(aes(x = income, y=..density.., weight=percentage)) +
geom_histogram()
I may be on a better track since for the first time the columns are not the same height.
I then tried binwidth = c(1000, 2000, 5000) because I wanted the columns to have different widths. It did not work. I also got an error when I added 0 at the start, since there are then 4 elements.
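For unequal bins, geom_histogram() also accepts a `breaks` vector of bin edges in place of `binwidth`. The following is a minimal sketch, not a definitive solution. It assumes ggplot2 3.3.0 or later for `after_stat(density)` (the newer spelling of `..density..`), and it places each income value at the midpoint of its bin, which is my own choice so that no value sits exactly on an edge.

```r
library(ggplot2)

# One row per bin; income is any value inside the bin (midpoints assumed here)
data <- data.frame(
  income     = c(500, 1500, 3500),
  percentage = c(10, 30, 60)
)

p <- ggplot(data, aes(x = income, y = after_stat(density), weight = percentage)) +
  geom_histogram(
    breaks = c(0, 1000, 2000, 5000),  # bin edges, so bins may differ in width
    closed = "left",                  # bins are [a, b), as described in the question
    colour = "black"
  )

p
layer_data(p)$density   # the bar heights ggplot2 computed
```

The `weight` aesthetic makes each row count as its percentage, and the density statistic then divides by the total weight and by the bin width.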
I then learned about cut and breaks, and I am not sure I understand them. Anyway, now the code looks like this:
library(ggplot2)
library(dplyr)
library(stringr)
library(data.table)
data <- data.frame(percentage = c(10,30,60))
data$income <- cut(data$percentage, c(0, 1000,2000,5000), right = FALSE)
data %>% ggplot(aes(x = income, y=..density.., weight=percentage)) +
geom_histogram()
I now get an error about continuous and discrete variables.
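A likely reason for that error is that cut() returns a factor, which ggplot2 treats as discrete, while geom_histogram() expects a continuous x. Note also that the snippet above cuts the percentages rather than the incomes, so all three values land in the first interval. A small sketch of what cut() does with income values (the midpoint values and the `dig.lab` setting are my own assumptions for illustration):

```r
income <- c(500, 1500, 3500)   # hypothetical income values, one per bin

# right = FALSE makes the intervals closed on the left: [a, b)
# dig.lab = 4 keeps the labels from switching to scientific notation
bin <- cut(income, breaks = c(0, 1000, 2000, 5000), right = FALSE, dig.lab = 4)

is.factor(bin)     # TRUE: the result is categorical, not numeric
levels(bin)        # one label per interval
as.integer(bin)    # index of the interval each income fell into
```

A factor like this fits a bar chart; for a histogram, the numeric values and a vector of breaks are what the binning needs.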
I found in the book I have that I could also make a pie chart. I tried the following to check whether my data was good.
library(ggplot2)
library(dplyr)
library(stringr)
library(data.table)
data <- tibble(income = c(1000,2000,5000), percentage = c(10,30,60))
# Create data for the graph.
x <- c(10,30,60)
labels <- c(1000,2000,5000)
pie(x,labels)
The percentages seem OK, but this is not the kind of graph I want: I want a histogram, not a pie chart.
I have often found this website very helpful so maybe someone knows the answer. I have just created an account and hope that someone could assist me.
Thanks
Perhaps you're looking for a barchart:
data %>% ggplot(aes(x = as.factor(income),y = percentage)) +
geom_bar(stat = "identity") + labs(x = "Income")
Or as stefan suggests, perhaps something like this with geom_rect:
data %>%
mutate(min = lag(income,1L, 0)) %>%
ggplot(aes(xmin = min, xmax = income, ymin = 0, ymax = percentage)) +
geom_rect(color = "black") + labs(x = "Income", y = "Density")
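If the y-axis should show a true density, so that the areas rather than the heights add up to 100%, one possible variation on the geom_rect idea divides each share by its bin width. This is only a sketch; the `rects` data frame and its `density` column are names I introduced for illustration.

```r
library(ggplot2)
library(dplyr)

data <- data.frame(income = c(1000, 2000, 5000), percentage = c(10, 30, 60))

rects <- data %>%
  mutate(
    min     = lag(income, 1L, default = 0),          # lower edge of each bin
    density = (percentage / 100) / (income - min)    # share divided by bin width
  )

p <- ggplot(rects, aes(xmin = min, xmax = income, ymin = 0, ymax = density)) +
  geom_rect(color = "black") +
  labs(x = "Income", y = "Density")

p
```

Here the 2,000 to 5,000 bar is drawn lower than the 1,000 to 2,000 bar even though it covers 60% of the total, because it is three times as wide.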