Density plot in R - Histogram - ggplot


Summarize the problem

I have the following data:

Income Level       Percentage
$0 - $1,000        10
$1,000 - $2,000    30
$2,000 - $5,000    60

I want to create a histogram with a density scale, where the total is 100%. The 60% is spread over a range of 3,000, so I cannot simply draw it at a height of 60%. I also know that the ranges are half-open: [0, 1000), [1000, 2000), [2000, 5000).

I have googled a lot and looked through my notes and books, but I cannot find the answer. I am sure it is an easy one, but I am a beginner. Let me share the many things I have tried and googled; it was at least an opportunity to learn a few things along the way.

Describe what you’ve tried

I thought I could easily solve this problem with the following code:

# Originally I only loaded ggplot2 and dplyr. While researching on Google, I tried a few solutions that needed the stringr and data.table libraries.
library(ggplot2)
library(dplyr)
library(stringr)
library(data.table)


data <- tibble(income = c(1000, 2000, 5000), percentage = c(10, 30, 60))

data %>% ggplot(aes(x = income)) +
  geom_histogram()

That did not work because it was counting the values, so I had 3 bars of height = 1, and the widths were also incorrect. The bars did not go from 0 to 1000, 1000 to 2000 and 2000 to 5000.

After a few searches, I learned about y=..density.. and put it in my aes() like this.

data %>% ggplot(aes(x = income, y=..density..)) +
  geom_histogram()

The y-axis did not go up to 1 but to 0.0025 (I am not really sure why), and I am still not there.
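
A note on why the scale looks so small: on a density scale the bar areas, not the bar heights, add up to 1, so each height is the bin's share of the data divided by the bin width. The sketch below illustrates this with base R's hist() and hand-picked breaks. The breaks are an assumption for illustration; geom_histogram() chooses its own default bins, so this will not reproduce the 0.0025 seen above.

```r
# Three observations, one per bracket edge, as in the attempt above.
income <- c(1000, 2000, 5000)

# plot = FALSE returns the bin statistics instead of drawing the histogram.
h <- hist(income, breaks = c(0, 1000, 2000, 5000), plot = FALSE)

widths <- diff(h$breaks)   # width of each bin
h$density                  # count / (n * width) for each bin
sum(h$density * widths)    # total area of the bars
```

With bins that are hundreds or thousands of dollars wide, heights far below 1 are therefore expected.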

I then discovered weight and I tried:

data %>% ggplot(aes(x = income, y=..density.., weight=percentage)) +
  geom_histogram()

I may be on a better track since for the first time the columns are not the same height.

I then tried binwidth = c(1000, 2000, 5000) because I wanted the columns to have different widths. It did not work. I also got an error when I tried to start the vector with 0, since it then has 4 elements.
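
For reference, geom_histogram() takes unequal bin edges through its breaks argument rather than through a vector binwidth. The following is a minimal sketch, not the accepted answer. It assumes one representative x value per bracket (midpoints, chosen here only so that no value sits exactly on a bin edge) and uses after_stat(density), the newer spelling of ..density.. in recent ggplot2 versions. The name `brackets` is illustrative.

```r
library(ggplot2)

# One row per bracket: a value inside the bracket and the bracket's share.
brackets <- data.frame(
  income     = c(500, 1500, 3500),
  percentage = c(10, 30, 60)
)

p <- ggplot(brackets, aes(x = income, weight = percentage)) +
  # breaks sets the bin edges directly, so bins may have unequal widths
  geom_histogram(aes(y = after_stat(density)),
                 breaks = c(0, 1000, 2000, 5000),
                 color = "black") +
  labs(x = "Income", y = "Density")

# Inspect the computed bars: density * width should give each bracket's share.
bars <- layer_data(p)
bars$density * (bars$xmax - bars$xmin)
```

Printing `p` draws the histogram; the last line only checks that bar areas correspond to the shares 0.1, 0.3 and 0.6.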

I then learned about cut and breaks, and I am not sure I understand them. Anyway, the code now looked like this.

library(ggplot2)
library(dplyr)
library(stringr)
library(data.table)

data <- data.frame(percentage = c(10,30,60))
data$income <- cut(data$percentage, c(0, 1000,2000,5000), right = FALSE)

data %>% ggplot(aes(x = income, y=..density.., weight=percentage)) +
  geom_histogram()

I now have an error about continuous variable and discrete variable.
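
Some context on that error: cut() returns a factor, which ggplot2 treats as discrete, while geom_histogram() expects a continuous x. The breaks also describe income, so cut() would normally be applied to income values rather than to the percentages. A small sketch, using hypothetical incomes that are not part of the original data:

```r
# Hypothetical incomes, one inside each bracket (for illustration only).
income <- c(500, 1500, 3500)

# right = FALSE makes the intervals closed on the left, e.g. [0, 1000).
# dig.lab = 4 allows up to 4 digits in the generated interval labels.
bracket <- cut(income, breaks = c(0, 1000, 2000, 5000),
               right = FALSE, dig.lab = 4)

class(bracket)        # a factor, i.e. a discrete variable
as.integer(bracket)   # index of the bracket each income falls into

# Cutting the percentages instead puts 10, 30 and 60 all in the first bracket.
pct_bracket <- cut(c(10, 30, 60), breaks = c(0, 1000, 2000, 5000), right = FALSE)
table(pct_bracket)
```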

I found in a book I have that I could also draw a pie chart. I tried the following to check whether my data was good.

library(ggplot2)
library(dplyr)
library(stringr)
library(data.table)

data <- tibble(income = c(1000, 2000, 5000), percentage = c(10, 30, 60))

# Create data for the graph.
x <- c(10,30,60)
labels <- c(1000,2000,5000)

pie(x,labels)

The percentages seem OK, but this is not the kind of graph I want: I want a histogram, not a pie chart.

I have often found this website very helpful so maybe someone knows the answer. I have just created an account and hope that someone could assist me.

Thanks


Solution

  • Perhaps you're looking for a bar chart:

    data %>% ggplot(aes(x = as.factor(income),y = percentage)) +
        geom_bar(stat = "identity") + labs(x = "Income")
    


    Or, as stefan suggests, perhaps something like this with geom_rect:

    data %>%
      mutate(min = lag(income, 1L, 0)) %>%
      ggplot(aes(xmin = min, xmax = income, ymin = 0, ymax = percentage)) +
      geom_rect(color = "black") + labs(x = "Income", y = "Density")
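
Note that the rectangles above use the raw percentage as their height, so the 60% bar is still drawn at 60. If a true density scale is wanted, one option is to divide each percentage by its bracket width, so that the bar areas rather than the heights carry the percentages. A minimal sketch, assuming the same data as in the question with percentages 10, 30 and 60; the column names `lower`, `width` and `density` are illustrative:

```r
library(ggplot2)
library(dplyr)

# Upper edge of each income bracket and the share of people in it.
data <- tibble(income = c(1000, 2000, 5000), percentage = c(10, 30, 60))

density_data <- data %>%
  mutate(
    lower   = lag(income, 1L, default = 0),  # lower edge of each bracket
    width   = income - lower,                # bracket width in dollars
    density = percentage / width             # percent per dollar of income
  )

p <- ggplot(density_data,
            aes(xmin = lower, xmax = income, ymin = 0, ymax = density)) +
  geom_rect(color = "black", fill = "grey70") +
  labs(x = "Income", y = "Density (% per $)")

# Bar areas recover the original percentages and add up to 100.
sum(density_data$width * density_data$density)
```

With this scaling the widest bracket is no longer the tallest bar, because its 60% is spread over 3,000 dollars.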
    
