Here is an example ggplot that I would like to build. My problem is that many of my values fall within a small stretch of the histogram. I would therefore like to stretch the x axis disproportionately over that range (here, between the values 80, 81, 82, 83, 84, 85), so that the tick marks are spaced evenly on the graph even though the distance between them is not proportional to the increase in value. I would also like to apply a different bin size to that part of the histogram (say, binwidth = 1).
library(ggplot2)
set.seed(42)
data <- data.frame(c(rnorm(mean=80,sd=20,30)),seq(1,30,1),
c("A","B","B","A","A","B","B","A","A","A",
"A","B","B","A","A","B","B","A","A","B",
"B","A","A","B","B","A","A","B","B","A"))
colnames(data) <- c("vals","respondent","category")
# Plot the number of vals
ggplot(data,aes(x = vals,fill = category)) +
geom_histogram(position = "stack",binwidth = 5) +
ggtitle("plot")+
#scale_x_continuous(c(40,50,60,70,80,81,82,83,84,85,95,105,115))+
theme_minimal() +
ylab("Number of respondents")+xlab("Number of vals")
You can calculate the size (width / height) of each bar yourself and draw the histogram as a series of stacked rectangles.
Using the diamonds dataset for illustration, suppose this is our original histogram, and we want to zoom in for the [500, 1000] price range:
ggplot(diamonds,
aes(x = price, fill = color)) +
geom_histogram(binwidth = 500) +
theme_bw()
Define your preferred axis breaks:
x.axis.breaks <- c(0, # binwidth = 500
seq(500, 900, 100), # binwidth = 100
seq(1000, 19000, 500)) # binwidth = 500
> x.axis.breaks
[1] 0 500 600 700 800 900 1000 1500 2000 2500 3000 3500 4000 4500
[15] 5000 5500 6000 6500 7000 7500 8000 8500 9000 9500 10000 10500 11000 11500
[29] 12000 12500 13000 13500 14000 14500 15000 15500 16000 16500 17000 17500 18000 18500
[43] 19000
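As a quick sanity check on the breaks, the bin widths they imply can be listed with diff(). This is only a sketch; bin.widths is a name introduced here for illustration and is not used elsewhere in the answer.

```r
# Same breaks as defined above
x.axis.breaks <- c(0,
                   seq(500, 900, 100),
                   seq(1000, 19000, 500))

# Width of each interval = difference between consecutive breaks
bin.widths <- diff(x.axis.breaks)

# Tabulate how many intervals have each width
table(bin.widths)
```

The narrow (100-wide) intervals should all sit between 500 and 1000, with every other interval 500 wide.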
Calculate xmin / xmax / ymin / ymax for each interval:
library(dplyr)
diamonds2 <- diamonds %>%
  mutate(price.cut = cut(price,
                         breaks = x.axis.breaks)) %>%
  count(price.cut, color) %>%
  mutate(xmin = x.axis.breaks[as.integer(price.cut)],
         xmax = x.axis.breaks[as.integer(price.cut) + 1]) %>%
  group_by(price.cut) %>%
  arrange(desc(color)) %>%
  mutate(ymax = cumsum(n)) %>%
  mutate(ymin = lag(ymax)) %>%
  mutate(ymin = ifelse(is.na(ymin), 0, ymin)) %>%
  ungroup()
> diamonds2
# A tibble: 294 x 7
price.cut color n xmin xmax ymax ymin
<fct> <ord> <int> <dbl> <dbl> <int> <dbl>
1 0 J 158 0 500 158 0
2 500 J 80 500 600 80 0
3 600 J 84 600 700 84 0
4 700 J 51 700 800 51 0
5 800 J 43 800 900 43 0
6 900 J 47 900 1000 47 0
7 1000 J 145 1000 1500 145 0
8 1500 J 198 1500 2000 198 0
9 2000 J 163 2000 2500 163 0
10 2500 J 72 2500 3000 72 0
# ... with 284 more rows
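To see what the cumsum() / lag() step does on its own, here is a minimal sketch on a made-up table. The names toy, bin, grp and toy_stacked are hypothetical and the counts are invented for illustration; they are not taken from the diamonds data.

```r
library(dplyr)

# Hypothetical toy counts: two bins with two groups each
toy <- data.frame(bin = c("a", "a", "b", "b"),
                  grp = c("x", "y", "x", "y"),
                  n   = c(3, 2, 1, 4))

toy_stacked <- toy %>%
  group_by(bin) %>%
  mutate(ymax = cumsum(n),                  # top of each rectangle: running total within the bin
         ymin = lag(ymax, default = 0)) %>% # bottom: top of the rectangle below, 0 for the first
  ungroup()

toy_stacked
```

Within each bin, every rectangle starts where the previous one ends, which is what stacks the bars. Here lag(ymax, default = 0) does in one step what the lag() followed by ifelse(is.na(...)) pair does in the code above.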
Plot:
p <- ggplot(diamonds2,
aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = color)) +
geom_rect() +
theme_bw()
p
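The same steps could be applied to data shaped like the question's, with a bin width of 1 between 80 and 85 and a bin width of 5 elsewhere. This is only a sketch under some assumptions: category is filled with a simple alternating pattern rather than the question's exact vector, and lo, hi, breaks, rects and p.vals are names introduced here. Note that it only varies the bin width; the axis itself stays linear.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
# Assumption: simplified stand-in for the question's data
data <- data.frame(vals = rnorm(30, mean = 80, sd = 20),
                   category = rep(c("A", "B"), 15))

# Outer limits rounded to multiples of 5 so that every value falls in a bin
lo <- min(80, 5 * floor(min(data$vals) / 5))
hi <- max(85, 5 * ceiling(max(data$vals) / 5))

# binwidth = 5 outside [80, 85], binwidth = 1 inside
breaks <- unique(c(seq(lo, 80, 5), 80:85, seq(85, hi, 5)))

rects <- data %>%
  mutate(bin = cut(vals, breaks = breaks, include.lowest = TRUE)) %>%
  count(bin, category) %>%
  mutate(xmin = breaks[as.integer(bin)],
         xmax = breaks[as.integer(bin) + 1]) %>%
  group_by(bin) %>%
  mutate(ymax = cumsum(n),
         ymin = lag(ymax, default = 0L)) %>%
  ungroup()

p.vals <- ggplot(rects,
                 aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax,
                     fill = category)) +
  geom_rect() +
  theme_minimal() +
  ylab("Number of respondents") + xlab("Number of vals")
p.vals
```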
I'm not inclined to "stretch" part of a continuous axis, as it distorts interpretation. But you can zoom in using facet_zoom from the ggforce package:
library(ggforce)
p + facet_zoom(x = xmin >= 500 & xmax <= 1000)
If you don't want the neighbouring bars to be visible in the zoomed facet, set the x-axis range expansion to 0:
p +
facet_zoom(x = xmin >= 500 & xmax <= 1000) +
scale_x_continuous(expand = c(0, 0))
Edit
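To use a different bin width at the end of the range, with a customised label, make the following changes. After redefining the breaks, re-run the diamonds2 calculation and the plot code above so that they pick up the new intervals: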
# use even binwidth (500) up to 15000, then jump to the end
x.axis.breaks <- c(0, # binwidth = 500
seq(500, 900, 100), # binwidth = 100
seq(1000, 15000, 500), # binwidth = 500
19000) # everything else
# reduce the largest xmax value in order to have the same bar width
diamonds2 <- diamonds2 %>%
mutate(xmax = ifelse(xmax == max(xmax),
xmin + 500,
xmax))
# define breaks & labels for x-axis
p <- p +
scale_x_continuous(breaks = seq(0, 15000, 5000),
labels = c(seq(0, 10000, 5000),
"15000+"))