I'm trying to visualize some data with a ridge plot, but I'm wondering if there's a way I can weight the densities of the ridges.
Basically I have the following:
set.seed(1)
example <- data.frame(matrix(nrow=100,ncol=3))
colnames(example) <- c("year","position","weight")
example$year <- as.character(rep(c(1,2,3,4,5),each=20) )
example$position <- runif(100,1,10)
example$weight <- sample(1:3,100,replace = TRUE)
This is a sample of positions in 5 different years. I want to plot how the distribution changes over time with a ridge plot, but the dataset also has a "weight" column, meaning that some samples count more than others. Is there a way to incorporate this into my ridge plot? Also, is there a way to make rows with a larger total sample*weight taller than rows with a smaller one, rather than normalizing every year's height to one?
library(ggplot2)
ggplot(example,aes(x=position,y=year))+
ggridges::geom_density_ridges()+
theme_classic()
I was thinking I could repeat each row of the dataset as many times as its weight value, so that it gets counted "weight" number of times and changes the density accordingly. I can't quite figure out how to do that, though. Also, in my real dataset the weights aren't integers, so I'm hoping for a better solution.
Or is there another package or technique that might achieve this?
For this dataset we can repeat the rows based on the weight column and then plot:
library(ggplot2)
library(ggridges)
example2 <- example[rep(seq_along(example$weight), example$weight), ]
ggplot(example2,aes(x=position,y=year))+
ggridges::geom_density_ridges()+
theme_classic()
#> Picking joint bandwidth of 1.02
However, if you have weights that are not integers, this will not work, because rep() needs whole-number counts. There is an open issue on GitHub about this that you may want to look at.
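One possible workaround for non-integer weights is to compute the weighted density yourself with base R's density(), which accepts a weights argument, and hand the result to ggridges with stat = "identity". The sketch below is only an illustration under some assumptions: the non-integer weights are made up, the helper weighted_ridge() is a hypothetical name, and a single bandwidth shared across years is my choice rather than anything ggridges does for you. Each curve is multiplied by the year's total weight, so years with more total weight come out taller instead of every ridge having the same area.

```r
library(ggplot2)
library(ggridges)

set.seed(1)
# Same layout as the question's data, but with made-up non-integer weights
example <- data.frame(
  year     = as.character(rep(1:5, each = 20)),
  position = runif(100, 1, 10),
  weight   = runif(100, 0.5, 3)
)

# Shared bandwidth and x-range so the ridges are comparable across years
bw   <- bw.nrd0(example$position)
grid <- range(example$position) + c(-3, 3) * bw

# Hypothetical helper: weighted density for one year, rescaled by total weight
weighted_ridge <- function(d) {
  dens <- density(d$position,
                  weights = d$weight / sum(d$weight),  # must sum to 1
                  bw = bw, from = grid[1], to = grid[2], n = 512)
  data.frame(year     = d$year[1],
             position = dens$x,
             height   = dens$y * sum(d$weight))  # area = total weight
}

ridges <- do.call(rbind, lapply(split(example, example$year), weighted_ridge))

# stat = "identity" tells ggridges to draw the precomputed heights as-is
p <- ggplot(ridges, aes(x = position, y = year, height = height, group = year)) +
  geom_density_ridges(stat = "identity", scale = 1) +
  theme_classic()
p
```

Because the heights are computed up front, the same data frame should also work with geom_ridgeline() if you prefer to control the vertical scaling directly.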
Another idea would be to convert the weights in your original dataset to integers: round them to a fixed number of decimal places and multiply by 10 to the power of that precision. You can then apply the row-repetition approach above to your actual dataset.
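A minimal sketch of that rounding idea, assuming one decimal place of precision is enough (the digits value, the reps column name, and the toy data are made up for illustration):

```r
# Toy data with non-integer weights, for illustration only
example <- data.frame(
  year     = as.character(rep(1:2, each = 3)),
  position = c(1.5, 4.2, 7.8, 2.1, 5.5, 9.0),
  weight   = c(0.52, 1.28, 2.04, 0.75, 1.11, 2.96)
)

digits <- 1  # keep one decimal place of the weights

# 0.52 -> 5, 1.28 -> 13, 2.04 -> 20, and so on
example$reps <- as.integer(round(example$weight * 10^digits))

# Repeat each row 'reps' times, as in the integer-weight solution
example2 <- example[rep(seq_len(nrow(example)), example$reps), ]
```

Note that the number of rows grows by roughly a factor of 10^digits, and that a bandwidth chosen automatically from the expanded data will tend to be narrower than one chosen from the original rows, since the selector sees a larger sample.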