
How do I create an array of formatted strings from quantiles and their values?


Suppose I am working with some code in R like this:

library(data.table)
dt <- data.table(x=c(1:200),y=rnorm(200))
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
quantiles <- quantile(dt$y, probs=probs)

I would like to produce a new variable (an array or a sequence) called labels that contains formatted strings of the quantiles and their respective values. Let's say quantiles prints out this:

> quantiles
       10%        25%        50%        75%        90% 
-1.2097339 -0.6195308 -0.0155171  0.7417443  1.2982685

How would I go about programmatically producing labels from quantiles such that printing labels emits an array or sequence like this:

> labels
[1] "10% at -1.20" "25% at -0.61" "50% at -0.01" "75% at 0.74" "90% at 1.29"

So how would you go about wiring all of this together to produce labels? Given that we have probs, we could probably simplify the process by zipping probs with the values of quantiles.

My goal is to use labels to label the x-axis of a density plot in ggplot2, where I want to elegantly show both the quantiles and their values together (think about something like this).

Zipping the Data Together

I've seen that I can inspect the quantiles programmatically with the builtin function names:

> names(quantiles)
[1] "10%" "25%" "50%" "75%" "90%"

I've also seen that I can extract the quantile's values programmatically with as.vector:

> as.vector(quantiles)
[1] -1.2097339 -0.6195308 -0.0155171  0.7417443  1.2982685

But I've seen no way of zipping these two things together à la Python.
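For what it's worth, the closest base R analogue to Python's zip is probably Map (or mapply), which walks several vectors in parallel. Below is a minimal sketch; the quantiles vector is hard-coded from the printout above as a stand-in, since rnorm output differs between runs, and the names pairs and zipped are only illustrative.

```r
# Stand-in for the quantile() result printed above (a named numeric vector).
quantiles <- c(`10%` = -1.2097339, `25%` = -0.6195308, `50%` = -0.0155171,
               `75%` = 0.7417443, `90%` = 1.2982685)

# Map() pairs the i-th name with the i-th value, much like zip() in Python,
# and returns a list with one element per pair.
pairs <- Map(function(q, v) list(name = q, value = v),
             names(quantiles), unname(quantiles))

pairs[[1]]$name   # the first quantile's name
pairs[[1]]$value  # the first quantile's value

# mapply() performs the same parallel walk but simplifies to a vector.
zipped <- mapply(function(q, v) paste(q, v),
                 names(quantiles), unname(quantiles),
                 USE.NAMES = FALSE)
```

In practice an explicit zip is often unnecessary in R, because many functions (paste, sprintf) are already vectorized over all of their arguments.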

String Formatting

Then I want decimal precision on the respective quantile values in the formatting, which requires something akin to using sprintf("%.2f", ...) on each value.

Each formatted value in the sequence would probably be produced with sprintf("%s at %.2f", q, v).
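One detail worth checking here: "%.2f" rounds, whereas the desired labels shown earlier ("-1.20", "-0.61", "-0.01", "1.29") look truncated toward zero. The sketch below shows both variants, assuming the hard-coded quantiles from the printout above; truncating with trunc(x * 100) / 100 is just one possible approach, not necessarily what was intended.

```r
# Stand-in for the quantile() result printed above.
quantiles <- c(`10%` = -1.2097339, `25%` = -0.6195308, `50%` = -0.0155171,
               `75%` = 0.7417443, `90%` = 1.2982685)

# sprintf() is vectorized, so one call formats every name/value pair.
# "%.2f" rounds to two decimals.
labels_rounded <- sprintf("%s at %.2f", names(quantiles), quantiles)
labels_rounded
# [1] "10% at -1.21" "25% at -0.62" "50% at -0.02" "75% at 0.74"  "90% at 1.30"

# To truncate toward zero instead, drop the extra digits before formatting.
truncated <- trunc(quantiles * 100) / 100
labels_truncated <- sprintf("%s at %.2f", names(quantiles), truncated)
labels_truncated
# [1] "10% at -1.20" "25% at -0.61" "50% at -0.01" "75% at 0.74"  "90% at 1.29"
```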


I've used R on and off for two decades, but I've never been able to deeply retain skills in it. The main problem I am facing is with plumbing and ergonomic wiring together of these two pieces of data. Through other research, I found something similar to paste0(names(quantiles), '=', unlist(quantiles), collapse=' at '), but this doesn't produce the right result:

> paste0(names(quantiles), '=', unlist(quantiles), collapse=' at ')
[1] "10%=-1.20973393089285 at 25%=-0.619530792386393 at 50%=-0.0155171014275248 at 75%=0.741744347748158 at 90%=1.29826846939529"

It produces a single string (instead of a sequence) and the precision of the quantile values is too high.
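Both symptoms seem to have separate causes: collapse= is what joins everything into one string, and the long decimals come from pasting the raw numbers. A hedged sketch of how the paste0 attempt could be adjusted, again using the hard-coded quantiles from the printout as a stand-in:

```r
# Stand-in for the quantile() result printed above.
quantiles <- c(`10%` = -1.2097339, `25%` = -0.6195308, `50%` = -0.0155171,
               `75%` = 0.7417443, `90%` = 1.2982685)

# Without collapse=, paste0() returns one string per element.
# formatC(..., format = "f", digits = 2) fixes the number of decimals
# and keeps trailing zeros (round() alone would print 1.30 as "1.3").
labels <- paste0(names(quantiles), " at ",
                 formatC(quantiles, format = "f", digits = 2))
labels
# [1] "10% at -1.21" "25% at -0.62" "50% at -0.02" "75% at 0.74"  "90% at 1.30"
```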


Solution

  • Use sprintf for everything. It is vectorized over its arguments, so a single call formats every name/value pair.

    > sprintf('%s at %.2f', names(qntls), qntls)
    [1] "10% at -1.30" "25% at -0.61" "50% at -0.02" "75% at 0.63"  "90% at 1.29" 
    

    For the plot you could do something like this:

    > par(mar=c(4, 4, 1, 1)+.1)
    > plot(dens <- density(dt$y), xaxt='n', main='')
    > cm <- matrixStats::colMins(dif <- abs(mapply(`-`, list(dens$x), qntls)))
    > points(qntls, dens$y[apply(t(t(dif) == cm), 2, which.max)], type='h')
    > mtext(sprintf('%s\n(%.2f)', names(qntls), qntls), 1, 1, at=qntls, cex=.8)
    



    Data:

    > library(data.table)
    > set.seed(42)
    > dt <- data.table(x=1:200, y=rnorm(200))
    > qntls <- quantile(dt$y, probs=c(0.1, 0.25, 0.5, 0.75, 0.9))
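Since the question mentions ggplot, here is a rough ggplot2 counterpart to the base-graphics plot, offered as a sketch rather than a definitive solution. It assumes ggplot2 is installed, uses a plain data.frame instead of data.table to keep dependencies down, and places the labels by passing the quantiles as axis breaks to scale_x_continuous; the dashed vertical lines are an optional illustration.

```r
library(ggplot2)

# Same data as above, but in a plain data.frame.
set.seed(42)
dt <- data.frame(x = 1:200, y = rnorm(200))
qntls <- quantile(dt$y, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))

# Two-line tick labels: quantile name on top, value in parentheses below.
labels <- sprintf("%s\n(%.2f)", names(qntls), qntls)

p <- ggplot(dt, aes(x = y)) +
  geom_density() +
  # Mark each quantile with a dashed vertical line.
  geom_vline(xintercept = unname(qntls), linetype = "dashed") +
  # Put the axis ticks exactly at the quantiles and attach the labels.
  scale_x_continuous(breaks = unname(qntls), labels = labels)

p
```

The breaks and labels vectors must have the same length and order, which holds here because both are derived from qntls.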