R: find the percentage of the way through an ordered vector at which each value first appears


I am looking for a way to take an ordered vector and return, for each distinct value, the percentage of the way through the vector at which that value first appears.

See below for the input vector and the expected result.

InputVector<-c(1,1,1,1,1,2,2,2,3,3)

ExpectedResult<-data.frame(Value=c(1,2,3), Percentile=c(0,0.5,0.8))

In this case, 1 appears at the 0th percentile, 2 at the 50th and 3 at the 80th.


Solution

  • Using rank() and unique():

    data.frame(
        Value = InputVector,
        Percentile = (rank(InputVector, ties.method = "min") - 1) / length(InputVector)
      ) |>
      unique()
    
    #>   Value Percentile
    #> 1     1        0.0
    #> 6     2        0.5
    #> 9     3        0.8
    
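As a cross-check on the rank()-based formula, the same numbers can be obtained from the position at which each value first occurs. This is only a sketch added for illustration, not part of the original answer; the names `values`, `first_pos` and `result` are made up here. It assumes the vector is already sorted in ascending order, as in the question; for an unsorted vector the two approaches would generally disagree.

```r
InputVector <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)

# Distinct values, in order of first appearance
values <- unique(InputVector)

# match() returns the index of the first occurrence of each value
first_pos <- match(values, InputVector)

# Convert the 1-based position into a fraction of the vector length
result <- data.frame(
  Value = values,
  Percentile = (first_pos - 1) / length(InputVector)
)

result
```

Here the first occurrences are at positions 1, 6 and 9, so the fractions are 0/10, 5/10 and 8/10, matching the expected result in the question.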

    You could also use dplyr::percent_rank(), but note that it divides by n - 1 rather than n, so the percentiles it returns differ from the expected result:

    library(dplyr)
    
    tibble(
        Value = InputVector,
        Percentile = percent_rank(Value)
      ) %>% 
      distinct()
    
    #> # A tibble: 3 × 2
    #>   Value Percentile
    #>   <dbl>      <dbl>
    #> 1     1      0    
    #> 2     2      0.556
    #> 3     3      0.889
    

    Created on 2022-11-09 with reprex v2.0.2
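
The gap between the two sets of percentiles comes down to the denominator. dplyr documents percent_rank() as a rescaling of min_rank() to [0, 1], which amounts to (rank - 1) / (n - 1). The base-R sketch below reproduces that formula for illustration only; `n`, `r` and `pr` are names made up here, and the code assumes the input has no missing values.

```r
InputVector <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)
n <- length(InputVector)

# Minimum rank of each element (tied values share the lowest rank)
r <- rank(InputVector, ties.method = "min")

# Dividing by n - 1 instead of n stretches the scale so that rank n would map to 1
pr <- (r - 1) / (n - 1)

# One row per distinct value, rounded for display
unique(data.frame(Value = InputVector, Percentile = round(pr, 3)))
```

With n = 10 this gives 0/9, 5/9 and 8/9, which is why 2 lands at about 0.556 instead of 0.5. If the percentiles in the question are what is wanted, the rank()-based version that divides by n is the one to use.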