Search code examples
google-sheetsgoogle-sheets-formula

PERCENTILE IF using ARRAYFORMULA for a set of conditions


I need to calculate the percentile using an if condition to calculate it by group of conditions, but Google Sheets doesn't provide PERCENTILEIF function. A nonarray solution is possible:

=ARRAYFORMULA(PERCENTILE(if(range=value,values),percentile))

but in my case value should be an array of possible values.

Here is the sample data with the expected result highlighted: sample

I tried several options to use an array of possible values, but in all cases, I get the wrong result:

Using JOIN in G2:

=arrayformula(if(len(E2:E3),percentile(split(regexreplace(join(",",
   Arrayformula(A2:A12 & "_" & B2:B12)),E2:E3  & "_(\d+)|.",",$1"),","),D2),))

Using MATCH in H2:

=ARRAYFORMULA(if(len(E2:E3),
   PERCENTILE(IFNA(--(match(A2:A12,E2:E3,0) > 0) * B2:B12,),D2),))

here is the Spreadsheet file: https://docs.google.com/spreadsheets/d/1VDJIYvmOC46DI_9u4zSEfmxSan5R5VKK772C_kP5rxA/edit?usp=sharing


Solution

  • Just as an exercise I tried working it out from first principles based on the quantiles formula. The Excel or Google Sheets Percentile and Percentile.inc functions use the (N − 1)p + 1 variation shown in the last table under Excel in the reference above.

    So for the first group,

    (N − 1)p + 1 = 3 * 0.8 + 1 = 3.4
    

    This means you interpolate 0.4 of the way from the third point (10) to the fourth point (30), giving you

    10 + 0.4 * (30 - 10) = 18.
    

    The array formula is

    =ArrayFormula(vlookup(vlookup(E2:E3,{sort(A2:B,1,1,2,1),sequence(ROWS(A2:A))},3,false)+floor((countif(A2:A,E2:E3)-1)*D2),{sequence(ROWS(A2:A)),sort(A2:B,1,1,2,1)},3,false)
    +(vlookup(vlookup(E2:E3,{sort(A2:B,1,1,2,1),sequence(ROWS(A2:A))},3,false)+ceiling((countif(A2:A,E2:E3)-1)*D2),{sequence(ROWS(A2:A)),sort(A2:B,1,1,2,1)},3,false)
    -vlookup(vlookup(E2:E3,{sort(A2:B,1,1,2,1),sequence(ROWS(A2:A))},3,false)+floor((countif(A2:A,E2:E3)-1)*D2),{sequence(ROWS(A2:A)),sort(A2:B,1,1,2,1)},3,false))*mod((countif(A2:A,E2:E3)-1)*D2,1))
    

    enter image description here


    I believe you can also do it by manipulating the values of the second argument to the Percentile function - it would go like this:

    =ArrayFormula(percentile(if(A2:A="",,B2:B+A2:A*1000),
    D2*(countif(A2:A,E2:E3)-1)/(count(A2:A)-1)+(countif(A2:A,"<"&E2:E3))/(count(A2:A)-1))-E2:E3*1000)
    

    enter image description here

    Explanation

    I think I can best show the logic by means of a graph:

    enter image description here

    So I've added a constant (50 to make it easier to see on the graph to the second group and 100 to the third group) to separate the three groups. I've also sorted within each group to make it easier to visualise but this isn't necessary in the formula because Percentile will do the sorting.

    If you look at the third group, you can land exactly at the beginning of this group by choosing to go to the 60th percentile in the whole of the data. Then you can go to the 80th percentile of these last five points by adding in the required percentile times the distance between the first and last point in this group as a fraction of the distance between the first and last point in the whole of the data.

    There's nothing magic about choosing 1000 in the formula above, just a big enough number to separate the groups - max(B2:B) would be safest if they are all positive numbers.