Search code examples
google-bigqueryranking

How to rank a BigQuery table based on the upper & lower bounds of a value's margin of error in each row


I've got a BigQuery table of percentages & the associated margin of error on each of those percentages. Is there a way in BigQuery Standard SQL to use or replicate the RANK() function to determine the numbering of that table for the range of values the margin of error represents? For example, if I have:

WITH `test_data` AS (
  SELECT 'A' Name, 91.4 Percentage, 0.9 Percentage_Error UNION ALL
  SELECT 'B', 90.5, 0.5 UNION ALL
  SELECT 'C', 89.9, 0.7 UNION ALL
  SELECT 'D', 88.8, 0.3
)
SELECT
  Name,
  Percentage,
  Percentage_Error,
  Percentage + Percentage_Error AS UpperBound,
  Percentage - Percentage_Error AS LowerBound,
  /* 1 AS Uncalculated_Rank */
FROM
  `test_data`

I'd like the ranking to:

  • consider anything with an UpperBound lower then a LowerBound as peer rows to receive the same rank value
  • make the next rank value incremented by the number of peers with the previous rank value

So in this example, my results would be:

Entry Percentage Percentage_Error UpperBound LowerBound Rank
A 91.4 0.9 92.3 90.5 1
B 90.5 0.5 91.0 90.0 1
C 89.9 0.7 90.6 89.2 1
D 88.8 0.3 89.1 88.5 4

If it helps, I had previously implemented a form of this ranking in Javascript; the logic was such for an array of objects values where ranking was initially set to 1 for all entries & it was sorted in descending order by percentage:

for (let i = 0; i < values.length; i++) {
  for (let j = i + 1; j < values.length; j++) {
    if (values[i].lowerBound > values[j].upperBound) {
      values[j].ranking += 1;
    }
  }
                
  if (i != 0 && values[i - 1].ranking != values[i].ranking && values[i].ranking < i + 1) {
    values[i].ranking = i + 1;
  }
}

Thanks in advance!


Solution

  • To be honest, your example is a bit vague, I'd like to see more cases to understand it better. As far as I understand, you want to cross join the table to itself, but only for the rows to the below.

    The query below gives the output you want, but I'm not sure if it's what you want.

    WITH 
    test_data AS 
    (
      SELECT 'A' Name, 91.4 Percentage, 0.9 Percentage_Error UNION ALL
      SELECT 'B', 90.5, 0.5 UNION ALL
      SELECT 'C', 89.9, 0.7 UNION ALL
      SELECT 'D', 88.8, 0.3
    ),
    bounds as 
    (
      SELECT
        Name,
        Percentage,
        Percentage_Error,
        Percentage + Percentage_Error AS UpperBound,
        Percentage - Percentage_Error AS LowerBound,
        row_number() over (order by Name) as rank1
      FROM test_data
    )
    select 
      b1.Name,
      any_value(b1.Percentage) as Percentage,
      any_value(b1.Percentage_Error) as Percentage_Error,
      any_value(b1.UpperBound) as UpperBound,
      any_value(b1.LowerBound) as LowerBound,
      countif(b1.UpperBound < b2.LowerBound) + 1 as Rank
    from bounds b1
    left join bounds b2
      on b1.rank1 > b2.rank1
    group by 1
    order by 1