Search code examples
sqlsql-servercounting

Count number of records that come after a specific record with different settings


I am looking to create a column that represents how many records come after each record that have different 'settings', or in this case, that don't have the exact same column value 'p' in the test case below:

p orderLevel
Jack .03
Jack / Jill .03.01
Jack / Jill / Bob .03.01.01
------ .03.01.05
------ .03.01.06
Jack / Jill / Robert .03.01.11
------ .03.01.15
Julie / Josh .04.02
Julie / Josh / Tom .04.02.01
Julie / Fred .04.03
------ .04.03.06
------ .04.03.07

With a result table looking something like:

p orderLevel numAfter
Jack .03 11
Jack / Jill .03.01 10
Jack / Jill / Bob .03.01.01 9
------ .03.01.05 4
------ .03.01.06 4
Jack / Jill / Robert .03.01.11 6
------ .03.01.15 3
Julie / Josh .04.02 4
Julie / Josh / Tom .04.02.01 3
Julie / Fred .04.03 2
------ .04.03.06 0
------ .04.03.07 0

where we see that the numAfter column represents how many records there are with a larger 'orderLevel' string and a different 'p' for each record in the table. The reason for doing this is now I can group by 'p' and 'numAfter' to only select one separator record 'p = ------ ' when it is the case that there are multiple following one another. I'm sure there are also other ways to accomplish the goal of only selecting one separator record if there are multiple following each other, so feel free to share those methods as well. But, I am hoping someone can provide the method of creating the 'numAfter' column as I have specified. I don't think it should be too difficult.

See below for the temp table and the results I have so far.

if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (p varchar(150), orderLevel varchar(150))
INSERT into #t1 VALUES 
   ('Jack', '.03'),
   ('Jack / Jill', '.03.01'),
   ('Jack / Jill / Bob', '.03.01.01'),
   ('------', '.03.01.05'),
   ('------', '.03.01.06'),
   ('Jack / Jill / Robert', '.03.01.11'),
   ('------', '.03.01.15'),
   ('Julie / Josh', '.04.02'),
   ('Julie / Josh / Tom', '.04.02.01'),
   ('Julie / Fred', '.04.03'),
   ('------', '.04.03.06'),
   ('------', '.04.03.07');



     select  t1.p,
             t1.orderLevel,
             count(t2.p) as numAfter
        from #t1 t1
        inner join #t1 t2
            on t2.orderLevel > t1.orderLevel
            and t1.p != t2.p
        group by t1.p, t1.orderLevel

Solution

  • You need to use a left join instead of inner join. The left join will join the unmatched rows from the left table with a null values ensuring that all the left table rows are selected.

    select T.p, T.orderLevel, count(D.p)
    from #t1 T left join #t1 D
    on T.p <> D.p and
       T.orderLevel < D.orderLevel
    group by T.p, T.orderLevel 
    order by T.orderLevel
    

    As I understood from your question, you want to create groups for similar consecutive P values, if so, another simple option is to use a difference between two row_numbers as the following:

    select *,
       row_number() over (order by orderLevel) -
       row_number() over (partition by p order by orderLevel) grp
    from #t1
    order by orderLevel
    

    See demo for both queries.

    For more details about this problem you may try to google 'gaps and islands problem in SQL'.