Search code examples
sqlsql-servergreatest-n-per-group

Count distinct based on order of another column


I have a table containing measurements, it looks like this:

SELECT [Plant] --plant name
       ,[Machine] --machine name
       ,[Material] --material name
       ,[Batch] --batch name
       ,[DMC] --data matrix code, each part has one
       ,[ProductionTime] --time stamp UTC
       ,[Result] --0/1 as NOK/OK
FROM [dbo].[Results]

Each row is one measurement, having a result. Each tested part has a DMC. Since parts can be tested several times, we have duplicated DMCs (with different results) in this table.

Example data:

Plant Machine Material Batch DMC ProductionTime Result
A MachineA MaterialA X ABC 2023-02-16 16:21:52 1
A MachineA MaterialA X DEF 2023-02-16 16:21:30 1
A MachineA MaterialA X DEF 2023-02-16 16:21:09 0
A MachineA MaterialA Y GHI 2023-02-16 16:20:47 1
A MachineB MaterialA X JKL 2023-02-16 16:20:24 0
A MachineB MaterialB Y MNO 2023-02-16 16:20:03 1

To calculate the scrap rate, I need to count the amount of OK and NOK parts. For #OKParts, conditions are: -#OKParts: for each DMC, consider its **latest **production time and calculate how many rows have result = 1 (-#NOKParts same but result = 0)

Additionally for calculating FTT-Yield ("first time through"): -#PartsFirstTestOK: for each DMC, consider its oldest production time and calculate how many rows have result = 1 (-#PartsFirstTestNOK: same but result = 0)

Since this table is very big, I need to reduce the amount of data by filtering the table (see below the where statement) and group the results.

Right now, my query looks like this:

SELECT [Plant]
      ,[Machine]
      ,[Material]
      ,[Batch]
      ,Count([Result]) as '#Tests' --all rows
      ,Count(Distinct [DMC]) as '#Parts' --all different DMCs
      ,COUNT(CASE when [Result] = 1 THEN 1 END) as '#OKTests'
      ,COUNT(CASE when [Result] = 0 THEN 1 END) as '#NOKTests'
  FROM table
  where [Plant] = 'A' and [ProductionTime] > DATEADD(DAY, -365,GETDATE())
  group by [Plant],[Material],[Batch],[Machine]

(Sum(Cast([Result] as INT)) and Count([Result])-Sum(Cast([Result] as INT)) instead of the CASE functions would also work here)

To get (as now) the following result:

Plant Machine Material Batch #Tests #Parts #OKTests #NOKTests
A MachineA MaterialA X 3 3 3 0
A MachineA MaterialA Y 124 96 93 31
A MachineB MaterialA X 11 9 9 2
A MachineB MaterialB Y 21 13 11 10

and I want to add this table by columns #OKParts, #NOKParts, #PartsFirstTestOK, #PartsFirstTestNOK.

I've tried to create a sub-query, FIRST_VALUE & OVER function, but none of them worked for me. I am quite a starter in SQL.

Does anybody have an idea how to solve this? Thanks a lot in advance!


Solution

  • The following uses windowed ROW_NUMBER() functions to assign both ascending and descending sequence numbers to the series of tests for a given part. Checking for one or the other value = 1 can be used to limit your counts to just the first or last test.

    SELECT Plant
          ,Machine
          ,Material
          ,Batch
          ,Count(Result) as [#Tests] --all rows
          ,Count(Distinct DMC) as [#Parts] --all different DMCs
          ,COUNT(CASE WHEN Result = 1 THEN 1 END) as [#OKTests]
          ,COUNT(CASE WHEN Result = 0 THEN 1 END) as [#NOKTests]
          ,COUNT(CASE when RowNumAsc = 1 AND Result = 1 THEN 1 END) as [#PartsFirstTestOK]
          ,COUNT(CASE when RowNumAsc = 1 AND Result = 0 THEN 1 END) as [#PartsFirstTestNOK]
          ,COUNT(CASE when RowNumDesc = 1 AND Result = 1 THEN 1 END) as [#OKParts]
          ,COUNT(CASE when RowNumDesc = 1 AND Result = 0 THEN 1 END) as [#NOKParts]
    FROM (
        SELECT *,
            ROW_NUMBER() OVER(
                PARTITION BY Plant, Material, Batch, Machine, DMC
                ORDER BY ProductionTime
                ) AS RowNumAsc,
            ROW_NUMBER() OVER(
                PARTITION BY Plant, Material, Batch, Machine, DMC
                ORDER BY ProductionTime DESC
                ) AS RowNumDesc
        FROM table1
    ) T
    where Plant = 'A' and ProductionTime > DATEADD(DAY, -365,GETDATE())
    group by Plant, Material, Batch, Machine;
    
    

    Sample results:

    Plant Machine Material Batch #Tests #Parts #OKTests #NOKTests #PartsFirstTestOK #PartsFirstTestNOK #OKParts #NOKParts
    A MachineA MaterialA X 3 2 2 1 1 1 2 0
    A MachineB MaterialA X 1 1 0 1 0 1 0 1
    A MachineA MaterialA Y 1 1 1 0 1 0 1 0
    A MachineB MaterialB Y 1 1 1 0 1 0 1 0

    See this db<>fiddle for a working demo.