Search code examples
sql-serverwhere-clauseself-joinambiguity

Return a percentage column using self-join with WHERE clause filtering on the numerator only


UPDATE May 31, 2018: Included a SQLFiddle, in the hopes that somebody can figure out a solution.

The following function contains two CTE's named QuirksCTE and SurplusCTE. Both of these CTE's should filter the numerator of a fraction, while the denominator of that fraction should inherit only the filtering effect of QuirksCTE. The object is to produce a percentage (a subset divided by a set). The numerator and denominator of said fraction are related to each other through a self-join. The core of the problem is this: I remain unable to selectively apply 2 CTE filters to the numerator of this SELF-JOIN while only applying 1 of those CTE filters to the denominator. Every attempt so far to rearrange and restructure the where clause and the joins has resulted in the denominator ignoring all filters and instead picking up every row of the data (there are 7 rows in the sample database). Note that in the end I will use the FORMAT function to handle the fraction's transformation into a percentage, but for now I'm ignoring that more cosmetic task.

To use the fiddle linked to below, you should give the function an input parameter from this little table. My manual calculations predict the associated "Correct Output Parameter," which I am so far unable to arrive at using SQL.

Input Parameter       Correct Output Parameter
     0                       0
     1                      .5
     2                      .666
     3                      .666

http://sqlfiddle.com/#!18/ae5a1/1 (Requires a function call and an input variable from chart shown above).

--This is a user defined function containing 2 CTE's and a SELECT 
--statement with 2 joins and a WHERE clause. I included that structural 
--detail in case it makes a difference.


CREATE FUNCTION [dbo].[ExampleOfQueryStructure] 
(
    @MyInputParameter int
)
RETURNS 
@MyOutputParameter TABLE 
(
    [MyPercentage] float          
)
AS
BEGIN
WITH    Quirks_CTE (AnimalDateTime, Quirks) 
        AS   
           (SELECT 
               AnimalDateTime,
               (COALESCE (Lion, 0) + COALESCE (Zebra, 0) 
               + COALESCE (Antelope, 0) + COALESCE (Giraffe, 0)) 
           FROM dbo.tblAnimals),

        Surplus_CTE 
        AS
           (SELECT 
               JobEntryDateTime,
               CASE 
                  WHEN tblJobEntries.Stance = 1
                  THEN ExitLevel5-BeginLevel
                  ELSE BeginLevel-ExitLevel5
               END AS [5_SURPLUS],

               CASE
                  WHEN tblJobEntries.Stance = 1
                  THEN ExitLevel8-BeginLevel
                  ELSE BeginLevel-ExitLevel8
               END AS [8_SURPLUS],

               CASE
                  WHEN tblJobEntries.Stance = 1
                  THEN ExitLevel20-BeginLevel
                  ELSE BeginLevel-ExitLevel20
               END AS [20_SURPLUS],

               CASE
                  WHEN tblJobEntries.Stance = 1
                  THEN ExitLevelStone-BeginLevel
                  ELSE BeginLevel-ExitLevelStone
               END AS [StoneProfit]

        FROM dbo.tblJobEntries)


INSERT INTO @MyOutputParameter ([MyPercentage])
   SELECT 
       COUNT(a.Quirks)/COUNT(b.Quirks)
       FROM QuirksCTE b
       LEFT OUTER JOIN QuirksCTE a 
           ON b.AnimalDateTime = a.AnimalDateTime
              where exists(select 1 from Surplus_CTE c
                           where a.AnimalDateTime = c.JobEntryDateTime
                           AND ([5_SURPLUS] > 0 OR [8_SURPLUS] > 0 
                                OR [20_SURPLUS] > 0 OR [StoneProfit] > 0)   
                           AND a.Quirks <= @MyInputParameter)

RETURN
END


Solution

  • As I mentioned in the comments, your WHERE is making your LEFT JOIN an INNER JOIN (thus rendering both counts the same) since you are forcing the records to match on their date. This is the corrected function.

    CREATE FUNCTION [dbo].[ExampleOfQueryStructure] 
    (
        @MyInputParameter int
    )
    RETURNS 
    @MyOutputParameter TABLE 
    (
        [MyPercentage] float
    )
    AS
    BEGIN
    WITH    Quirks_CTE (AnimalDateTime, Quirks) 
            AS   
               (SELECT 
                   AnimalDateTime,
                   (COALESCE (Lion, 0) + COALESCE (Zebra, 0) 
                   + COALESCE (Antelope, 0) + COALESCE (Giraffe, 0)) 
               FROM dbo.tblAnimals),
    
            Surplus_CTE 
            AS
               (SELECT 
                   JobEntryDateTime,
                   CASE 
                      WHEN tblJobEntries.Stance = 1
                      THEN ExitLevel5-BeginLevel
                      ELSE BeginLevel-ExitLevel5
                   END AS [5_SURPLUS],
    
                   CASE
                      WHEN tblJobEntries.Stance = 1
                      THEN ExitLevel8-BeginLevel
                      ELSE BeginLevel-ExitLevel8
                   END AS [8_SURPLUS],
    
                   CASE
                      WHEN tblJobEntries.Stance = 1
                      THEN ExitLevel20-BeginLevel
                      ELSE BeginLevel-ExitLevel20
                   END AS [20_SURPLUS]
                FROM dbo.tblJobEntries)
    
    INSERT INTO @MyOutputParameter ([MyPercentage])
       SELECT 
           COUNT(a.Quirks) * 1.0 /COUNT(b.Quirks)
           FROM Quirks_CTE b
           LEFT OUTER JOIN Quirks_CTE a 
               ON b.AnimalDateTime = a.AnimalDateTime
                  AND exists(select 1 from Surplus_CTE c
                               where a.AnimalDateTime = c.JobEntryDateTime
                               AND ([5_SURPLUS] > 0 OR [8_SURPLUS] > 0 
                                    OR [20_SURPLUS] > 0)   
                               AND a.Quirks <= @MyInputParameter)
    
    RETURN
    END
    

    I also changed the return value to float (integer would return only 0 or 1).


    EDIT: Use this query to validate your data. It's a lot simpler if you join the surplus with the animal count before on another CTE, as the join is always by date.

    WITH    Quirks_CTE (AnimalDateTime, Quirks) 
            AS   
               (SELECT 
                   AnimalDateTime,
                   (COALESCE (Lion, 0) + COALESCE (Zebra, 0) 
                   + COALESCE (Antelope, 0) + COALESCE (Giraffe, 0)) 
               FROM dbo.tblAnimals),
    
            Surplus_CTE 
            AS
               (SELECT 
                   JobEntryDateTime,
                   CASE 
                      WHEN tblJobEntries.Stance = 1
                      THEN ExitLevel5-BeginLevel
                      ELSE BeginLevel-ExitLevel5
                   END AS [5_SURPLUS],
    
                   CASE
                      WHEN tblJobEntries.Stance = 1
                      THEN ExitLevel8-BeginLevel
                      ELSE BeginLevel-ExitLevel8
                   END AS [8_SURPLUS],
    
                   CASE
                      WHEN tblJobEntries.Stance = 1
                      THEN ExitLevel20-BeginLevel
                      ELSE BeginLevel-ExitLevel20
                   END AS [20_SURPLUS],
    
                  CASE
                      WHEN tblJobEntries.Stance = 1
                      THEN ExitLevelStone-BeginLevel
                      ELSE BeginLevel-ExitLevelStone
                   END AS [StoneSurplus]            
    
                FROM dbo.tblJobEntries),
           AnimalSurplus AS
           (
             SELECT
               Q.AnimalDateTime,
               Q.Quirks,
               Surplus = CASE WHEN [5_SURPLUS] > 0 OR [8_SURPLUS] > 0 OR [20_SURPLUS] > 0 OR [StoneSurplus] > 0 THEN 1 END
             FROM
               Quirks_CTE AS Q
               LEFT JOIN Surplus_CTE AS S ON Q.AnimalDateTime = S.JobEntryDateTime
             )
       -- SELECT * FROM AnimalSurplus
       SELECT 
           COUNT(a.Quirks) * 1.0 /COUNT(b.Quirks)
           FROM AnimalSurplus b
           LEFT OUTER JOIN AnimalSurplus a 
               ON b.AnimalDateTime = a.AnimalDateTime AND a.Surplus = 1
           WHERE b.Quirks <= @MyInputParameter