Search code examples
sqlsql-serverperformancet-sqlsql-server-2017

Create flag column after comparing two tables


I have two different report tables with datetimes and the report owner. I would like to select the people who have written either reports at least once. I also need a calculated field that shows which report number they wrote. Report 1 takes precedence, so if at any time someone authored report one, the new report_number column should say 1, else 2 (for report 2).

'people' table
| person_id | full_name
--------------------------
| 1         | John L Smith
| 2         | Carl M Selt
| 3         | Another Person

'report_1' table
| report_1_id | author_person_id | date_entered | other_columns
---------------------------------------------------------------
| 1           | 1                | 2018-01-12   | foo
| 2           | 1                | 2018-02-18   | foo foo

'report_2' table
| report_2_id | author_person_id | date_entered | other_columns
---------------------------------------------------------------
| 1           | 1                | 2018-03-21   | bar
| 2           | 1                | 2018-03-28   | bar bar
| 3           | 2                | 2018-04-16   | baz
| 4           | 2                | 2018-04-30   | baz baz

Desired results:

| full_name    | report_number
---------------------------
| John L Smith | 1
| Carl M Smelt | 2

Note that John's report_number is 1 even though he also authored a report 2.

Report 1 and report 2 have different additional columns even though they look the same above.

What I tried:

    /* Get people from both reports */
WITH report_1_people AS (
    SELECT P.full_name
    FROM report_1 R1
    INNER JOIN people P ON R1.author_person_id = P.person_id
    WHERE P.full_name IS NOT NULL 
    AND P.full_name <> ''
), report_2_people AS (
    SELECT P2.full_name
    FROM report_2 R2
    INNER JOIN people P2 ON R2.author_person_id = P2.person_id
    WHERE P2.full_name IS NOT NULL 
    AND P2.full_name <> ''
)
SELECT 
    P.full_name,
    CASE WHEN P.full_name IN ( /* Check if in report 1 */
                    SELECT full_name
                    FROM report_1)
                    THEN 1
            ELSE 2
            END AS report_number
FROM people P
WHERE P.full_name IS NOT NULL AND P.full_name <> ''
/* Eliminate duplicate names */
GROUP BY P.full_name 
/* Filter only who either authored report 1 or report 2 */
HAVING P.full_name IN (SELECT full_name
                       FROM report_1_people)
OR P.full_name IN (SELECT full_name
                   FROM report_2_people)

Note: There is a GROUP BY with the people table because for some reason there are duplicates entries.

Query took so long it disconnected from the database (24+ hours), so I think I'm doing something wrong. Is there a better way to accomplish this flag calculated column based on two tables? Relatively new to SQL so I'm wondering if there is another way of thinking I am over looking with SQL logic.


Solution

  • The INs on the CTEs are likely to kill it dead.

    An alternative way is to use EXISTS to check if a person has written a report. A CASE expression can handle the precedence.

    SELECT p.full_name,
           CASE
             WHEN EXISTS (SELECT *
                                 FROM report_1 r1
                                 WHERE r1.author_person_id = p.person_id) THEN
               1
             WHEN EXISTS (SELECT *
                                 FROM report_2 r2
                                 WHERE r2.author_person_id = p.person_id) THEN
               2
           END report_number
           FROM people p
           WHERE EXISTS (SELECT *
                                FROM report_1 r1
                                WHERE r1.author_person_id = p.person_id)
                  OR EXISTS (SELECT *
                                    FROM report_2 r2
                                    WHERE r2.author_person_id = p.person_id);
    

    For performance try indexes on report_1 (author_person_id) and report_2 (author_person_id). For people you might experiment with an index on person_id(which likely already exists) or maybe a compound one on person_id and full_name.