Search code examples
sqljointeradatamode

MODE in Teradata SQL - excluding a value from the range, and using multiple tables


I am comparing 3 tables of data for what should be the same demands, and want to create a table that shows the MODE from two of the tables (in order to make a suggestion of what the correct value could be). I may need to concatenate as I am looking for the mode of all rows with the same ID and Name.

Table 1:

+----------+----------+------+
| DemandNo | Forename | Size |
+----------+----------+------+
|        1 | Richard  |   42 |
|        2 | Richard  |   42 |
|        3 | Richard  |   42 |
|        4 | Richard  |   36 |
|        5 | Richard  |   36 |
|        6 | Richard  |   36 |
|        7 | Richard  |   36 |
|        8 | Luke     |   14 |
|        9 | Luke     |   14 |
|       10 | Luke     |   14 |
|       11 | Luke     |   14 |
|       12 | Luke     |   14 |
|       13 | Luke     |   25 |
|       14 | Luke     |   25 |
|       15 | Luke     |   25 |
+----------+----------+------+

Table 2:

+----------------+-----------------+
| List1_DemandNo | List1_PenColour |
+----------------+-----------------+
|              1 | White           |
|              2 | Black           |
|              3 | Black           |
|              4 | Red             |
|              5 | ?               |
|              6 | Red             |
|              7 | Red             |
|              8 | Yellow          |
|              9 | Yellow          |
|             10 | Yellow          |
|             11 | Green           |
|             12 | Yellow          |
|             13 | Green           |
|             14 | ?               |
|             15 | ?               |
+----------------+-----------------+

Table 3:

+----------------+-----------------+
| List2_DemandNo | List2_PenColour |
+----------------+-----------------+
|              1 | White           |
|              2 | Green           |
|              3 | Green           |
|              4 | Red             |
|              5 | ?               |
|              6 | Red             |
|              7 | Red             |
|              8 | Pink            |
|              9 | Pink            |
|             10 | Yellow          |
|             11 | Green           |
|             12 | Pink            |
|             13 | Orange          |
|             14 | Orange          |
|             15 | Orange          |
+----------------+-----------------+

So I need to generate a recommendation for each person with the same name and size. The recommendation should be the MODE of all rows in table1 where the person has the same Forename and Size (do I need to concatenate Forename and size?)

The other requirement is that all question marks "?" should be excluded from the MODE/recommendation.

So my results table should look something like this:

+----------+----------+-----+------------+------------+
| DemandNo | Forename | Size | List1_MODE | List2_MODE |
+----------+----------+-----+------------+------------+
|        1 | Richard  |  42 | Black      | Green      |
|        2 | Richard  |  42 | Black      | Green      |
|        3 | Richard  |  42 | Black      | Green      |
|        4 | Richard  |  36 | Red        | Red        |
|        5 | Richard  |  36 | Red        | Red        |
|        6 | Richard  |  36 | Red        | Red        |
|        7 | Richard  |  36 | Red        | Red        |
|        8 | Luke     |  14 | Yellow     | Pink       |
|        9 | Luke     |  14 | Yellow     | Pink       |
|       10 | Luke     |  14 | Yellow     | Pink       |
|       11 | Luke     |  14 | Yellow     | Pink       |
|       12 | Luke     |  14 | Yellow     | Pink       |
|       13 | Luke     |  25 | Green      | Orange     |
|       14 | Luke     |  25 | Green      | Orange     |
|       15 | Luke     |  25 | Green      | Orange     |
+----------+----------+-----+------------+------------+

I understand that the MODE function does not work in Teradata, and that I would need to perform a count, but the complexity of calculating the mode using 3 tables and excluding the ? is sadly beyond my SQL skills -Any help would be truly appreciated!

Many thanks Richard


Solution

  • You need to write a separate select for each mode and join to it:

    select t1.*, List1_PenColour, List2_PenColour 
    from table1 as t1
    left join 
      (
        select Forename, Size, List1_PenColour
        from table1 as t1
        join table2 as t2
        on t1.DemandNo = t2.List1_DemandNo
        and List1_PenColour <> '?'
        group by Forename, Size, List1_PenColour
        -- return only the row with the highest count
        -- random row if multiple rows with the same count exist
        qualify row_number()
                over (partiton by Forename, Size, List1_PenColour 
                      order by count(*) desc) = 1 
      ) as list1
    on t1.Forename = list1.Forename
    and t1.Size = list1.Size
    left join 
      (
        select Forename, Size, List2_PenColour
        from table1 as t1
        join table3 as t3
        on t1.DemandNo = t3.List2_DemandNo
        and List2_PenColour <> '?'
        group by Forename, Size, List2_PenColour
        qualify row_number() 
                over (partiton by Forename, Size, List2_PenColour 
                      order by count(*) desc) = 1 
      ) as list2
    on t1.Forename = list2.Forename
    and t1.Size = list2.Size