I am comparing 3 tables of data for what should be the same demands, and want to create a table that shows the MODE from two of the tables (in order to make a suggestion of what the correct value could be). I may need to concatenate as I am looking for the mode of all rows with the same ID and Name.
Table 1:
+----------+----------+------+ | DemandNo | Forename | Size | +----------+----------+------+ | 1 | Richard | 42 | | 2 | Richard | 42 | | 3 | Richard | 42 | | 4 | Richard | 36 | | 5 | Richard | 36 | | 6 | Richard | 36 | | 7 | Richard | 36 | | 8 | Luke | 14 | | 9 | Luke | 14 | | 10 | Luke | 14 | | 11 | Luke | 14 | | 12 | Luke | 14 | | 13 | Luke | 25 | | 14 | Luke | 25 | | 15 | Luke | 25 | +----------+----------+------+
Table 2:
+----------------+-----------------+ | List1_DemandNo | List1_PenColour | +----------------+-----------------+ | 1 | White | | 2 | Black | | 3 | Black | | 4 | Red | | 5 | ? | | 6 | Red | | 7 | Red | | 8 | Yellow | | 9 | Yellow | | 10 | Yellow | | 11 | Green | | 12 | Yellow | | 13 | Green | | 14 | ? | | 15 | ? | +----------------+-----------------+
Table 3:
+----------------+-----------------+ | List2_DemandNo | List2_PenColour | +----------------+-----------------+ | 1 | White | | 2 | Green | | 3 | Green | | 4 | Red | | 5 | ? | | 6 | Red | | 7 | Red | | 8 | Pink | | 9 | Pink | | 10 | Yellow | | 11 | Green | | 12 | Pink | | 13 | Orange | | 14 | Orange | | 15 | Orange | +----------------+-----------------+
So I need to generate a recommendation for each person with the same name and size. The recommendation should be the MODE of all rows in table1 where the person has the same Forename and Size (do I need to concatenate Forename and size?)
The other requirement is that all question marks "?" should be excluded from the MODE/recommendation.
So my results table should look something like this:
+----------+----------+-----+------------+------------+ | DemandNo | Forename | Size | List1_MODE | List2_MODE | +----------+----------+-----+------------+------------+ | 1 | Richard | 42 | Black | Green | | 2 | Richard | 42 | Black | Green | | 3 | Richard | 42 | Black | Green | | 4 | Richard | 36 | Red | Red | | 5 | Richard | 36 | Red | Red | | 6 | Richard | 36 | Red | Red | | 7 | Richard | 36 | Red | Red | | 8 | Luke | 14 | Yellow | Pink | | 9 | Luke | 14 | Yellow | Pink | | 10 | Luke | 14 | Yellow | Pink | | 11 | Luke | 14 | Yellow | Pink | | 12 | Luke | 14 | Yellow | Pink | | 13 | Luke | 25 | Green | Orange | | 14 | Luke | 25 | Green | Orange | | 15 | Luke | 25 | Green | Orange | +----------+----------+-----+------------+------------+
I understand that the MODE function does not work in Teradata, and that I would need to perform a count, but the complexity of calculating the mode using 3 tables and excluding the ? is sadly beyond my SQL skills -Any help would be truly appreciated!
Many thanks Richard
You need to write a separate select for each mode and join to it:
select t1.*, List1_PenColour, List2_PenColour
from table1 as t1
left join
(
select Forename, Size, List1_PenColour
from table1 as t1
join table2 as t2
on t1.DemandNo = t2.List1_DemandNo
and List1_PenColour <> '?'
group by Forename, Size, List1_PenColour
-- return only the row with the highest count
-- random row if multiple rows with the same count exist
qualify row_number()
over (partiton by Forename, Size, List1_PenColour
order by count(*) desc) = 1
) as list1
on t1.Forename = list1.Forename
and t1.Size = list1.Size
left join
(
select Forename, Size, List2_PenColour
from table1 as t1
join table3 as t3
on t1.DemandNo = t3.List2_DemandNo
and List2_PenColour <> '?'
group by Forename, Size, List2_PenColour
qualify row_number()
over (partiton by Forename, Size, List2_PenColour
order by count(*) desc) = 1
) as list2
on t1.Forename = list2.Forename
and t1.Size = list2.Size