I have the following example dataset and as you can see the column X is the class ID, column Y is each class name, and finally column Z is the other universities class ID used for each subject. The goal is to get X, and Z with a grouping by X and get the most common ID used by Z (Other universities).
Sheet 1:
X | Y | Z |
---|---|---|
123 | 22 | 122 |
123 | 22 | 123 |
123 | 22 | 122 |
256 | 21 | 256 |
256 | 21 | 255 |
341 | 33 | 400 |
341 | 33 | 400 |
The outcome should be:
X | Z |
---|---|
123 | 122 |
256 | 255 |
341 | 400 |
I tried by adding the following query, but it is only returning the maximum from all the the table but not only for each value in column Z.
SELECT Sheet1.X, Sheet1.Z
FROM Sheet1
GROUP BY Sheet1.X, Sheet1.Z
HAVING COUNT(Sheet1.Z) =
(SELECT MAX(sheet2.count) FROM
(SELECT Sheet1.X, Sheet1.Z, COUNT(Sheet1.Z) AS count
FROM Sheet1
GROUP BY Sheet1.X, Sheet1.Z) as sheet2);
Any suggestion of what I am doing wrong?
This is a real pain in MS Access. But if you want one row per X
, then you can do:
SELECT s1.X, s1.Z
FROM Sheet1 as s1
GROUP BY s1.X, s1.Z
HAVING s1.Z = (SELECT TOP (1) s2.Z
FROM Sheet1 as s2
WHERE s2.X = s1.X
GROUP BY s2.Z
ORDER BY COUNT(*) DESC, s2.Z
);
If you want multiple rows in the event of ties, then use:
SELECT s1.X, s1.Z
FROM Sheet1 as s1
GROUP BY s1.X, s1.Z
HAVING COUNT(*) IN (SELECT TOP (1) COUNT(*)
FROM Sheet1 as s2
WHERE s2.X = s1.X
GROUP BY s2.Z
);
I should note that this would be simpler in just about any other database.