I'm trying to create some tables with the following characteristics:
My schema is as follows:
CREATE TABLE WebSitesStatus(
-- Host name of the site in question.
Host String(40) NOT NULL,
-- When the information on the site was last updated.
UpdateDate Date NOT NULL,
HttpsWorks BOOL NOT NULL,
ModernTls BOOL NOT NULL,
) PRIMARY KEY(Host, UpdateDate DESC);
So an example subset of the table might be:
+--------------+------------+------------+-----------+
| Host | UpdateDate | HttpsWorks | ModernTls |
+--------------+------------+------------+-----------+
| foobar.com | 2016-10-14 | true | false |
| google.com | 2016-10-14 | true | true |
| google.com | 2016-04-23 | false | false |
| nytimes.com | 2016-10-14 | false | false |
| nytimes.com | 2016-05-25 | true | true |
| nytimes.com | 2016-04-25 | true | false |
| nytimes.com | 2016-04-23 | true | false |
| paypal.com | 2016-10-14 | true | false |
+--------------+------------+------------+-----------+
I'd like to find a way to get back
+--------------+------------+------------+-----------+
| Host | UpdateDate | HttpsWorks | ModernTls |
+--------------+------------+------------+-----------+
| foobar.com | 2016-10-14 | true | false |
| google.com | 2016-10-14 | true | true |
| nytimes.com | 2016-10-14 | false | false |
| paypal.com | 2016-10-14 | true | false |
+--------------+------------+------------+-----------+
but the structure isn't allowing me to easily do this. Is there a way to do this query easily?
This query:
SELECT t.Host, MAX(t.UpdateDate) AS UpdateDate
FROM WebSitesStatus AS t
GROUP BY t.Host
returns just the primary keys of the rows you want:
+--------------+------------+
| Host | UpdateDate |
+--------------+------------+
| foobar.com | 2016-10-14 |
| google.com | 2016-10-14 |
| nytimes.com | 2016-10-14 |
| paypal.com | 2016-10-14 |
+--------------+------------+
To get the other columns you must JOIN
this back to the same table, like this:
SELECT p.Host, p.UpdateDate, q.HttpsWorks, q.ModernTls,
FROM (SELECT t.Host, MAX(t.UpdateDate) AS UpdateDate
FROM WebSitesStatus AS t
GROUP BY t.Host) AS p
JOIN WebSitesStatus AS q
ON p.Host = q.Host AND p.UpdateDate = q.UpdateDate;
That yields this data, which is what you wanted:
+--------------+------------+------------+-----------+
| Host | UpdateDate | HttpsWorks | ModernTls
+--------------+------------+------------+-----------+
| amazon.co.uk | 2016-10-14 | true | false |
| amazon.com | 2016-10-14 | true | true |
| nytimes.com | 2016-10-14 | false | false |
| paypal.com | 2016-10-14 | true | false |
+--------------+------------+------------+-----------+
This is somewhat inefficient, because it scans your table twice: once to get the primary key values and then again to fetch all the other values. But it's probably the cleanest approach.