I'm currently trying to extract latest visited domain information from a table that includes a visits history on my websites and information I want to retreive.
Here is my table structure:
CREATE TABLE `visit_record` (
`id` INT NOT NULL AUTO_INCREMENT,
`date` VARCHAR( 19 ) NOT NULL,
`url` VARCHAR( 1024 ) NOT NULL,
... EXTRA COLUMNS/DATA ...
PRIMARY KEY ( `id` )
);
Here URL stands for multiple pages on same or different domains.
But I want to get all columns for the latest row for each distinct domain, and my URL store a full query, not a domain only (for more flexibility regarding other features).
The problem is, my distinct value to group on is a calculated value. I tried mixing the solution from the post shown in examples I tried, with grouping / left joins / different structure of queries / subqueries, but I cannot manage and as soon as I want to add my other values/columns (for e.g ID of the row, extra data), it show 99% of the time the following error:
SELECT list contains nonaggregated column
(even if I found basic queries on internet looking like my-ones).
As I want to group by domain I'm first extracting a domain name as a column, in a query I'll use later with an alias:
SELECT
`visit_record`.`id`,
`visit_record`.`date`,
SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX( `visit_record`.`url`, '/', 3 ), '://', -1 ), '/', 1 ), '?', 1 ) AS `domain`
FROM
`visit_record`;
Here everything is ok, and I see an additional column having my domain name without protocol and query details.
Now I would like to take the latest (MAX) row: the row having the latest date, and get only 1x per unique domain.
I already found part of the solution I guess, thanks to some topics such as the-ones below: Select info from table where row has max date SQL query to get the latest row
But I seems tricky / requiring modifications to fit my use case.
I'm not providing more examples because I made 15+ tries and I don't know which-one is the best to start from, but I guess you have enough information to help me on this!
Data sample of visit_record table:
id date url extra columns
_________________________________________________________________________________
1 2023-06-21 00:00:00 https://example1.com/en ...
2 2023-06-20 23:00:00 https://example1.com/uk ...
3 2023-06-20 14:00:00 https://example2.com/en/about ...
4 2023-06-21 03:00:00 https://example2.com/fr ...
My expected output:
id date url domain extra columns
_________________________________________________________________________________________________
1 2023-06-21 00:00:00 https://example1.com/en example1.com ...
4 2023-06-21 03:00:00 https://example2.com/fr example2.com ...
Thank you in advance for your help.
Problem Reformulation
The main problem was performing this domain extraction from URL and grouping on it, as it is not a primary key, by keeping all columns of the row which is the "latest" (by date) one of each domain.
SOLUTION
Get for each domain which is not a primary key (extracted from an URL value), the latest-row by date (which is not a primary key too), including all the related row columns.
This query will add a last column "domain" and return the result:
SELECT
t1.*
FROM
(
SELECT
*,
SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX( `visit_record`.`url`, '/', 3 ), '://', -1 ), '/', 1 ), '?', 1 ) AS `domain`
FROM
`visit_record`
) t1
INNER JOIN
(
SELECT
domain,
max( date ) lastdate
FROM (
SELECT
*,
SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX( `visit_record`.`url`, '/', 3 ), '://', -1 ), '/', 1 ), '?', 1 ) AS `domain`
FROM
`visit_record`
) t0
GROUP BY domain
) t2 ON
t1.domain = t2.domain
AND t1.date = t2.lastdate;
Try this
select *
from(
select *,row_number()over(partition by domain order by date desc) rn
from (
SELECT *,
SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX(
SUBSTRING_INDEX( `visit_record`.`url`, '/', 3 )
, '://', -1 ), '/', 1 ), '?', 1 ) AS `domain`
FROM `visit_record`
)t1
)t2
where rn=1;
Result
id | date | url | EXTRA_COLUMNS | domain | rn |
---|---|---|---|---|---|
1 | 2023-06-21 00:00:00 | https://example1.com/en | ... | example1.com | 1 |
4 | 2023-06-21 03:00:00 | https://example2.com/fr | ... | example2.com | 1 |
Updated:
if row_number()
not available in your version of MySQL, use this example:
select *
from (
SELECT *,
SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX( `visit_record`.`url`, '/', 3 ), '://', -1 ), '/', 1 ), '?', 1 ) AS `domain`
FROM `visit_record`
)t1
inner join (
select domain,max(date) lastdate
from (
SELECT *,
SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX( SUBSTRING_INDEX( `visit_record`.`url`, '/', 3 ), '://', -1 ), '/', 1 ), '?', 1 ) AS `domain`
FROM `visit_record`
) t0
group by domain
)t2 on t1.domain=t2.domain and t1.date=t2.lastdate
;