Search code examples
mysqlquery-optimizationgreatest-n-per-group

Quicker way to get this data out of mySQL


I currently am doing this to get some data from our table:

SELECT DISTINCT(CategoryID),Distance FROM glinks_DistancesForTowns WHERE LinkID = $linkID ORDER BY Distance LIMIT 20

I'm iterating over that for every link id we have (50k odd). Them I'm processing them in Perl with:

my @cats;
while (my ($catid,$distance) = $sth->fetchrow) {
    push @cats, $cat;
}

I'm trying to see if there is a better way to do this in a sub-query with MySQL, vs doing 50k smaller queries (i.e one per link)

The basic structure of the table is:

glinks_Links
    ID
glinks_DistancesForTowns 
    LinkID
    CategoryID
    Distance

I'm sure there must be a simple way to do it - but I'm just not seeing it.

As requested - here is a dump of the table structure. Its actually more complex than that, but the other fields just hold values so I've taken those bits out to give a cleaner over-view of the structure:

CREATE TABLE `glinks_DistancesForTowns` (
 `LinkID` int(11) DEFAULT NULL,
 `CategoryID` int(11) DEFAULT NULL,
 `Distance` float DEFAULT NULL,
 `isPaid` int(11) DEFAULT NULL,
 KEY `LinkID` (`LinkID`),
 KEY `CategoryID` (`CategoryID`,`isPaid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1


CREATE TABLE `glinks_Links` (
 `ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `Title` varchar(100) NOT NULL DEFAULT '',
 `URL` varchar(255) NOT NULL DEFAULT 'http://',
 PRIMARY KEY (`ID`),
 KEY `booking_hotel_id_fk` (`booking_hotel_id_fk`)
) ENGINE=MyISAM AUTO_INCREMENT=617547 DEFAULT CHARSET=latin1

This is the kind of thing I'm hoping for:

SELECT glinks_Links.ID FROM glinks_Links as links, glinks_DistancesForTowns as distance (
        SELECT DISTINCT(CategoryID),Distance FROM distance WHERE distance.LinkID = links.ID ORDER BY Distance LIMIT 20
    )

But obviously that doesn't work;)


Solution

  • It sounds like you want the top 20 towns by distance for each link, right?

    MySQL 8.0 supports window functions, and this would be the way to write the query:

    WITH cte AS (
      SELECT l.ID, ROW_NUMBER() OVER(PARTITION BY l.ID ORDER BY d.Distance) AS rownum
      FROM glinks_Links as l
      JOIN glinks_DistancesForTowns AS d ON d.LinkID = l.ID
    ) SELECT ID FROM cte WHERE rownum <= 20;
    

    Versions older than 8.0 do not support these features of SQL, so you have to get creative with user-defined variables or self-joins. See for example my answer to How to SELECT the newest four items per category?