Search code examples
mysqlgroup-byhaving-clause

Group if COUNT(*) > X, otherwise don't group items


I'm trying to return a list of map items from my DB grouping the items ONLY if the group count is greater than 4 otherwise I don't want the items to be grouped.

My project is built to return all entries within a set area and I'll be using grouping to break up that area into a grid. If each cell in the grid has too many results then I will show a group marker with the group count instead of a single entry marker.

My current query works to determine if there are grouped items by the COUNT(*)

SELECT *, COUNT(*) as groupCount, floor(longitude/0.0007) AS groupLong, floor(latitude/0.0007) AS groupLat 
FROM items 
WHERE longitude>=151.1 
    AND longitude<=151.2 
    AND latitude>=-33.9 
    AND latitude<=-33.8 
GROUP BY floor(longitude/0.0007), floor(latitude/0.0007)

What I would like to do is only group the items if the group count > 4 and items in a group <=4 will be returned as ungrouped items.

I know that I can use HAVING COUNT(*)>4 to only return the groups of 5 and over but what can I do to return the ungrouped items that would be in the groups of 4 and under?

I would be happy to do two queries to get the results but if there was a way of doing it in one then that would be great!


Solution

  • Here's a SQL Fiddle for my solution: http://sqlfiddle.com/#!8/e40ba/1

    The idea is to first figure out what the groups will be, which is done in the subquery grouping. We then join these groups to the original table, but we use a left outer join so any value that isn't in a group will get have nulls for the grouping columns. Finally, we use AVG to come up with a representative coordinate for the group. For ungrouped values, this will be the coordinate itself, which is nice.

    I would recommend against actually using this without doing some testing and benchmarking. That join is probably going to be terrible if items is large. I really just wanted to find a way to do this in a single query. As I said in my comment, the right way to do this would be to use window functions, but MySQL doesn't have those.

    SELECT AVG(longitude) AS longitude
         , AVG(latitude) AS latitude
         , COUNT(*) AS count
    
    FROM items
        LEFT OUTER JOIN
            ( SELECT COUNT(*) AS group_count
                   , FLOOR(longitude/0.0007) AS group_longitude
                   , FLOOR(latitude/0.0007) AS group_latitude
              FROM items
              -- Repeat the filter to avoid computing unnecessary groups
              WHERE longitude >= 151.1
                AND longitude <= 151.2
                AND latitude >= -33.9
                AND latitude <= -33.8
              GROUP BY group_longitude, group_latitude
              HAVING group_count > 4
            ) AS grouping
        -- Match each row up with its group
        ON FLOOR(longitude/0.0007) = group_longitude
        AND FLOOR(latitude/0.0007) = group_latitude
    
    WHERE longitude >= 151.1
      AND longitude <= 151.2
      AND latitude >= -33.9
      AND latitude <= -33.8
    
    GROUP BY COALESCE(group_longitude, id)
           , COALESCE(group_latitude, id)