Search code examples
phpmysqlsqlmode

Calculating Mode within existing query to count rows in PHP and MySQL


I have a table rel with fields id refid cellid cat and some working PHP code to count the number of rows for each cellid and return this as an array for later use.

<?php
// connect to database
require($DOCUMENT_ROOT . "connect.php");
// count the number of relationships for each cell id
  $cell_array = array();
  if ($result = mysqli_query($link, "SELECT cellid, COUNT(*) totalcount, cat FROM rel GROUP BY cellid")) {
    if (mysqli_num_rows($result)) {
      while ($row = mysqli_fetch_assoc($result)) {
        $cell_array[$row["cellid"]] = $row['totalcount'];
      }
    }
  }
?>

I now want to add an additional function, calculate the Mode (most common value) of cat for each cellid and include that value in the array. If there is more than one mode or no mode then return 9 (cat can only have values of 1 to 8).

I've been reading around how to find the Mode and the common solution looks like this:

SELECT COUNT(*) frequency, cat, cellid
FROM rel
GROUP BY cat
ORDER BY COUNT(*) DESC
LIMIT 1

But this does not return the most common cat for each cellid just the most common cat in general.

I have been working using a sub-query

SELECT cellid, cat, COUNT( * ) 
FROM  `rel` 
GROUP BY cellid, cat

Which produces a row for each cellid cat combination, but I'm unsure how to then use this to find the total number of entries for each cellid and the the most common cat for each cellid

EDIT

I have made some further progress, I now have a working query to find the mode cat for each cellid

SELECT cellid, cat
FROM rel t
GROUP BY cellid, cat
HAVING cat = ( 
SELECT cat
FROM rel
WHERE cellid = t.cellid
GROUP BY cat
ORDER BY COUNT( * ) DESC , cat
LIMIT 1 )

But I am still unsure how to combine the two queries are the output from one cannot be used in the other.


Solution

  • You can combine them by joining them together:

    select cell.*, mode.cat as modecat
    from (SELECT cellid, COUNT(*) totalcount
          FROM rel
          GROUP BY cellid
         ) cell join
         (SELECT cellid, cat
          FROM rel t
          GROUP BY cellid, cat
          HAVING cat = (SELECT cat
                        FROM rel
                        WHERE cellid = t.cellid
                        GROUP BY cat
                        ORDER BY COUNT( * ) DESC , cat
                        LIMIT 1
                      )
         ) mode
         on cell.cellid = mode.cnt
    

    This doesn't handle the case of multiple values. If you only wanted an arbitrary value, I would go with:

    select cell.*,
           (select cat
            from rel
            where cell.cellid = rel.cellid
            group by cellid, cat
            order by COUNT(*) desc
            limit 1
           ) as mode
    from (SELECT cellid, COUNT(*) totalcount
          FROM rel
          GROUP BY cellid
         ) cell
    

    Identifying muliples is a bit trickier. It requires an additional aggregation in the subquery:

    select cell.*,
           (select (case when min(cat) = max(cat) then min(cat) else 'MULTIPLE' end) as mode
            from (select cat, COUNT(*) as cnt
                  from rel
                  where cell.cellid = rel.cellid
                  group by cellid, cat
                 ) t
            group by cnt
            order by cnt desc
            limit 1
           ) mode
    from (SELECT cellid, COUNT(*) totalcount
          FROM rel
          GROUP BY cellid
         ) cell