Search code examples
mysqlsqlcountinner-joingreatest-n-per-group

Trying to grab a column from 2 subqueries being joined and grouping not working


WHAT MY TABLES LOOK LIKE:

mysql> select * from customer limit 3;
+-------------+---------------+-----------------------+------+--------+-------------+-------+
| customer_id | customer_name | profession            | age  | salary | town        | state |
+-------------+---------------+-----------------------+------+--------+-------------+-------+
|           1 | Julio Sperski | Architect             |   70 |  52016 | Conroe      | TX    |
|           2 | Micah Inchley | Biological  scientist |   86 |  45355 | Omaha       | NE    |
|           3 | Brigg Denny   | Chemist               |   80 |  21754 | Bakersfield | CA    |
+-------------+---------------+-----------------------+------+--------+-------------+-------+
3 rows in set (0.00 sec)

mysql> select * from vehicle limit 3;
+------------+---------------+--------------------+--------+--------+---------+----------------+--------------------+-----------------------+
| vehicle_id | vehicle_plate | registration_state | color  | make   | model   | vehicle_type   | per_day_rental_fee | per_day_insurance_fee |
+------------+---------------+--------------------+--------+--------+---------+----------------+--------------------+-----------------------+
|          1 | W9FLYC7       | TX                 | black  | toyota | cruiser | mid size sedan |                 44 |                    27 |
|          2 | CA1CJIZ       | NE                 | silver | ford   | se      | suv            |                 96 |                    71 |
|          3 | HB5YI9A       | CA                 | silver | dodge  | mpv     | truck          |                 26 |                    28 |
+------------+---------------+--------------------+--------+--------+---------+----------------+--------------------+-----------------------+
3 rows in set (0.00 sec)

mysql> select * from rental limit 3;
+-----------+-------------+------------+-------------------+--------------------+-------------------------+----------------------------+
| rental_id | customer_id | vehicle_id | start_rental_date | return_rental_date | per_day_rental_fee_paid | per_day_insurance_fee_paid |
+-----------+-------------+------------+-------------------+--------------------+-------------------------+----------------------------+
|         1 |          32 |          4 | 3/4/2019          | 3/6/2019           | no                      | no                         |
|         2 |          42 |         39 | 3/23/2019         | 3/24/2019          | yes                     | yes                        |
|         3 |          33 |         14 | 10/18/2020        | 10/24/2020         | no                      | no                         |
+-----------+-------------+------------+-------------------+--------------------+-------------------------+----------------------------+
3 rows in set (0.00 sec)

customer_id, vehicle_id, and rental_id are primary keys.

customer_id and vehicle_id are foreign keys pointing to customer and vehicle tables respectively.

So I'm trying to grab a 3rd column and display it. The query is something along the lines of this:

For each registration state that a rental car company operates in, return the most amount of customers from a certain customer state. Like if there are 5 customers that are from TX but are renting a car registered in CA and there are only 2 customers from NY renting cars registered in CA, I would return the 5 from TX renting in CA and so on until I finish all the registration states.

FIRST QUERY, The closest I have gotten is with this query:

SELECT registration_state, 
       Max(count) AS count 
FROM   (SELECT registration_state, 
               Count(a.state) AS count, 
               a.state 
        FROM   rental c 
               LEFT JOIN customer a 
                      ON a.customer_id = c.customer_id 
               LEFT JOIN vehicle b 
                      ON c.vehicle_id = b.vehicle_id 
        GROUP  BY registration_state, 
                  a.state)Z 
GROUP  BY registration_state 
ORDER  BY registration_state, 
          count DESC; 

+--------------------+-------+
| registration_state | count |
+--------------------+-------+
| AL                 |     1 |
| CA                 |     5 |
| DC                 |     1 |
| DE                 |     2 |
| FL                 |     3 |
| IL                 |     2 |
| IN                 |     1 |
| MD                 |     2 |
| MI                 |     1 |
| MN                 |     1 |
| MO                 |     1 |
| NE                 |     1 |
| NV                 |     2 |
| NY                 |     3 |
| OH                 |     2 |
| OR                 |     1 |
| PA                 |     1 |
| SC                 |     1 |
| TN                 |     3 |
| TX                 |     7 |
| WA                 |     1 |
+--------------------+-------+
21 rows in set (0.01 sec)

However it displays only the registration_state and count of whichever customer state rents the most in that state that the rental car company rents cars in but without the customer state displayed, I want the customer state to be displayed.

SECOND QUERY, The following query generates and returns counts of the amount of rentals from each customer state for every registration state:

SELECT registration_state, 
       Count(d.state) AS count, 
       d.state 
FROM   rental f 
       LEFT JOIN customer d 
              ON d.customer_id = f.customer_id 
       LEFT JOIN vehicle e 
              ON f.vehicle_id = e.vehicle_id 
GROUP  BY registration_state, 
          d.state 
ORDER  BY registration_state, 
          count DESC; 

+--------------------+-------+-------+
| registration_state | count | state |
+--------------------+-------+-------+
| AL                 |     1 | NY    |
| CA                 |     5 | CA    |
| CA                 |     4 | MO    |
| CA                 |     2 | IN    |
| CA                 |     2 | TN    |
| CA                 |     2 | TX    |
| CA                 |     2 | OH    |
| CA                 |     1 | FL    |
| CA                 |     1 | AL    |
| CA                 |     1 | MI    |
| CA                 |     1 | NE    |
| CA                 |     1 | WA    |
| DC                 |     1 | CA    |
| DC                 |     1 | IL    |
| DC                 |     1 | TX    |
| DE                 |     2 | NY    |
| DE                 |     1 | FL    |
| FL                 |     3 | NY    |
| FL                 |     1 | AL    |
| FL                 |     1 | OH    |
| FL                 |     1 | CA    |
| FL                 |     1 | FL    |
| FL                 |     1 | MI    |
| FL                 |     1 | TX    |
| IL                 |     2 | OR    |
| IL                 |     1 | TX    |
| IL                 |     1 | CA    |
| IN                 |     1 | NV    |
| IN                 |     1 | CA    |
| MD                 |     2 | WA    |
| MD                 |     1 | OH    |
| MD                 |     1 | MD    |
| MI                 |     1 | PA    |
| MN                 |     1 | PA    |
| MN                 |     1 | TN    |
| MN                 |     1 | FL    |
| MO                 |     1 | NY    |
| MO                 |     1 | SC    |
| MO                 |     1 | OH    |
| MO                 |     1 | OR    |
| MO                 |     1 | CA    |
| MO                 |     1 | FL    |
| MO                 |     1 | TX    |
| NE                 |     1 | CA    |
| NV                 |     2 | FL    |
| NY                 |     3 | TX    |
| NY                 |     1 | CA    |
| NY                 |     1 | FL    |
| NY                 |     1 | MO    |
| NY                 |     1 | NY    |
| OH                 |     2 | OH    |
| OH                 |     1 | TN    |
| OH                 |     1 | NE    |
| OH                 |     1 | PA    |
| OH                 |     1 | DC    |
| OH                 |     1 | NY    |
| OR                 |     1 | IN    |
| OR                 |     1 | CA    |
| PA                 |     1 | MO    |
| PA                 |     1 | DC    |
| SC                 |     1 | FL    |
| SC                 |     1 | NY    |
| TN                 |     3 | TX    |
| TN                 |     2 | CA    |
| TN                 |     1 | FL    |
| TN                 |     1 | PA    |
| TN                 |     1 | MI    |
| TN                 |     1 | OH    |
| TN                 |     1 | OR    |
| TN                 |     1 | MO    |
| TX                 |     7 | NY    |
| TX                 |     4 | TX    |
| TX                 |     3 | FL    |
| TX                 |     2 | MO    |
| TX                 |     2 | MD    |
| TX                 |     2 | DC    |
| TX                 |     1 | IN    |
| TX                 |     1 | OH    |
| TX                 |     1 | CA    |
| TX                 |     1 | NV    |
| TX                 |     1 | OR    |
| TX                 |     1 | IL    |
| WA                 |     1 | TN    |
| WA                 |     1 | MI    |
+--------------------+-------+-------+
84 rows in set (0.00 sec)

As you can see in the above table, if you compare it with the first table, you will see that the first table only returns the customer state that has the most rentals in that registration state.

I am trying to get it to look like this:

+--------------------+-------+-------+
| registration_state | count | state |
+--------------------+-------+-------+
| AL                 |     1 | NY    |
| CA                 |     5 | CA    |
| DC                 |     1 | CA    |
| DE                 |     2 | NY    |
| FL                 |     3 | NY    |
| IL                 |     2 | OR    |
| IN                 |     1 | NV    |
| MD                 |     2 | WA    |
| MI                 |     1 | PA    |
| MN                 |     1 | PA    |
| MO                 |     1 | NY    |
| NE                 |     1 | CA    |
| NV                 |     2 | FL    |
| NY                 |     3 | TX    |
| OH                 |     2 | OH    |
| OR                 |     1 | IN    |
| PA                 |     1 | MO    |
| SC                 |     1 | FL    |
| TN                 |     3 | TX    |
| TX                 |     7 | NY    |
| WA                 |     1 | TN    |
+--------------------+-------+-------+

I have tried joining like so:

SELECT    registration_state, 
          MAX(count)
FROM      ( 
                    SELECT    registration_state, 
                              Count(d.state) AS count, 
                              d.state 
                    FROM      rental f 
                    LEFT JOIN customer d 
                    ON        d.customer_id = f.customer_id 
                    LEFT JOIN vehicle e 
                    ON        f.vehicle_id = e.vehicle_id 
                    GROUP BY  registration_state, 
                              d.state)W 
RIGHT JOIN 
          ( 
                    SELECT    registration_state AS registration_state2, 
                              count(a.state)     AS count2, 
                              a.state 
                    FROM      rental c 
                    LEFT JOIN customer a 
                    ON        a.customer_id = c.customer_id 
                    LEFT JOIN vehicle b 
                    ON        c.vehicle_id = b.vehicle_id 
                    GROUP BY  registration_state, 
                              a.state )z 
ON        z.count2=W.count 
AND      z.registration_state2=W.registration_state
GROUP BY registration_state
ORDER BY registration_state;


but it just returns the same as the first query, if I add state from either of the subqueries to the select statement in the first line and also group by said state at the end, I end up with the second query.

Are there any suggestions on how to get it to return the way I want?


Solution

  • If you are running MySQL 8.0, you can use window functions to bring the customer state with most rentals for each registration state:

    select registration_state, state, cnt
    from (
        select v.registration_state, c.state, count(*) as cnt,
            rank() over(partition by v.registration_state order by count(*) desc) rn
        from rental r
        inner join customer c on c.customer_id = r.customer_id 
        inner join vehicle  v on r.vehicle_id = v.vehicle_id 
        group  by v.registration_state, c.state 
    ) t
    where rn = 1
    order  by registration_state;
    

    In ealier versions, it is a bit more complicated. One option uses a row-limiting correlated subquery in the having clause:

    select v.registration_state, c.state, count(*) as cnt
    from rental r
    inner join customer c on c.customer_id = r.customer_id 
    inner join vehicle  v on r.vehicle_id = v.vehicle_id 
    group  by v.registration_state, c.state 
    having count(*) = (
        select count(*)
        from rental r1
        inner join customer c1 on c1.customer_id = r1.customer_id 
        inner join vehicle  v1 on r1.vehicle_id = v1.vehicle_id 
        where c1.state = c.state and v1.registration_state = v.registration_state
        group by v1.registration_state, c1.state 
        order by count(*) desc limit 1
    )
    

    Note that both queries allow top ties, if any.