Search code examples
mysqlleft-joininner-join

Identifying faulty data in MySQL table


Apologies for asking a question that may have answers in some form or another on here, but I was unable to make any of those solutions work for me.

I have the following query:

SELECT `user_id`, `application_id`, `unallocated_date`, `check_in_date`, `check_out_date` 
FROM `student_room` 
WHERE `user_id` = 17225
ORDER BY `application_id` DESC

It produces the following result:

user_id | application_id | unallocated_date    | check_in_date       | check_out_date   
--------+----------------+---------------------+---------------------+---------------
17225   | 30782          | 2018-02-04 14:32:29 | NULL                | NULL
17225   | 30782          | 2018-02-04 14:32:49 | NULL                | NULL
17225   | 30782          | 2018-02-04 14:32:51 | NULL                | NULL
17225   | 30782          | NULL                | NULL                | NULL
17225   | 30782          | NULL                | 2018-02-04 14:41:54 | NULL

The fourth row in the result is a fault in my data; it should look similar to the first three rows - these rows occur happens when a student was allocated a new room, and the previous one needs to be unallocated. In this case, the unallocation of row 4 did not actually happen due to either a historical bug in the system I am working on, or user error, but most likely the former.

How can I identify ALL such rows? My attempts with GROUP BY and HAVING did not work, as I checked where all three date fields were NULL, but it did not pick up this particular user - so I was doing something wrong. My original query was:

SELECT COUNT(user_id) AS `count`, user_id FROM `student_room`
WHERE `unallocated_date` IS NULL
AND `check_in_date` IS NULL
AND `check_out_date` IS NULL
GROUP BY `user_id`
HAVING COUNT(user_id) > 1
ORDER BY `user_id` ASC

I tried various INNER JOIN attempts too, but I did not of them right...

The rows that I am interested in will have at least one entry with all three dates NULL, but also one where there is a check_in_date that is NOT NULL, as per this example. If I only had the first four rows, then the data could be correct, but the fifth row's presence makes the fourth row a faulty record - it should've been given an "unallocated_date" value at the time of the allocation of the room in the fifth row, which for some reason did not happen.


Solution

  • Together with a friend of mine, we made the following query that works. I have now learned that you can use "EXISTS" in MySQL. I saw it used when dropping or creating tables, but never like this. It ended up that this query solves the problem:

    SELECT cte.user_id, COUNT(*)
    FROM (
        SELECT sro.user_id
        FROM student_room AS sro
        WHERE sro.unallocated_date IS NULL
        AND sro.check_in_date IS NULL
        AND sro.check_out_date IS NULL
        AND EXISTS ( 
            SELECT *
            FROM student_room AS sri
            WHERE sri.user_id = sro.user_id
            AND sri.student_room_id > sro.student_room_id
        )
        ORDER BY user_id DESC
    )
    AS cte
    GROUP BY cte.user_ID
    ORDER BY COUNT(*) DESC
    

    This query is the result of more than an hour of tinkering with records that was erroneous, so apologies if this appears to not match the question's requirements 100%, but it does solve the problem for me.