Search code examples
mysqlsqlsqlyog

MySQL - get only record which userids have two specific different values in one column


In my log_attendance table there are record of employees, i'd like to get only employees which have mu 3 and 4 on their I(in) and O(out), for example on this table, i will get garcia, arena, imperial and macandil records only, because they have mu 3 and 4.

enter image description here

SELECT 
  Emp_name,
  loc,
  dept_name,
  DATE(CheckTime) AS date,
  TIME(CheckTime) AS time,
  CheckType AS type,
  mu AS device,
  COUNT(*) AS summary 
FROM
  log_attendance 
WHERE DATE(CheckTime) = '2015-05-27' 
  AND loc_id = 1 
  AND mu IN (3, 4)
GROUP BY Userid,
  mu
ORDER BY dept_id, Emp_name,
  mu DESC 

Solution

  • There are couple of different query patterns that will return the specified result. Assuming that Emp_name is the unique identifier for an employee, and you want to return all rows for an employee, if there are at least two rows with that Emp_name, and at least one of those rows has mu value of 3, and at least one of the rows has an mu value of 4.

    We're ignoring the "date" and "time" columns. We're a little suspicious that there may be some conditions, as yet unstated, on those columns as well.

    (If we can understand and accurately describe the specification, the battle is half won.)

    There's a couple of approaches.

    One of the easiest to understand is to use an EXISTS predicate to test for the existence of another row. Here's a way to get a distinct list of Emp_name that satisfy the specification.

    SELECT a.Emp_name 
      FROM log_attendance a
     WHERE a.mu = '3'
       AND EXISTS ( SELECT 1
                      FROM log_attendance o
                     WHERE o.Emp_name = a.Emp_name
                      AND o.mu = '4'
                  )
     GROUP BY a.Emp_name
    

    If all we need is the Emp_name, we're done. But if we want to return all the rows for those Emp_name, we can use that result in a join operation. To do that, we can wrap the query in parens, and use it as an inline view, reference it in the FROM clause in place of a table. For example:

    SELECT d.Emp_name
         , d.loc
         , d.dept_name
         , DATE(d.CheckTime) AS `date`
         , TIME(d.CheckTime) AS `time`
         , d.CheckType       AS `type`
         , d.mu              AS `device`
      FROM ( 
             -- the query from above goes here, between parens
             SELECT a.Emp_name 
               FROM log_attendance a
              WHERE a.mu = '3'
                AND EXISTS ( SELECT 1
                               FROM log_attendance o
                              WHERE o.Emp_name = a.Emp_name
                                AND o.mu = '4'
                            )
           ) e
      JOIN log_attendance d
        ON d.Emp_name = e.Emp_name
      ORDER BY d.Emp_name, d.CheckTime
    

    This will return all the detail rows for the Emp_name that meet the specified criteria.

    We can't add an aggregate function e.g. COUNT(*) to the SELECT list without collapsing all the rows into a single row. Before we do that, we need to understand what we're trying to return.

    Do we want to return just a count of the rows for each of those Emp_name? Or a combined count of all rows for those Emp_name? Do we want a count of mu=3 rows and a count of mu=4 rows for those Emp_name? How we write the query depends on what resultset we want to return.

    There are several other approaches to getting the list of Emp_name that have both mu=3 and mu-4 rows.

    We could get all the rows that have mu=3 or mu=4, then group those rows by Emp_name, and the count the distinct values that appear in mu for each Emp_name, and exclude all rows where that count is not equal to 2. For example:

    SELECT a.Emp_name
      FROM log_attendance a
     WHERE a.mu IN (3,4)
     GROUP BY a.Emp_name
    HAVING COUNT(DISTINCT a.mu) = 2 
    

    (This pattern is also suited for other cases, such as listing Emp_name that have at least two out of three possible values of mu.)

    This query could be used also be used as an inline view, the same as the previous query.

    SELECT d.Emp_name
         , d.loc
         , ...
      FROM ( SELECT a.Emp_name
               FROM log_attendance a
              WHERE a.mu IN (3,4)
              GROUP BY a.Emp_name
             HAVING COUNT(DISTINCT a.mu) = 2 
           ) e
      JOIN log_attendance d
        ON d.Emp_name = e.Emp_name
     ORDER BY d.Emp_name, d.CheckTime
    

    We could also use a query that has two EXISTS predicates to return an equivalent result, for example:

    SELECT d.Emp_name
         , d.
      FROM log_attendance d
     WHERE EXISTS ( SELECT 1
                      FROM log_attendance a
                     WHERE a.mu = 3
                       AND a.Emp_name = d.Emp_name
                  )
       AND EXISTS ( SELECT 1
                      FROM log_attendance b
                     WHERE b.mu = 3
                       AND b.Emp_name = d.Emp_name
                  )
    

    We could obtain an equivalent result using two Emp_name IN (subquery) predicates, with one subquery returning a list of Emp_name that have mu=3 rows, and the other subquery returning a list of Emp_name that have mu=4 rows.

    Note that the queries demonstrated here return all rows for the Emp_name, not just the mu=3 and mu=4 rows. If that's part of the specification, we can add an appropriate predicate to filter out the rows we don't want to return.

    And this is not an exhaustive list, there are couple of other query patterns we could use.