Search code examples
mysqlsqldatabaseinner-joingreatest-n-per-group

Filter a join on each primary-foreign key relation only


I am using MySql.

I have table job that has a primary key job_pk_id and the table stores details of every job. Also I have a table job_running_status where job table's job_pk_id is a foreign key and this table basically contains records of when a job ran for each job_pk_id.There will be multiple entries for the same job_pk_id as the same job runs multiple times. job_running_status table also has a field job_start_time that gives the start time for each instance of the running of the job.

Now my requirement is to get the latest job_running_status for every job . The latest job_running_status would be chosen based on the latest job_start_time(for that particular job only) value in job_running_status.

I know this can be achieved using INNER JOIN and ORDER BY job_start_time desc between job table and job_running_status table but my challenge is this ORDER BY becomes applicable across all the jobs in the JOIN but I need to be applicable only across records that are corresponding to a particular job.

EDIT I understand it might be confusing to understand me by just reading so I am providing some examples:

job table:

Job Table Image

job_running_status table:

job_running_status image

My final requirement after joining both the tables

My final requirement after joining both the tables

Note: while joining I a should be getting only 1 record corresponding to every JOB table record. This record is chosen based on the latest job_start_time for that JOB.


Solution

  • An example of a correlated sub query in a where clause

    drop table if exists t,t1;
    create table t(id int);
    create table t1(jid int,dt date);
    insert into t values
    (1),(2),(3);
    
    insert into t1 values
    (1,'2018-01-01'),
    (1,'2018-02-01'),
    (2,'2018-01-01'),
    (3,'2018-01-01'),
    (3,'2018-02-01'),
    (3,'2018-03-01');
    
    select t.id,t1.dt
    from t
    join t1 on t1.jid = t.id
    where t1.dt =(select max(dt) from t1 where t1.jid = t.id);
    
    +------+------------+
    | id   | dt         |
    +------+------------+
    |    1 | 2018-02-01 |
    |    2 | 2018-01-01 |
    |    3 | 2018-03-01 |
    +------+------------+
    3 rows in set (0.00 sec)
    

    If you need the latest n records and you are not on version 8.0 or higher you can use row number simulation

    select t.id,s.dt
    from t
    join 
    (select t1.jid,t1.dt ,
              if(t1.jid<>@p,@rn:=1,@rn:=@rn+1) rn,
              @p:=t1.jid p
    from t1 
    cross join (select @rn:=0,@p:=0) r
    order by t1.jid ,t1.dt desc
    ) s on s.jid = t.id
    where s.rn <= 2;
    
    +------+------------+
    | id   | dt         |
    +------+------------+
    |    1 | 2018-01-01 |
    |    1 | 2018-02-01 |
    |    2 | 2018-01-01 |
    |    3 | 2018-02-01 |
    |    3 | 2018-03-01 |
    +------+------------+