Tags: mysql, sql, duplicates, inner-join, derived-table

How to remove duplicate records from a MySQL 5.5 derived table that aggregates data by date


The following table, hb_lead, stores one row per unique lead, with its code and creation timestamp. Let's call the events recorded here reg.

 id_hb_lead | lead_code | creation_date
 -----------|-----------|--------------
  1         | ABC       | 2018-10-01
  2         | DEF       | 2018-10-02
  3         | GHI       | 2018-10-02
  4         | CYZ       | 2018-10-03
  5         | HIJ       | 2018-10-10

The hb_lead_engagement table stores one row each time a lead engages with something, so a lead can appear many times. Let's call the events recorded here enga.

 id_hb_lead_engagement | id_b_lead |  id_event | event_date
 ----------------------|-----------|-----------|----------- 
  1                    | 1         |       12  | 2018-10-02
  2                    | 2         |       12  | 2018-10-03
  3                    | 3         |       12  | 2018-10-03
  4                    | 4         |       12  | 2018-10-06
  5                    | 3         |       12  | 2018-10-08
  6                    | 5         |       12  | 2018-10-16
  7                    | 2         |       12  | 2018-10-19
  8                    | 3         |       12  | 2018-10-10
  9                    | 5         |       12  | 2018-10-10

I need to generate a table like this, where the reg and enga events are counted for each unique date. Don't worry about 0 versus NULL in the empty cells; I can COALESCE them as needed.

 date      |  reg     | enga
-----------|----------|-----------
2018-10-01 |    1     |    
2018-10-02 |    2     |    1
2018-10-03 |    1     |    2
2018-10-06 |          |    1 
2018-10-08 |          |    1
2018-10-10 |    1     |    2
2018-10-16 |          |    1
2018-10-19 |          |    1

Using a derived table that first UNIONs the two tables to list all event dates and then COUNTs the events, I produced the table below. It is wrong because a date appears twice whenever it has both reg and enga events.

 date      |  regs          | engas
-----------|----------------|-----------
2018-10-01 |    1           |    
2018-10-02 |    2           |    
2018-10-02 |                |    1
2018-10-03 |    1           |    
2018-10-03 |                |    2
2018-10-06 |                |    1 
2018-10-08 |                |    1
2018-10-10 |    1           |     
2018-10-10 |                |    2
2018-10-16 |                |    1
2018-10-19 |                |    1

I know that the usual way to solve this, that is, to aggregate the data into one row per date, is to JOIN the two tables (mainly with a LEFT JOIN) and then use CASE. However, because this is a derived table, it can't be joined with itself. So my question is: how do I aggregate the resulting derived table, given that a derived table can't be joined to itself?

In case you're wondering how I derived the dates, here is my query:

select *
from
(
  select date_format(l.fecha_creacion, '%Y-%m-%d') as categoria,
         count(*) as reg, '' as enga
  from hb.hb_lead l
  group by categoria

  union

  select date_format(le.le_fecha_creacion, '%Y-%m-%d') fecha, '',
         count(*)
  from hb.hb_lead_engagement le
  where le.id_hb_evento = 12
  group by fecha
  order by 1
) t1

I'm restricted to MySQL version 5.5.5-10.1.36-MariaDB, so **WITH a AS** is out of the question. Thanks.


Solution

  • You can UNION the date columns of both tables to get the list of dates, then outer join each table to that list.

    **Schema (MySQL v5.5)**
    
    create table hb_lead (
       id_hb_lead int,
       lead_code varchar(50),
       creation_date date
    ); 
    
    
    
    insert into hb_lead values (1,'ABC', '2018-10-01');
    insert into hb_lead values (2,'DEF', '2018-10-02');
    insert into hb_lead values (3,'GHI', '2018-10-02');
    insert into hb_lead values (4,'CYZ', '2018-10-03');
    insert into hb_lead values (5,'HIJ', '2018-10-10');
    
    
    create table hb_lead_engagement (
       id_hb_lead_engagement int,
       id_b_lead int,
       id_event int,
       event_date date
    );
    
    
    
    insert into hb_lead_engagement values (1,1,12, '2018-10-02');
    insert into hb_lead_engagement values (2,2,12, '2018-10-03');
    insert into hb_lead_engagement values (3,3,12, '2018-10-03');
    insert into hb_lead_engagement values (4,4,12, '2018-10-06');
    insert into hb_lead_engagement values (5,3,12, '2018-10-08');
    insert into hb_lead_engagement values (6,5,12, '2018-10-16');
    insert into hb_lead_engagement values (7,2,12, '2018-10-19');
    insert into hb_lead_engagement values (8,3,12, '2018-10-10');
    insert into hb_lead_engagement values (9,5,12, '2018-10-10');
    

    Query #1

    -- Joining both tables on the date pairs every lead with every engagement
    -- of the same day, so a plain COUNT would be inflated on dates that have
    -- both. COUNT(DISTINCT ...) counts each id once (ids are assumed unique).
    SELECT dt,
        COUNT(DISTINCT id_hb_lead) regs,
        COUNT(DISTINCT id_hb_lead_engagement) engas
    FROM
    (
      SELECT event_date dt FROM hb_lead_engagement
      UNION
      SELECT creation_date FROM hb_lead
    ) t1
    LEFT JOIN hb_lead_engagement bhle ON t1.dt = bhle.event_date
    LEFT JOIN hb_lead bhl ON t1.dt = bhl.creation_date
    GROUP BY dt;
    
    | dt         | regs | engas |
    | ---------- | ---- | ----- |
    | 2018-10-01 | 1    | 0     |
    | 2018-10-02 | 2    | 1     |
    | 2018-10-03 | 1    | 2     |
    | 2018-10-06 | 0    | 1     |
    | 2018-10-08 | 0    | 1     |
    | 2018-10-10 | 1    | 2     |
    | 2018-10-16 | 0    | 1     |
    | 2018-10-19 | 0    | 1     |
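
    An alternative that does not depend on how the join pairs rows is to aggregate each table in its own derived table first, and only then join the per-date totals to the list of dates. The following is a minimal sketch against the sample schema above, not a tested production query. The aliases d, r and e are made up for illustration, and the id_event = 12 filter is carried over from the question as an assumption about what should be counted.

    ```sql
    -- Each source table is reduced to one row per date before joining,
    -- so no join can multiply rows.
    SELECT d.dt,
           COALESCE(r.regs, 0)  AS regs,
           COALESCE(e.engas, 0) AS engas
    FROM (
        -- every date that has at least one event of either kind
        SELECT creation_date AS dt FROM hb_lead
        UNION
        SELECT event_date FROM hb_lead_engagement WHERE id_event = 12
    ) d
    LEFT JOIN (
        SELECT creation_date AS dt, COUNT(*) AS regs
        FROM hb_lead
        GROUP BY creation_date
    ) r ON r.dt = d.dt
    LEFT JOIN (
        SELECT event_date AS dt, COUNT(*) AS engas
        FROM hb_lead_engagement
        WHERE id_event = 12          -- assumption: only event 12 is of interest
        GROUP BY event_date
    ) e ON e.dt = d.dt
    ORDER BY d.dt;
    ```

    If the real columns are DATETIME rather than DATE, as the date_format calls in the question suggest, wrapping them in DATE(...) in all three derived tables should give the same per-day grouping.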
    

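
    Closer to the query in the question: the UNION derived table does not need to be joined to itself at all. It can be aggregated directly by wrapping it in an outer SELECT with GROUP BY. The sketch below assumes the sample schema above and uses numeric 0 placeholders instead of '' so the two halves can be summed. UNION ALL is used so that identical rows from the two halves are never merged.

    ```sql
    -- Inner query: one row per (date, source) as in the question.
    -- Outer query: collapse those rows to one row per date.
    SELECT t1.dt,
           SUM(t1.reg)  AS regs,
           SUM(t1.enga) AS engas
    FROM (
        SELECT creation_date AS dt, COUNT(*) AS reg, 0 AS enga
        FROM hb_lead
        GROUP BY creation_date

        UNION ALL

        SELECT event_date, 0, COUNT(*)
        FROM hb_lead_engagement
        WHERE id_event = 12
        GROUP BY event_date
    ) t1
    GROUP BY t1.dt
    ORDER BY t1.dt;
    ```

    On the sample data this should return one row per date with the counts from the desired table in the question, with 0 in place of the blank cells.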