Search code examples
mysqlsqlcountsql-updatemysql-5.5

Insert data and fill columns with rank


original data: orginal_table

MID STATE CALL_TIME         RECORD_RANK
a   1    2020-12-18 09:00:00        1
a   2    2020-12-19 09:00:00        2
b   1    2020-12-18 09:00:02        1
c   1    2020-12-18 09:00:03        1
c   1    2020-12-19 09:00:03        2
c   1    2020-12-20 09:00:03        3
d   1    2020-12-19 09:00:00        1

The data I wanted to insert: insert_table

   MID  STATE      CALL_TIME  
   a    2     2020-12-30 09:00:00      
   b    2     2020-12-19 09:00:02   
   c    1     2020-12-21 09:00:03 
   e    1     2020-12-30 09:00:00 
   f    1     2020-12-30 09:00:00 
   f    2     2020-12-31 09:00:00

Goal

  • The original data will be inserted from the second data.
  • For original and inserted data, the pair MID and CALL_TIME is unique.
  • There is no RECORD_RANK column in the inserted data but RECORD_RANK will be calculated based on MID and CALL_TIME columns when inserted. When duplicated MID with different CALL_TIME, the value of RECORD_RANK with MID will be added by 1. The initial value is 1.
  • The earliest row in in insert_table is always later than the latest row in orginal_table with the same MID.

The expected example result as below:

MID  STATE      CALL_TIME         RECORD_RANK
a    1    2020-12-18 09:00:00        1
a    2    2020-12-19 09:00:00        2
b    1    2020-12-18 09:00:02        1
c    1    2020-12-18 09:00:03        1
c    1    2020-12-19 09:00:03        2
c    1    2020-12-20 09:00:03        3
d    1    2020-12-19 09:00:00        1
a    2    2020-12-30 09:00:00        3
b    2    2020-12-19 09:00:02        2  
c    1    2020-12-21 09:00:03        4
e    1    2020-12-30 09:00:00        1
f    1    2020-12-30 09:00:00        1 
f    2    2020-12-31 09:00:00        2

Note

  • mysql version: 5.5.47-log

Solution

  • I think it is possible to handle the logic in a single insert, even in MySQL 5.x.

    The target rank is the number of rows that already exists in the target table for the same mid, plus the number of mid rows in the source table prior to the current row. You can compute that with correlated subqueries:

    insert into orginal_table (mid, state, call_time, record_rank)
    select mid, state, call_time,
        1 + (
            select count(*)
            from orginal_table o
            where o.mid = i.mid
        ) + (
            select count(*) 
            from insert_table i1 
            where i1.mid = i.mid and i1.call_time < i.call_time
        ) as record_rank
    from insert_table i
    

    This assumes that all new rows are more recent that existing rows, as mentioned in your question. But if you want otherwise, that's an easy fix to the first subquery:

    (
        select count(*)
        from orginal_table o
        where o.mid = i.mid and o.call_time < i.call_time
    )
    

    Here is a demo based on the nice test case built by Akina.

    Side note: in MySQL 8.0, we would use a window function instead of the second subquery, which would make the query much more efficient:

    insert into orginal_table (mid, state, call_time, record_rank)
    select mid, state, call_time,
        row_number() over(partition by mid order by call_time) 
        + (
            select count(*)
            from orginal_table o
            where o.mid = i.mid
        ) as record_rank
    from insert_table i