Search code examples
sqlsql-servert-sqlsql-updatessis-2008

Updating data to integrate it in my datawarehouse


I received sales data from a client to integrate it into his datawarehouse. Normally there are two columns to define a sales row - num_transcation and num_line_transaction. But I received a lot of rows with the same num_transcation and num_line_transaction so when I execute this query:

select * 
from 
    (select 
         ROW_NUMBER() over(partition by ll_vente_num_transaction, ll_vente_num_ligne_transaction 
                           order by ll_vente_num_transaction) rn, * 
     from my_table) t
where 
    rn > 2 

I get almost 43000 rows with the same identity (num_transcation and num_line_transaction) and the only thing that differs those rows is the sales date.

I need a way to update "num_line_transaction" to be able to load the data in the datawarhouse, otherwise it will load only one row of every "num_transcation" and "num_line_transaction" combination.

For example:

+-----------------+-----------------------+
| NUM_TRANSACTION | NUM_LIGNE_TRANSACTION |
+-----------------+-----------------------+
|       106969796 |               3148291 |
|       106969796 |               3148291 |
|       106969796 |               3148291 |
|       106969796 |               3148291 |
|       106969796 |               3148291 |
|       106969796 |               3148291 |
|       106969796 |               3148291 |
+-----------------+-----------------------+

I need to update those rows to become:

+-----------------+-------------------------+
| NUM_TRANSACTION |   NUM_LIGNE_TRANSACTION |
+-----------------+-------------------------+
|       106969796 |                 3148291 |
|       106969796 |                 3148292 |
|       106969796 |                 3148293 |
|       106969796 |                 3148294 |
|       106969796 |                 3148295 |
|       106969796 |                 3148296 |
|       106969796 |                 3148297 |
+-----------------+-------------------------+

Please, any idea will be appreciated. I'm stuck here.


Solution

  • If you have a distinguishing (unique column such as sales_date as you mentioned ), the following statement will help :

    with t2 as
    (
    select t.*,
           row_number() over (partition by num_transaction order by num_ligne_transaction)+
           num_ligne_transaction - 1
            as rn 
     from tab t
    )
    update t1
       set t1.num_ligne_transaction = t2.rn
      from tab t1
      join t2
        on t1.num_transaction = t1.num_transaction
       and t1.sales_date = t2.sales_date;
    

    Demo