I received sales data from a client to integrate it into his datawarehouse. Normally there are two columns to define a sales row - num_transcation
and num_line_transaction
. But I received a lot of rows with the same num_transcation
and num_line_transaction
so when I execute this query:
select *
from
(select
ROW_NUMBER() over(partition by ll_vente_num_transaction, ll_vente_num_ligne_transaction
order by ll_vente_num_transaction) rn, *
from my_table) t
where
rn > 2
I get almost 43000 rows with the same identity (num_transcation
and num_line_transaction
) and the only thing that differs those rows is the sales date.
I need a way to update "num_line_transaction" to be able to load the data in the datawarhouse, otherwise it will load only one row of every "num_transcation" and "num_line_transaction" combination.
For example:
+-----------------+-----------------------+
| NUM_TRANSACTION | NUM_LIGNE_TRANSACTION |
+-----------------+-----------------------+
| 106969796 | 3148291 |
| 106969796 | 3148291 |
| 106969796 | 3148291 |
| 106969796 | 3148291 |
| 106969796 | 3148291 |
| 106969796 | 3148291 |
| 106969796 | 3148291 |
+-----------------+-----------------------+
I need to update those rows to become:
+-----------------+-------------------------+
| NUM_TRANSACTION | NUM_LIGNE_TRANSACTION |
+-----------------+-------------------------+
| 106969796 | 3148291 |
| 106969796 | 3148292 |
| 106969796 | 3148293 |
| 106969796 | 3148294 |
| 106969796 | 3148295 |
| 106969796 | 3148296 |
| 106969796 | 3148297 |
+-----------------+-------------------------+
Please, any idea will be appreciated. I'm stuck here.
If you have a distinguishing (unique column such as sales_date
as you mentioned ), the following statement will help :
with t2 as
(
select t.*,
row_number() over (partition by num_transaction order by num_ligne_transaction)+
num_ligne_transaction - 1
as rn
from tab t
)
update t1
set t1.num_ligne_transaction = t2.rn
from tab t1
join t2
on t1.num_transaction = t1.num_transaction
and t1.sales_date = t2.sales_date;