I have data for event registrations where users have an account on the website (user_id), submit a form for registration (nametag), and pay via stripe (order_id).
Each registration must have a unique payment associated with it. Sometimes one person will log in and register themselves and their friends, which trips up the current model.
The below query results in the correct answer. Is there a simpler way to do it?
-- registration form table
with forms as (
select
1 as user_id,
'jeff' as nametag
union all
select
1 as user_id,
'john' as nametag
union all
select
1 as user_id,
'jim' as nametag
),
-- paid orders table
orders as (
select
1 as user_id,
'abc' as order_id
union all
select
1 as user_id,
'def' as order_id
union all
select
1 as user_id,
'zyx' as order_id
),
-- my query
window1 as (
select
forms.*,
orders.order_id,
row_number() over (partition by order_id order by orders.order_id, forms.nametag) as row_num1
from forms
inner join orders
on forms.user_id = orders.user_id
)
select * except (row_num1, row_num2)
from (
select
*,
row_number() over (partition by row_num1 order by order_id) as row_num2
from window1
)
where row_num1 = row_num2
output:
user_id | nametag | order_id |
---|---|---|
1 | jeff | abc |
1 | jim | def |
1 | john | xyz |
It would be good for design to have some column to order by like transaction timestamp, and respectively user registration timestamp.
Assuming it's not present the following should simplify your query:
select
vw_forms.user_id,
vw_forms.nametag,
vw_orders.order_id
from
(select *, row_number() over (partition by user_id) as r_forms from forms) vw_forms
inner join
(select *, row_number() over (partition by user_id) as r_orders from orders) vw_orders
on
vw_forms.user_id = vw_orders.user_id and vw_forms.r_forms = vw_orders.r_orders
I skip ordering, because from your description I cannot really discern why would you do it.