I am using Oracle database capturing transactional data.
Table 1: The transaction data of users opening the survey form. A recipient with the email address could potentially open the survey form couple of times but did not fill in and submit. The same survey can be determined by the Survey_No.
INSERT INTO survey_open (SURVEY_NO,EMAIL,DATE) VALUES (1, john@email.com,5/21/2020);
INSERT INTO survey_open (SURVEY_NO,EMAIL,DATE) VALUES (1, john@email.com,5/19/2020);
INSERT INTO survey_open (SURVEY_NO,EMAIL,DATE) VALUES (1, john@email.com,5/10/2020);
INSERT INTO survey_open (SURVEY_NO,EMAIL,DATE) VALUES (1, john@email.com,5/3/2020);
INSERT INTO survey_open (SURVEY_NO,EMAIL,DATE) VALUES (1, john@email.com,5/1/2020);
INSERT INTO survey_open (SURVEY_NO,EMAIL,DATE) VALUES (2, john@email.com,3/1/2020);
INSERT INTO survey_open (SURVEY_NO,EMAIL,DATE) VALUES (2, john@email.com,3/3/2020);
INSERT INTO survey_open (SURVEY_NO,EMAIL,DATE) VALUES (3, john@email.com,2/1/2020);
Table 2: The transaction data of the users submitting the survey. They could resubmit their choices again and again. Again, the survey_no will give info on which survey the response was for.
INSERT INTO survey_fill (SURVEY_NO,EMAIL,DATE) VALUES (1, john@email.com,5/21/2020);
INSERT INTO survey_fill (SURVEY_NO,EMAIL,DATE) VALUES (1, john@email.com,5/15/2020);
INSERT INTO survey_fill (SURVEY_NO,EMAIL,DATE) VALUES (1, john@email.com,5/15/2020);
INSERT INTO survey_fill (SURVEY_NO,EMAIL,DATE) VALUES (1, john@email.com,5/15/2020);
INSERT INTO survey_fill (SURVEY_NO,EMAIL,DATE) VALUES (2, john@email.com,3/15/2020);
INSERT INTO survey_fill (SURVEY_NO,EMAIL,DATE) VALUES (2, john@email.com,3/15/2020);
INSERT INTO survey_fill (SURVEY_NO,EMAIL,DATE) VALUES (2, john@email.com,3/15/2020);
INSERT INTO survey_fill (SURVEY_NO,EMAIL,DATE) VALUES (3, john@email.com,3/1/2020);
The expected output that i am trying to get is to form a table that joins up the above two tables based on the nearest date that makes sense as a proxy to join. Which of the transaction on the survey_open table get tagged with a filled_date is not that important. Since the survey_id and email is many to many, i do not want to form a cross join.
Try to number items in both tables
with o as(
select so.*, row_number() over(partition by survey_no order by open_date) rn
from survey_open so
), f as (
select sf.*, row_number() over(partition by survey_no order by fill_date) rn
from survey_fill sf
)
select o.survey_no
, o.email
, o.open_date
, (select min(f.fill_date)
from f
where f.survey_no = o.survey_no
and f.fill_date >= o.open_date
and f.rn >= o.rn) as fill_date
from o
order by o.survey_no, o.rn desc