There is join between three tables, test_3
, test_2
and test_1
.
test_1
and test_3
are main tables and have no common columns. there are joined by table test_2
.
test_1
has sr_id
, last_updated_date
,
test_2
has sr_id
and sm_id
and test_3
has sm_id
,sql_statement
.
test_3
has clob data which causing all the trouble.
I have to find the latest sr_id
associated with sm_id
. My idea was to use an aggregate function max(last_updated_date)
and group it by.
And it doesnt happen for many reasons.
It contains CLOB data the column is sql_statement.
I have used a join which I am not familiar with.
Any ideas would be helpful.
WITH xx as (
(select ANSWER ,sr_id AS ID from test
WHERE Q_ID in (SELECT Q_ID FROM test_2 WHERE field_id='LM_LRE_Q6')
)
)
-- end of source data
SELECT t.ID, t1.n, t1.SM_ID,seg_dtls.SEGMENTation_NAME ,to_char(mst.LAST_UPDATED_DATE,'dd-mon-yyyy hh24:mi:ss'),seg_dtls.sql_statement
FROM xx t
CROSS JOIN LATERAL (
select LEVEL AS n, regexp_substr( t.answer, '\d+', 1, level) as SM_ID
from dual
connect by regexp_substr( t.answer, '\d+', 1, level) IS NOT NULL
) t1
left join test_1 mst
on mst.sr_id=t.id
right join test_3 seg_dtls
on seg_dtls.sm_id=t1.sm_id;
The sample data would look like
sr_id sm_id SEGMENTATION_NAME LAST_UPDATED_DATE
1108197 958 test_not_in 05-feb-2017 23:56:59
1108217 958 test_not_in 14-feb-2017 00:37:39
1108218 958 test_not_in 14-feb-2017 01:39:50
1108220 958 test_not_in 14-feb-2017 03:39:07
and the expected output is
1108220 958 test_not_in 14-feb-2017 03:39:07
I am not posting CLOB data because it is huge. Every row contains CLOB data.
table test_3 contains
q_id sr_id answer
1009330 1108246 976~feb_24^941~Test_regionwithcountry
1009330 1108247 941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24
1009330 1108239 972~test_emea
1009330 1108240 972~test_emea^827~test_with_region_country
1009330 1108251 981~MSE100579729 testing.
and sample data looks like above of test_3
Answer contains sm_id. I have to pull it from here.
for example:
941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24
the sm_id is 941,787,976
.
So I had come with the above query posted above.
Again, coming to left and right joins, all sm_id from test_3 is needed so i used the right join here.
edit1 : The accepted answer gives the SR_ID OF SEGMENTS with max(last_updated_date).
I would need all SR_ID. So, I used MINUS operator to get the ones that are not max(last_updated_date).
I need to append that result set to accepted answer.
This is what I did to get the other SR_IDs.
select sr_id,segmentation_name,request_status from (with test_31 (q_id, sr_id, answer) as (
(SELECT Q_ID,SR_ID,ANSWER FROM test_3 WHERE Q_ID=(SELECT Q_ID FROM test_4 WHERE FIELD_ID='LM_LRE_Q6'))
),
answer_extraction as (
select q_id, sr_id,
regexp_substr(regexp_substr(answer, '[^^]+', 1, level),'\d+') as sm_id
from test_31
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '[^^]+', 1, level) is not null
)
select sr_id,
sm_id,
segmentation_name,
LAST_UPDATED_DATE,
sql_statement,request_status
from (
select t1.sr_id,
t2.sm_id,
t2.segmentation_name,
t1.last_updated_date,
t2.sql_statement,
t1.request_status
from test_4 t4
join answer_extraction t3 on t3.q_id = t4.q_id
join test_2 t2 on t2.sm_id = t3.sm_id
join test1 t1 on t1.sr_id = t3.sr_id
)
)
minus
(select sr_id,segmentation_name , request_status from (with test_31 (q_id, sr_id, answer) as (
(SELECT Q_ID,SR_ID,ANSWER FROM test_3 WHERE Q_ID=(SELECT Q_ID FROM test_4 WHERE FIELD_ID='LM_LRE_Q6'))
),
answer_extraction as (
select q_id, sr_id,
regexp_substr(regexp_substr(answer, '[^^]+', 1, level), '\d+') as sm_id
from test_31
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '[^^]+', 1, level) is not null
)
select sr_id,
segmentation_name,
sql_statement,
request_status
from (
select t1.sr_id,
t2.sm_id,
t2.segmentation_name,
t1.last_updated_date,
t2.sql_statement,
t1.request_status,
max(t1.last_updated_date) over (partition by t2.sm_id) as max_updated_date
from test_4 t4
join answer_extraction t3 on t3.q_id = t4.q_id
join test_2 t2 on t2.sm_id = t3.sm_id
join test_1 t1 on t1.sr_id = t3.sr_id
)
where last_updated_date = max_updated_date));
}
sample data:
Accepted answer gives below output with max(last_updated_date) of the segment.
1097661 Submitted o2k lad 30-NOV-15 01-DEC-16 62 CLOB DATA
above posted query GIVES below output which is sr_id of segments with other updated dates.
1097621 o2k lad Submitted
1097625 o2k lad Submitted
1097627 o2k lad Submitted
1097632 o2k lad Submitted
1097633 o2k lad Submitted
1097658 o2k lad Pending
1097640 o2k lad Submitted
1097644 o2k lad Submitted
1097646 o2k lad Submitted
expected output:
sr_id status segment_name updated_date sql_statement other_sr_id
1097661 Submitted o2k lad 30-NOV-15 CLOB DATA 1097618,1097621,1097625,1097627,1097632,1097633,1097658,1097640,1097644,1097646
combine the two queries so that last column contains all old sr_id.
A fairly simple option is to modify your current query to add an analytic function that finds the maximum date for each ID, as something like:
..., max(mst.last_updated_date) over (partition by id) as max_updated_date
A quick demo of the general idea:
with cte (id, last_updated_date, sql_statement) as (
select 1, date '2017-01-01', to_clob('stmt 1') from dual
union all select 1, date '2017-01-02', to_clob('stmt 2') from dual
union all select 1, date '2017-01-03', to_clob('stmt 3') from dual
union all select 2, date '2017-01-02', to_clob('stmt 4') from dual
)
select id, last_updated_date, sql_statement
from (
select id, last_updated_date, sql_statement,
max(last_updated_date) over (partition by id) as max_updated_date
from cte
)
where last_updated_date = max_updated_date;
ID LAST_UPDAT SQL_STATEMENT
---------- ---------- --------------------------------------------------------------------------------
1 2017-01-03 stmt 3
2 2017-01-02 stmt 4
You could use a row_number() or rank() or dense_rank() to identify the row with the earliest date and filter on that instead, but the general idea is the same.
However, your current query isn't very clear (or valid before 12c) to begin with. Rather than trying to guess how to include such a function and filter, it's probably simpler to start again from your base tables, though this makes a lot of assumptions about what you're doing, and may ignore some things - like left and right joins - that may or may not actually be needed.
Making up some data via CTEs:
with test_1 (sr_id, last_updated_date) as (
select 1108197, timestamp '2017-02-05 23:56:59' from dual
union all select 1108217, timestamp '2017-02-14 00:37:39' from dual
union all select 1108218, timestamp '2017-02-14 01:39:50' from dual
union all select 1108220, timestamp '2017-02-14 03:39:07' from dual
),
test_2 (sm_id, segmentation_name, sql_statement) as (
select 958, 'test_not_in', to_clob('select * from dual') from dual
),
test_3 (q_id, sr_id, answer) as (
select 41, 1108197, 958 from dual
union all select 42, 1108217, 958 from dual
union all select 43, 1108218, 958 from dual
union all select 44, 1108220, 958 from dual
),
test_4 (q_id, field_id) as (
select 41, 'LM_LRE_Q6' from dual
union all select 42, 'LM_LRE_Q6' from dual
union all select 43, 'LM_LRE_Q6' from dual
union all select 44, 'LM_LRE_Q6' from dual
)
then this gets the same output you showed in the question:
select t1.sr_id,
t2.sm_id,
t2.segmentation_name,
to_char(t1.last_updated_date, 'dd-mon-yyyy hh24:mi:ss') as last_updated_date,
t2.sql_statement
from test_4 t4
join test_3 t3 on t3.q_id = t4.q_id
join test_2 t2 on t2.sm_id = t3.answer
join test_1 t1 on t1.sr_id = t3.sr_id;
SR_ID SM_ID SEGMENTATIO LAST_UPDATED_DATE SQL_STATEMENT
---------- ----- ----------- ----------------------------- --------------------------------------------------------------------------------
1108197 958 test_not_in 05-feb-2017 23:56:59 select * from dual
1108217 958 test_not_in 14-feb-2017 00:37:39 select * from dual
1108218 958 test_not_in 14-feb-2017 01:39:50 select * from dual
1108220 958 test_not_in 14-feb-2017 03:39:07 select * from dual
On the wild assumption that is close to right, you could find the row with the latest date for each sm_id
with something like this:
select sr_id,
sm_id,
segmentation_name,
to_char(last_updated_date, 'dd-mon-yyyy hh24:mi:ss') as last_updated_date,
sql_statement
from (
select t1.sr_id,
t2.sm_id,
t2.segmentation_name,
t1.last_updated_date,
t2.sql_statement,
max(t1.last_updated_date) over (partition by t2.sm_id) as max_updated_date
from test_4 t4
join test_3 t3 on t3.q_id = t4.q_id
join test_2 t2 on t2.sm_id = t3.answer
join test_1 t1 on t1.sr_id = t3.sr_id
)
where last_updated_date = max_updated_date;
SR_ID SM_ID SEGMENTATIO LAST_UPDATED_DATE SQL_STATEMENT
---------- ----- ----------- ----------------------------- --------------------------------------------------------------------------------
1108220 958 test_not_in 14-feb-2017 03:39:07 select * from dual
You will need to adapt that to deal with any other restrictions or requirements that weren't clear (including your left/right outer joins, for instance).
I've deliberately ignored the the subquery you were doing to split the 'answer' into multiple values. It's possible you have something horrible like a delimited list of IDs in there, which is a data model problem. If that is the case then you'll still need to extract the individual sm_id
values; something like:
with answer_extraction as (
select q_id, sr_id, regexp_substr(answer, '\d+', 1, level) as sm_id
from test_3
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '\d+', 1, level) is not null
)
select sr_id,
sm_id,
segmentation_name,
to_char(last_updated_date, 'dd-mon-yyyy hh24:mi:ss') as last_updated_date,
sql_statement
from (
select t1.sr_id,
t2.sm_id,
t2.segmentation_name,
t1.last_updated_date,
t2.sql_statement,
max(t1.last_updated_date) over (partition by t2.sm_id) as max_updated_date
from test_4 t4
join answer_extraction t3 on t3.q_id = t4.q_id
join test_2 t2 on t2.sm_id = t3.sm_id
join test_1 t1 on t1.sr_id = t3.sr_id
)
where last_updated_date = max_updated_date;
Based on the actual contents of test3
you added, your regular expression isn't doing quite what you need. With the pattern you're using it finds 14 numeric values, i.e. any numbers:
with test_3 (q_id, sr_id, answer) as (
select 1009330, 1108246, '976~feb_24^941~Test_regionwithcountry' from dual
union all select 1009330, 1108247, '941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24' from dual
union all select 1009330, 1108239, '972~test_emea' from dual
union all select 1009330, 1108240, '972~test_emea^827~test_with_region_country' from dual
union all select 1009330, 1108251, '981~MSE100579729 testing.' from dual
),
answer_extraction as (
select q_id, sr_id, regexp_substr(answer, '\d+', 1, level) as sm_id
from test_3
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '\d+', 1, level) is not null
)
select * from answer_extraction;
Q_ID SR_ID SM_ID
---------- ---------- ----------
1009330 1108239 972
1009330 1108240 972
1009330 1108240 827
1009330 1108246 976
1009330 1108246 24
1009330 1108246 941
1009330 1108247 941
1009330 1108247 2016
1009330 1108247 787
1009330 1108247 28
1009330 1108247 976
1009330 1108247 24
1009330 1108251 981
1009330 1108251 100579729
It appears you only want the bits between the ^ delimiters and the ~ markers. A common way to split a delimited string is:
with test_3 (q_id, sr_id, answer) as (
select 1009330, 1108246, '976~feb_24^941~Test_regionwithcountry' from dual
union all select 1009330, 1108247, '941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24' from dual
union all select 1009330, 1108239, '972~test_emea' from dual
union all select 1009330, 1108240, '972~test_emea^827~test_with_region_country' from dual
union all select 1009330, 1108251, '981~MSE100579729 testing.' from dual
),
answer_extraction as (
select q_id, sr_id, regexp_substr(answer, '[^^]+', 1, level) as sm_id
from test_3
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '[^^]+', 1, level) is not null
)
select * from answer_extraction;
Q_ID SR_ID SM_ID
---------- ---------- ----------------------------------------
1009330 1108239 972~test_emea
1009330 1108240 972~test_emea
1009330 1108240 827~test_with_region_country
1009330 1108246 976~feb_24
1009330 1108246 941~Test_regionwithcountry
1009330 1108247 941~Test_regionwithcountry_2016
1009330 1108247 787~Test_Request_28
1009330 1108247 976~feb_24
1009330 1108251 981~MSE100579729 testing.
but you then need to get the first part of that, e.g. borrowing your original pattern (others are available!):
column sm_id format a10
with test_3 (q_id, sr_id, answer) as (
select 1009330, 1108246, '976~feb_24^941~Test_regionwithcountry' from dual
union all select 1009330, 1108247, '941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24' from dual
union all select 1009330, 1108239, '972~test_emea' from dual
union all select 1009330, 1108240, '972~test_emea^827~test_with_region_country' from dual
union all select 1009330, 1108251, '981~MSE100579729 testing.' from dual
),
answer_extraction as (
select q_id, sr_id,
regexp_substr(regexp_substr(answer, '[^^]+', 1, level), '\d+') as sm_id
from test_3
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '[^^]+', 1, level) is not null
)
select * from answer_extraction;
Q_ID SR_ID SM_ID
---------- ---------- ----------
1009330 1108239 972
1009330 1108240 972
1009330 1108240 827
1009330 1108246 976
1009330 1108246 941
1009330 1108247 941
1009330 1108247 787
1009330 1108247 976
1009330 1108251 981
Notice that extra regexp_substr()
is only in the select list, not the connect-by clause; and that the extract sm_id
is still a string. If test_2.sm_id
is a number then add a to_number()
call around the pair of substrings in that select list too.