Working on assessing impact of some current databases planned retirement. It's not feasible for individual communication with users that have accessed impacted data recently due to volume.
I'm thinking that if I can do some form of fuzzy logic lookup to group like queries by user, at a minimum I can identify reoccuring queries that differ slightly because of expected constraint changes. Although far from perfect, this could help represent queries that are ran regularly to support reoccurring business functions vs pure ad-hoc.
Can anyone provide some ideas that can get me started or let me know if there are any alternate ideas to research given my goals stated above?
You can combine UTL_MATCH, DBA_HIST_SQLTEXT, and DBA_HIST_SQLSTAT to find similar query executions. If you haven't licensed AWR, or are only interested in recent queries, you can use GV$SQLSTATS instead of the DBA_HIST tables.
In addition to being complicated, you will need to adjust some of the literals in the below query based on trial and error. Currently, it only looks at the top 10 most executed query per user, and only finds the top 5 most related queries where the similarity score is greater than or equal to 60%.
--Common queries and the top 5 most-closely related queries.
with statements as
(
--All relevant SQL statements
select
sqlstats.parsing_schema_name,
sqlstats.total_executions,
sqltext.sql_id,
--Convert CLOB to VARCHAR for UTL_MATCH.
--Won't matter, since we're only interseted in fuzzy matches anyway.
to_char(substr(sqltext.sql_text, 1, 1000)) sql_text,
sqltext.command_type
from
(
--All queries in AWR.
select sql_id, sql_text, command_type
from dba_hist_sqltext
) sqltext
join
(
--Statistics for all queries in AWR.
select sql_id, parsing_schema_name, sum(executions_delta) total_executions
from dba_hist_sqlstat
group by sql_id, parsing_schema_name
) sqlstats
on sqltext.sql_id = sqlstats.sql_id
order by parsing_schema_name, total_executions desc
)
--Top N most similar queries.
select *
from
(
--Ranked similarity.
select
similarity.*,
row_number() over (partition by sql_id1 order by similarity desc) top_similarity
from
(
--Similarity between SQL statements for the Top N SQL and other SQL run by the same user.
select
top_n.parsing_schema_name, top_n.sql_id sql_id1, top_n.sql_text sql_text1, top_n.total_executions,
statements.sql_id sql_id2, statements.sql_text sql_text2,
utl_match.edit_distance_similarity(top_n.sql_text, statements.sql_text) similarity
from
(
--Top N most executed queries.
select *
from
(
--Most executed queries per user.
select
statements.*,
row_number () over (partition by parsing_schema_name order by total_executions desc) top_n
from statements
order by parsing_schema_name, total_executions desc
)
where top_n <= 10
) top_n
join statements
on top_n.parsing_schema_name = statements.parsing_schema_name
and top_n.command_type = statements.command_type
and top_n.sql_id <> statements.sql_id
order by top_n.sql_id, similarity desc, statements.sql_id
) similarity
) ranked_similarity
where top_similarity <= 5
and similarity >= 60
order by parsing_schema_name, sql_id1, top_similarity;