Search code examples
oraclefuzzy-searchfuzzy-logicfuzzy

Need to roughly group similar query executions in Oracle that have slightly different constraints


Working on assessing impact of some current databases planned retirement. It's not feasible for individual communication with users that have accessed impacted data recently due to volume.

I'm thinking that if I can do some form of fuzzy logic lookup to group like queries by user, at a minimum I can identify reoccuring queries that differ slightly because of expected constraint changes. Although far from perfect, this could help represent queries that are ran regularly to support reoccurring business functions vs pure ad-hoc.

Can anyone provide some ideas that can get me started or let me know if there are any alternate ideas to research given my goals stated above?


Solution

  • You can combine UTL_MATCH, DBA_HIST_SQLTEXT, and DBA_HIST_SQLSTAT to find similar query executions. If you haven't licensed AWR, or are only interested in recent queries, you can use GV$SQLSTATS instead of the DBA_HIST tables.

    In addition to being complicated, you will need to adjust some of the literals in the below query based on trial and error. Currently, it only looks at the top 10 most executed query per user, and only finds the top 5 most related queries where the similarity score is greater than or equal to 60%.

    --Common queries and the top 5 most-closely related queries.
    with statements as
    (
        --All relevant SQL statements
        select
            sqlstats.parsing_schema_name,
            sqlstats.total_executions,
            sqltext.sql_id,
            --Convert CLOB to VARCHAR for UTL_MATCH.
            --Won't matter, since we're only interseted in fuzzy matches anyway.
            to_char(substr(sqltext.sql_text, 1, 1000)) sql_text,
            sqltext.command_type
        from
        (
            --All queries in AWR.
            select sql_id, sql_text, command_type
            from dba_hist_sqltext
        ) sqltext
        join
        (
            --Statistics for all queries in AWR.
            select sql_id, parsing_schema_name, sum(executions_delta) total_executions
            from dba_hist_sqlstat
            group by sql_id, parsing_schema_name
        ) sqlstats
            on sqltext.sql_id = sqlstats.sql_id
        order by parsing_schema_name, total_executions desc
    )
    --Top N most similar queries.
    select *
    from
    (
        --Ranked similarity.
        select
            similarity.*,
            row_number() over (partition by sql_id1 order by similarity desc) top_similarity
        from
        (
            --Similarity between SQL statements for the Top N SQL and other SQL run by the same user.
            select
                top_n.parsing_schema_name, top_n.sql_id sql_id1, top_n.sql_text sql_text1, top_n.total_executions,
                statements.sql_id sql_id2, statements.sql_text sql_text2,
                utl_match.edit_distance_similarity(top_n.sql_text, statements.sql_text) similarity
            from
            (
                --Top N most executed queries.
                select *
                from
                (
                    --Most executed queries per user.
                    select
                        statements.*,
                        row_number () over (partition by parsing_schema_name order by total_executions desc) top_n 
                    from statements
                    order by parsing_schema_name, total_executions desc
                )
                where top_n <= 10
            ) top_n
            join statements
                on top_n.parsing_schema_name = statements.parsing_schema_name
                and top_n.command_type = statements.command_type
                and top_n.sql_id <> statements.sql_id
            order by top_n.sql_id, similarity desc, statements.sql_id
        ) similarity
    ) ranked_similarity
    where top_similarity <= 5
        and similarity >= 60
    order by parsing_schema_name, sql_id1, top_similarity;