I need to capture the starting characters from database objects with various pattern search.
I have tried using the REGEX expression ,below query i am trying and the expected data set to be.
Query 1
SELECT
owner AS schema_name,
object_name,
object_type,
REGEXP_REPLACE(OBJECT_NAME, '^([A-Z0-9$]{1,})_.*', '\1') as BEGINNING,
count(*),
round(100*ratio_to_report(count(*)) over (), 4) percentage
FROM
dba_objects
GROUP BY
owner,
object_name,
object_type,
REGEXP_REPLACE(OBJECT_NAME, '^([A-Z0-9$]{1,})_.*', '\1')
ORDER BY
percentage desc;
Expected Data Set
OBJECT_NAME BEGINNING COUNT(*) PERCENT
ABC_CUST_INFO_D ABC 20 .00010
BBC_CUST_ENTRY_F BBC 100 .030
FHS_PRDCT_STST_T A$f 194 .031
GHS_INVTR_CD_DRY A1B 493 .051
#Tableau_6_sid:15251a_4i_2a #Tableau 4000 1.5
/15722c29d_PhotoImageCameraPro /15722c29d 5000 1.6
JAVA/15722c29d_PhotoImageqeeee JAVA 5000 1.6
Unfortunately i was not able to separate the below
#Tableau_6_sid:15251a_4i_2a --> #Tableau
/15722c29d_PhotoImageCameraPro --> /15722c29d
JAVA/15722c29d_PhotoImageqeeee --> JAVA
how to resolve ?
so how to capture the last character after _ , below is an example
Query 2 - capture the last characters
select
owner,
object_name,
object_type,
regexp_substr(object_name, '[^_]*$') ENDING,
count(*) COUNT,
round(100*ratio_to_report(count(*)) over (), 4) percentage
from
dba_objects
where object_name like '%/%'
group by
owner,
object_name,
object_type,
regexp_substr(object_name, '[^_]*$')
ORDER BY
percentage desc
Expected Data Set
OBJECT_NAME BEGINNING COUNT(*) PERCENT
ABC_CUST_INFO_D D 20 .00010
BBC_CUST_ENTRY_F F 100 .030
FHS_PRDCT_STST_T T 194 .031
GHS_INVTR_CD_DRY DRY 493 .051
Teradata/jdbc/EXAcore/MPPReader MPPReader 5000 1.6
sqlj/modegen/engine/FuncInit FuncInit 8000 6.0
moon/aio/af/ext/ISCII11$Decoder ISCII11$Decoder 8700 6.1
Unfortunately i was not able to separate the below
Teradata/jdbc/EXAcore/MPPReader --> MPPReader
sqlj/modegen/engine/FuncInit --> FuncInit
moon/aio/af/ext/ISCII11$Decoder --> ISCII11$Decoder
For the regex, you could add matching /
and #
in the character class followed by matching as least as possible chars until the next /
or _
using another character class.
If the pattern is not case insensitive, you could also add a-z
^([A-Za-z0-9$/#]+?)[/_].*
If the /
and #
can only be at the start, you might also optionally match either one of them:
^([/#]?[A-Za-z0-9$]+)[/_].*
See another regex demo