Search code examples
sqloracle-databaseregex-groupregexp-replace

capture the starting characters with many patterns


I need to capture the starting characters from database objects with various pattern search.

I have tried using the REGEX expression ,below query i am trying and the expected data set to be.

Query 1

SELECT
   owner AS schema_name, 
   object_name,        
   object_type,
   REGEXP_REPLACE(OBJECT_NAME, '^([A-Z0-9$]{1,})_.*', '\1') as BEGINNING,
   count(*),
   round(100*ratio_to_report(count(*)) over (), 4) percentage 
FROM
   dba_objects 
GROUP BY
   owner,
   object_name,
   object_type,
   REGEXP_REPLACE(OBJECT_NAME, '^([A-Z0-9$]{1,})_.*', '\1') 
ORDER BY
   percentage desc; 

Expected Data Set

OBJECT_NAME                     BEGINNING  COUNT(*) PERCENT
ABC_CUST_INFO_D                 ABC        20      .00010
BBC_CUST_ENTRY_F                BBC        100     .030
FHS_PRDCT_STST_T                A$f        194     .031
GHS_INVTR_CD_DRY                A1B        493     .051
#Tableau_6_sid:15251a_4i_2a     #Tableau   4000    1.5
/15722c29d_PhotoImageCameraPro  /15722c29d 5000    1.6
JAVA/15722c29d_PhotoImageqeeee  JAVA 5000  1.6

Unfortunately i was not able to separate the below

#Tableau_6_sid:15251a_4i_2a --> #Tableau

/15722c29d_PhotoImageCameraPro --> /15722c29d

JAVA/15722c29d_PhotoImageqeeee --> JAVA

how to resolve ?

so how to capture the last character after _ , below is an example

Query 2 - capture the last characters

 select 
   owner, 
   object_name, 
   object_type, 
   regexp_substr(object_name, '[^_]*$') ENDING, 
   count(*) COUNT, 
   round(100*ratio_to_report(count(*)) over (), 4) percentage  
from 
   dba_objects  
   where object_name like '%/%'
group by 
   owner, 
   object_name, 
   object_type, 
   regexp_substr(object_name, '[^_]*$') 
   ORDER BY 
   percentage desc

Expected Data Set

OBJECT_NAME                     BEGINNING       COUNT(*) PERCENT
ABC_CUST_INFO_D                 D               20      .00010
BBC_CUST_ENTRY_F                F               100     .030
FHS_PRDCT_STST_T                T               194     .031
GHS_INVTR_CD_DRY                DRY             493     .051
Teradata/jdbc/EXAcore/MPPReader MPPReader       5000    1.6
sqlj/modegen/engine/FuncInit    FuncInit        8000    6.0
moon/aio/af/ext/ISCII11$Decoder ISCII11$Decoder 8700    6.1

Unfortunately i was not able to separate the below

Teradata/jdbc/EXAcore/MPPReader --> MPPReader

sqlj/modegen/engine/FuncInit --> FuncInit

moon/aio/af/ext/ISCII11$Decoder --> ISCII11$Decoder


Solution

  • For the regex, you could add matching / and # in the character class followed by matching as least as possible chars until the next / or _ using another character class.

    If the pattern is not case insensitive, you could also add a-z

    ^([A-Za-z0-9$/#]+?)[/_].*
    

    Regex demo

    If the / and # can only be at the start, you might also optionally match either one of them:

    ^([/#]?[A-Za-z0-9$]+)[/_].*
    

    See another regex demo