Search code examples
sqloraclecountcellsql-like

SQL using Count, with same "Like" multiple times in same cell


I'm trying to get a count on how many times BNxxxx has been commented in the comments cell. So far, I can make each cell be counted once, but there may be multiple comments in a cell containing BNxxxx.

For example, this:

------- 
BN0012
------- 
BN0012
------- 
BN0012
BN0123
-------

should show an output of BN0012 3 times and BN0123 once. Instead, I get BN0012 3 times only.

Here's my code:

select COMMENTS, count(*) as TOTAL 
from NOTE 
Where COMMENTS like '%BN%' AND CREATE_DATE between '01/1/2015' AND '11/03/2015'
group by COMMENTS
order by Total desc; 

Any ideas?

edit

My code now looks like

select BRIDGE_NO, count(*)
from IACD_ASSET b join 
IACD_NOTE c
on c.COMMENTS like concat(concat('BN',b.BRIDGE_NO),'%')
Where c.CREATE_DATE between '01/1/2015' AND '11/03/2015' AND length(b.BRIDGE_NO) > 1 
group by b.BRIDGE_NO
order by count(*);

Problem with this is the BN44 is the same as BN4455 .. have tried concat(concat('BN',b.BRIDGE_NO),'_') comes back with nothing , any ideas how i can get exact likes


Solution

  • I'm going to assume that your COMMENTS table has a primary key column (such as comment_id) or at least that comments isn't a CLOB. If it is a CLOB then you're not going to be able to use GROUP BY on that column.

    You can accomplish this as follows without even a lookup table of BN.... values. No guarantees as to the performance:

    WITH d1 AS (
        SELECT 1 AS comment_id, 'BN0123 is a terrible thing BN0121 also BN0000' AS comments
             , date'2015-01-03' AS create_date
          FROM dual
         UNION ALL
        SELECT 2 AS comment_id, 'BN0125 is a terrible thing BN0120 also BN1000' AS comments
             , date'2015-02-03' AS create_date
          FROM dual
    )
    SELECT comment_id, comments, COUNT(*) AS total FROM (
        SELECT comment_id, comments, TRIM(REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i')) AS bn
          FROM d1
         WHERE create_date >= date'2015-01-01'
           AND create_date < date'2015-11-04'
       CONNECT BY REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i') IS NOT NULL
           AND PRIOR comment_id = comment_id
           AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL
    ) GROUP BY comment_id, comments;
    

    Note that I corrected your filter:

    CREATE_DATE between '01/1/2015' AND '11/03/2015'
    

    First, you should be using ANSI date literals (e.g., date'2015-01-01'); second, using BETWEEN for dates is often a bad idea as Oracle DATE values contain a time portion. So this should be rewritten as:

        create_date >= date'2015-01-01'
    AND create_date < date'2015-11-04'
    

    Note that the later date is November 4, to make sure we capture all possible comments that were made on November 3.

    If you want to see the matched comments without aggregating the counts, then do the following (taking out the outer query, basically):

    WITH d1 AS (
        SELECT 1 AS comment_id, 'BN0123 is a terrible thing BN0121 also BN0000' AS comments
             , date'2015-01-03' AS create_date
          FROM dual
         UNION ALL
        SELECT 2 AS comment_id, 'BN0125 is a terrible thing BN0120 also BN1000' AS comments
             , date'2015-02-03' AS create_date
          FROM dual
    )
    SELECT comment_id, comments, TRIM(REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i')) AS bn
      FROM d1
     WHERE create_date >= date'2015-01-01'
       AND create_date < date'2015-11-04'
    CONNECT BY REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i') IS NOT NULL
       AND PRIOR comment_id = comment_id
       AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL;
    

    Given the edits to your question, I think you want something like the following:

    SELECT b.bridge_no, COUNT(*) AS comment_cnt
      FROM iacd_asset b INNER JOIN iacd_note c
        ON REGEXP_LIKE(c.comments, '(^|\W)BN' || b.bridge_no || '(\W|$)', 'i')
     WHERE c.create_dt >= date'2015-01-01'
       AND c.create_dt < date'2015-03-12' -- It just struck me that your dates are dd/mm/yyyy
       AND length(b.bridge_no) > 1
     GROUP BY b.bridge_no
     ORDER BY comment_cnt;
    

    Note that I am using \W in the regex above instead of \s as I did earlier to make sure that it captures things like BN1234/BN6547.