Search code examples
sqlpresto

Count matching value in two columns in SQL


I have this table where the column topic contains array of labels. How can we get number of times same value present in both topics and label column.

|     topics   |  label |
| ------------ | ------ |
| [A, B, C, D] |    A   |
|     [D]      |    G   |
|    [C, D]    |    D   |
|    [G, A]    |    C   |
|    [A, C]    |    A   |
|    [B, G]    |    A   |

In above example A is present 2 times (row 1 and 5). D is present 1s (row 3).

|  label   | count |
| -------- | ------|
|    A     |   2   |
|    D     |   1   |

I tried using

any_match(
  topics,
  e -> e > A
)

But it doesn't allow to directly pass the column name.


Solution

  • One option is to form delimited list of labels and calculate length of the list with and without particular label where difference of the two is number of occurances. Left all the columns selected for clarification.

    Select  TOPICS,  
            Replace( Replace(Replace(TOPICS, '[', ','), ']', ','), ' ', '') "LABEL_LIST",
            --
            LABEL, 
          ( ( Length( Replace( Replace(Replace(TOPICS, '[', ','), ']', ','), ' ', '') ) -
            Length( REPLACE(
                              Replace( Replace(Replace(TOPICS, '[', ','), ']', ','), ' ', ''), 
                              ',' || LABEL || ',', '')
                             )  
           ) / Length(',' || LABEL || ',')
          ) + 1 "LABEL_COUNT"
            
    From    tbl
    
    /*      R e s u l t :
    TOPICS       LABEL_LIST   LABEL LABEL_COUNT
    ------------ ------------ ----- -----------
    [A, B, C, D] ,A,B,C,D,    A               2
    [D]          ,D,          G               1
    [C, D]       ,C,D,        D               2
    [G, A]       ,G,A,        C               1
    [A, C]       ,A,C,        A               2
    [B, G]       ,B,G,        A               1    */