Search code examples
mysqlsqlregeximpala

Impala regex to find "2", but not "255", in pipe-delimited list


I originally asked for help finding values in a pipe-delimited list where the values must include 4 or 5, but never 2 or 3. However, the accepted solution returns the value 255 by itself.

I have a data set with pipe delimited values that can be either 0,1,2,3,4 0r 255, such as this:

|    colA    |
 ____________
| 1|1|0|255  |
| 5|4|4|2    |
| 5|4|4|3    |
| 5|4|4|4    |
| 1|0|0|0    |
| 0|2|0|1|2  |
| 5|5|0|5    |
| 0|5        |
| 5|5|255|255|
| 0|3|1|2|3  |
| 5|5|5|2|3|3|
| 0|2|0|0|0|2|
| 5|255|1|1|5|
| 4|255|4    |
| 2|2|3      |
| 255|0      |
| 5          |
| 5|5|1      |

I need a query that will return rows that include a value of 4 or 5, but never 2 or 3.

|    colA    |
 ____________
| 5|4|4|3    |   
| 5|4|4|4    |
| 5|5|0|5    |
| 0|5        |
| 5|5|255|255|
| 5|255|1|1|5|
| 4|255|4    |
| 5          |
| 5|5|1      |

The closest I've come is this query:

SELECT clin.clin_sig 
FROM clinvar clin
WHERE (clin_sig NOT REGEXP '3|2[^5]'
AND clin_sig REGEXP '4|[^25]5')

But it's missing the following entries, so something is wonky:

  • 5
  • 5|255
  • 5|5|5|5|5|5|5
  • 5|0
  • 5|255|255
  • 5|5|1

I've also tried this, but it's very slow and still missing a few results:

WHERE (clin.clin_sig LIKE "%4%" OR clin.clin_sig REGEXP "^5$"  
  OR clin.clin_sig REGEXP "^5\\|" 
  OR clin.clin_sig REGEXP "\\|5$" OR clin.clin_sig REGEXP "\\|5\\|")
AND clin.clin_sig NOT REGEXP "^2$" 
AND clin.clin_sig NOT REGEXP "\\|2$"
AND clin.clin_sig NOT REGEXP "^2\\|"
AND clin.clin_sig NOT LIKE "%3%"                          

Solution

  •     clin_sig     REGEXP '^[[:<:]](4|5)[[:>:]]$'
    AND clin_sig NOT REGEXP '^[[:<:]](2|3)[[:>:]]$'
    

    Please provide more test cases.