Search code examples
sqlregexsql-likephone-numbersqlyog

SQL,SQLYog - Looking for phone number pattern within column in


I have recently been given the opportunity to work whilst learn SQL. I have limited knowledge on SQL syntax and practices.

As one of my first task on the job, I have to look for phone numbers within a string of text describing an apartment complex. The reason being is that we charge for inserting contact information.

Although I have written a query isolating only numbers with SQLyog, I am looking to further filter my 2k+ results by isolating the instances where I have

1) 10 consecutive digits without spaces i.e. 6173274810

2) 10 consecutive digits with cluster of 3 digits, 3 digits, and 4 digits with one space in between i.e. 855 347 2501

3) 10 consecutive digits with cluster of 3 digits, 3 digits, and 4 digits with more than one space in between i.e. 622 257 2701

So far this is what I have got,

SELECT ac.complex_id, ac.comments,
LENGTH(r.number_string) AS count_of_characters, r.number_string
FROM aptcomplexes ac, manager_center_subscription mcs,

     (SELECT complex_id, LENGTH(comments), comments,
     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
     comments, '*', ''), '/',''), 'a', ''), 'b', ''), 'c', ''), 'd', ''), 'e', ''), 'f', ''), 'g', ''), 'h', ''), 'i', ''), 'j', ''), 'k', ''), 'l', ''), 'm', '')
     , 'n', ''), 'o', ''), 'p', ''), 'q', ''), 'r', ''), 's', ''), 't', ''), 'u', ''), 'v', ''), 'w', ''), 'x', ''), 'y', ''), 'z', ''), '.', ''), ',', ''), "’", ''), 'A', ''), 'B', ''), 'C', ''), 'D', ''), 'E', ''),
     'F', ''), 'G', ''), 'H', ''), 'I', ''), 'J', ''), 'K', ''), 'L', ''), 'M', ''), 'N', ''), 'O', ''), 'P', ''), 'Q', ''), 'R', ''), 'S', ''), 'T', ''), 'U', ''), 'V', ''), 'W', ''), "X", ''), 'Y', '')
     , 'Z', ''), '!', ''), "?", ''), '$', ''), 'Z', ''), '-', ''), '(', ''), ')', ''), "&", ''), '*', ''), ':', ''), '"', ''), ';', ''), '@', ''), '|', ''), ';', ''),"\\", ''), '+', ''), '[', ''),"]" , ''),"â" , '')
     , "'", ''),"#" , ''),"=" , ''), "¦", ''), "\n", ''), "½", ''), "½", ''), "©", ''), "Ã", ''), "¿", ''), "ï", ''), "%", ''), "Â", ''), "¼", ''), "¡", ''), "³", ''), "¢", ''), "€", ''), "”", '')
     , "_", '') , "‚", '') , "~", '') , "<", '') , ">", '') , "~", ''), "–", '') , "‘", ''), CHAR(10),''), CHAR(9),''), CHAR(13),''), '™',''), 'œ',''), '®','') AS number_string
     FROM aptcomplexes
     GROUP BY complex_id
     ) r

WHERE
ac.complex_id = mcs.complex_id
AND ac.complex_id = r.complex_id
AND LENGTH(r.number_string)>=10
AND mcs.subscription_status='Active'
AND mcs.product_id=1
GROUP BY ac.complex_id
ORDER BY LENGTH(r.number_string) DESC
;

I have looked throughout the site but have not had success finding what I am really looking for. I have a feeling this might have to do with "LIKE" or "REGEXP." I believe JS accounts for any type of number as \d, is there something like this for SQL because "-" accounts for all characters, correct?

I am sure there are alternative ways on going about this, but a play off of my query would be most helpful. I really want to learn! Also, a detailed explanation would be great.

Thanks for the help in advance!


Solution

  • This regex should meet all requirements you've listed.

    where column REGEXP '[0-9]{3}[:space:]*[0-9]{3}[:space:]*[0-9]{4}'
    

    The [0-9] means allow a number (also \d but mysql doesn't seem to support that); the {3} means there must be 3 of the previous value(e.g. a number). The [:space:] is the POSIX for space (also \s mysql again doesn't seem to support that) the * is for zero or more occurrences of spaces http://www.regular-expressions.info/posixbrackets.html.