I have recently been given the opportunity to work whilst learn SQL. I have limited knowledge on SQL syntax and practices.
As one of my first task on the job, I have to look for phone numbers within a string of text describing an apartment complex. The reason being is that we charge for inserting contact information.
Although I have written a query isolating only numbers with SQLyog, I am looking to further filter my 2k+ results by isolating the instances where I have
1) 10 consecutive digits without spaces i.e. 6173274810
2) 10 consecutive digits with cluster of 3 digits, 3 digits, and 4 digits with one space in between i.e. 855 347 2501
3) 10 consecutive digits with cluster of 3 digits, 3 digits, and 4 digits with more than one space in between i.e. 622 257 2701
So far this is what I have got,
SELECT ac.complex_id, ac.comments,
LENGTH(r.number_string) AS count_of_characters, r.number_string
FROM aptcomplexes ac, manager_center_subscription mcs,
(SELECT complex_id, LENGTH(comments), comments,
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
comments, '*', ''), '/',''), 'a', ''), 'b', ''), 'c', ''), 'd', ''), 'e', ''), 'f', ''), 'g', ''), 'h', ''), 'i', ''), 'j', ''), 'k', ''), 'l', ''), 'm', '')
, 'n', ''), 'o', ''), 'p', ''), 'q', ''), 'r', ''), 's', ''), 't', ''), 'u', ''), 'v', ''), 'w', ''), 'x', ''), 'y', ''), 'z', ''), '.', ''), ',', ''), "’", ''), 'A', ''), 'B', ''), 'C', ''), 'D', ''), 'E', ''),
'F', ''), 'G', ''), 'H', ''), 'I', ''), 'J', ''), 'K', ''), 'L', ''), 'M', ''), 'N', ''), 'O', ''), 'P', ''), 'Q', ''), 'R', ''), 'S', ''), 'T', ''), 'U', ''), 'V', ''), 'W', ''), "X", ''), 'Y', '')
, 'Z', ''), '!', ''), "?", ''), '$', ''), 'Z', ''), '-', ''), '(', ''), ')', ''), "&", ''), '*', ''), ':', ''), '"', ''), ';', ''), '@', ''), '|', ''), ';', ''),"\\", ''), '+', ''), '[', ''),"]" , ''),"â" , '')
, "'", ''),"#" , ''),"=" , ''), "¦", ''), "\n", ''), "½", ''), "½", ''), "©", ''), "Ã", ''), "¿", ''), "ï", ''), "%", ''), "Â", ''), "¼", ''), "¡", ''), "³", ''), "¢", ''), "€", ''), "”", '')
, "_", '') , "‚", '') , "~", '') , "<", '') , ">", '') , "~", ''), "–", '') , "‘", ''), CHAR(10),''), CHAR(9),''), CHAR(13),''), '™',''), 'œ',''), '®','') AS number_string
FROM aptcomplexes
GROUP BY complex_id
) r
WHERE
ac.complex_id = mcs.complex_id
AND ac.complex_id = r.complex_id
AND LENGTH(r.number_string)>=10
AND mcs.subscription_status='Active'
AND mcs.product_id=1
GROUP BY ac.complex_id
ORDER BY LENGTH(r.number_string) DESC
;
I have looked throughout the site but have not had success finding what I am really looking for. I have a feeling this might have to do with "LIKE" or "REGEXP." I believe JS accounts for any type of number as \d, is there something like this for SQL because "-" accounts for all characters, correct?
I am sure there are alternative ways on going about this, but a play off of my query would be most helpful. I really want to learn! Also, a detailed explanation would be great.
Thanks for the help in advance!
This regex should meet all requirements you've listed.
where column REGEXP '[0-9]{3}[:space:]*[0-9]{3}[:space:]*[0-9]{4}'
The [0-9]
means allow a number (also \d
but mysql doesn't seem to support that); the {3}
means there must be 3 of the previous value(e.g. a number). The [:space:]
is the POSIX for space (also \s
mysql again doesn't seem to support that) the *
is for zero or more occurrences of spaces http://www.regular-expressions.info/posixbrackets.html.