
Grep Match Phones Surrounded by Text


I am trying to locate all phone numbers across various files, including JSON and TXT.

A match must contain exactly 10 or 11 digits, e.g. (012-345-6789) or (0-012-345-6789), no more and no fewer. The phone numbers are often surrounded by text, but sometimes by spaces or tabs (see the examples below). They sometimes also include hyphens "-" and parentheses "()" as separators.

abc0123456789def <- match
abc10123456789def <- match
abc101234567899def <- no match (12 digits)
abc101234567def <- no match (9 digits)

abc 0123456789 def <- match
abc 10123456789 def <- match

abc1(012)345-6789def <- match
abc1-012-345-6789def <- match
abc(012)345-6789def <- match
abc012-345-6789def <- match
abc 1(012)345-6789 def <- match

Your help is super appreciated!


Solution

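  • If I recall grep correctly, this should work (it needs GNU grep, since -P enables Perl-compatible regexes; single quotes keep the shell from touching the backslashes and the $):

    grep -P '(?:^|(?<=\D))\d?(?:\(\d{3}\)|-?\d{3})-?\d{3}-?\d{4}(?=\D|$)'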
    
    • (?:^|(?<=\D)) - behind me is the start of the line or a non-digit char
    • \d? - optional leading digit
    • (?: - start non-capturing group
      • \(\d{3}\) - format equivalent to (555)
      • | - or
      • -?\d{3} - format equivalent to -555 with the hyphen being optional
    • ) - end non-capturing group
    • -?\d{3}-?\d{4} - format equivalent to -555-5555 with optional hyphens
    • (?=\D|$) - ahead of me is a non-digit char or the end of a line
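
To check the breakdown above against the sample lines from the question, something like the following sketch could be used. It assumes GNU grep built with PCRE support; the temporary file and the variable names (pattern, samples, matches) are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: run the pattern over the question's sample lines.
# Assumes GNU grep with PCRE support (-P); all names here are illustrative.

pattern='(?:^|(?<=\D))\d?(?:\(\d{3}\)|-?\d{3})-?\d{3}-?\d{4}(?=\D|$)'

# Temporary file holding the sample lines from the question.
samples=$(mktemp)
cat > "$samples" <<'EOF'
abc0123456789def
abc10123456789def
abc101234567899def
abc101234567def
abc 0123456789 def
abc 10123456789 def
abc1(012)345-6789def
abc1-012-345-6789def
abc(012)345-6789def
abc012-345-6789def
abc 1(012)345-6789 def
EOF

# -o prints only the matched text, -n prefixes each match with its line number.
matches=$(grep -noP "$pattern" "$samples")
printf '%s\n' "$matches"

rm -f "$samples"
```

Lines 3 and 4 of the sample file (12 and 9 digits) should produce no output, while every other line should print just the phone number without the surrounding text.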

    Here it is on regex101, using the PCRE (PHP) flavor: https://regex101.com/r/Gdeiq7/1
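
Since the question asks about searching across JSON and TXT files, the same pattern could be wrapped in a small helper that walks a directory tree. This is only a sketch: find_phones is a hypothetical name, and it assumes GNU grep, whose -r and --include options handle the recursion and file filtering.

```shell
#!/usr/bin/env bash
# Sketch: search a directory tree for the pattern in .json and .txt files only.
# Assumes GNU grep with PCRE support; find_phones is a hypothetical helper name.

pattern='(?:^|(?<=\D))\d?(?:\(\d{3}\)|-?\d{3})-?\d{3}-?\d{4}(?=\D|$)'

find_phones() {
    # $1: directory to search (defaults to the current directory).
    # -r recurses, -H prints the file name, -n the line number,
    # -o only the matched text; --include limits which files are read.
    grep -rHnoP --include='*.json' --include='*.txt' "$pattern" "${1:-.}"
}
```

Calling find_phones with a directory (or with no argument for the current one) should print one path:line:match entry per phone number found, skipping files with other extensions.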