regex bash grep cygwin

Receiving some inaccurate (or no) grep matches for numbers in garbled text


I'm working on a coding challenge that requires my program (writing it in bash on Cygwin) to search for numbers that match different representations of valid IPv4 addresses in a garbled text. I've worked out a lot of my bugs, but I've come across a problem when dealing with some numbers. When I grep for a specific number, I'm receiving results that are within a larger number, which I don't want.

Say I have the following test text:
Dotted decimal89.229.130.225with no leading zero.
Dotted hexadecimal0xc0.0x0.0x02.0xebEach octet is individually converted to hexadecimal form.
Dotted octal0300.0000.0002.0353Each octet is individually converted into octal.
Dotted binary11000000.00000000.00000010.11101011Each octet is individually converted into binary.

10101010101010101010101010101010Binary11000000000000000000001011101011
030135300000Octal030000001353
Hexadecimal0xC00002EBConcatenation of the octets from the dotted hexadecimal.
Decimal3221226219The 32-bit number expressed in decimal.
1.1.1.1.1

I'm trying to search for a 12-digit number whose first digit is zero, second digit is [1-7], and third through 12th digits are [0-7]. I tried this grep originally:

grep -o '0[1-7][0-7]\{10\}'

But this returned:
010101010101 <- unwanted
010101010101 <- unwanted
030135300000 <- desired output
030000001353 <- desired output

Because I don't want to use a number that is within another number, I tried to get matches that have anything but a number before or after:

grep -o '[^0-9]0[1-7][0-7]\{10\}[^0-9]'

But this is returning nothing!

I've tried the following, also, from other related posts:

grep -Eo '(^|[^0-9])0[1-7][0-7]\{10\}($|[^0-9])'
grep -o '[^0-9]?0[1-7][0-7]\{10\}[^0-9]?'
grep -P '(?<!\d)0[1-7][0-7]\{10\}(?!\d)'

None has worked. Nothing comes out.

I don't understand what I'm doing wrong. Obviously something's wrong with my regex/reasoning/text, but I don't know what it is! Any help would be very appreciated.


Solution

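  • Your last pattern is almost right, but don't escape the quantifier: with -P, \{10\} matches the literal text {10} instead of repeating [0-7] ten times. Write {10} and use grep with the options -Po: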

    • -P --perl-regexp   Interpret PATTERN as a Perl regular expression.
    • -o --only-matching   Show only the part of a matching line that matches PATTERN

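    The negative lookbehind (?<!\d) and lookahead (?!\d) reject any match that has a digit directly before or after it, so a 12-digit run inside a longer number is skipped. Unlike [^0-9], lookarounds consume no character, so they also succeed at the start and end of a line.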

    grep -Po '(?<!\d)0[1-7][0-7]{10}(?!\d)'
    

    See the PCRE demo at regex101.
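
To check the answer locally, here is a minimal sketch that runs the pattern against a few lines copied from the question's test text. It assumes a GNU grep built with PCRE support (the -P option); the variable names are only illustrative. It also shows a possible fallback for a grep without -P, and reruns the escaped quantifier from the question to show that it finds nothing.

```bash
#!/usr/bin/env bash
# A few lines from the question's test text (shortened for the example).
sample='Dotted octal0300.0000.0002.0353Each octet is individually converted into octal.
10101010101010101010101010101010Binary11000000000000000000001011101011
030135300000Octal030000001353
Decimal3221226219The 32-bit number expressed in decimal.'

# 1) The answer's pattern: lookarounds, unescaped {10}. Requires grep -P.
pcre_matches=$(printf '%s\n' "$sample" |
  grep -Po '(?<!\d)0[1-7][0-7]{10}(?!\d)')

# 2) Assumed fallback without -P: match the number together with one
#    neighbouring non-digit (or a line boundary), then keep only the
#    12 digits with a second grep.
ere_matches=$(printf '%s\n' "$sample" |
  grep -Eo '(^|[^0-9])0[1-7][0-7]{10}($|[^0-9])' |
  grep -Eo '[0-9]{12}')

# 3) The escaped quantifier from the question: under -P, \{10\} is the
#    literal text "{10}", so nothing in the sample matches.
escaped_matches=$(printf '%s\n' "$sample" |
  grep -Po '(?<!\d)0[1-7][0-7]\{10\}(?!\d)' || true)

printf 'PCRE:\n%s\n' "$pcre_matches"
printf 'ERE fallback:\n%s\n' "$ere_matches"
printf 'Escaped quantifier: "%s"\n' "$escaped_matches"
```

Both the PCRE and the fallback pipeline should print 030135300000 and 030000001353, and the escaped variant prints an empty string. The fallback is less robust than the lookarounds: because it consumes the neighbouring character, it can miss a number that is separated from the previous match by only a single non-digit.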