regex linux windows grep cygwin

grep regular expression to search a number sequence


I have a large text file that I would like to search with grep. The file is structured like this:

8071656799 4661447177 0355654334 3019852932 8351070080 3427747396 : 3099000001
8711668395 3649821348 9956324354 5011010810 9136023566 9366447433 : 3099000002
5082147211 3084342012 9526906615 7367215108 0922482666 6485161555 : 3099000003
4029562459 5180764444 6007631229 0296033611 6410243961 1599676529 : 3099000004
2029562935 7403306551 4667331755 4708680737 0948271458 0585681992 : 3099000005
3980586858 2774838233 2196908474 1817405080 5501649035 3043116116 : 3099000006
4821697167 9339115830 6953440258 6707173876 7188037671 5127476767 : 3099000007
0341392607 4082292483 7807211229 1753819242 4269141779 6567687980 : 3099000008

I would like to find a certain sequence of digits while ignoring spaces, colons, line endings, and the last 10 digits of each line. For example, 8034277473968711 would be found spanning the first two rows:

80 3427747396 : 3099000001 8711

Can you help me with a grep regular expression for this task, or suggest any other way to solve the problem? Thanks.


Solution

  • Try this:

    sed -e 's/\s//g' inputFileName | sed -e 's/:[0-9]\+$//' | sed -e ':a;N;$!ba;s/\n//g' | grep -o "8034277473968711"

    I tested this on an AWS Ubuntu 14.04 micro instance.

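    This pipes the file through a series of sed commands and finishes with grep -o. The first sed removes all whitespace, the second strips the colon and the trailing digits from each line, and the third joins all the lines into one. Because the joined data is a single very long line, plain grep would print the whole thing as a wall of text; the -o flag prints only the matched digits, so you see just the result, with everything you wanted ignored already removed.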

    Replace inputFileName with your file name and the number in quotes with whatever you'd like to search for (digits only, no spaces).

    Good luck!
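
To see the steps above on a small input, here is a minimal sketch that runs the same idea against the first two rows from the question. The helper name find_digits and the temporary sample file are assumptions made for illustration, not part of the original answer, and tr -d '\n' is swapped in for the line-joining sed loop (it should give the same result on this data).

```shell
#!/bin/sh
# find_digits is a hypothetical helper name used only for this sketch.
# $1 = digit sequence to search for, $2 = input file
find_digits() {
  # 1. remove all whitespace (this also drops a trailing CR on Windows files)
  # 2. remove the colon and the trailing ID digits at the end of each line
  # 3. join every line into one long string
  # 4. print only the matched text
  sed -e 's/[[:space:]]//g' -e 's/:[0-9]*$//' "$2" | tr -d '\n' | grep -o -- "$1"
}

# Sample file built from the first two rows shown in the question.
sample=$(mktemp)
printf '%s\n' \
  '8071656799 4661447177 0355654334 3019852932 8351070080 3427747396 : 3099000001' \
  '8711668395 3649821348 9956324354 5011010810 9136023566 9366447433 : 3099000002' \
  > "$sample"

result=$(find_digits 8034277473968711 "$sample")
echo "$result"   # prints 8034277473968711
```

If the sequence is absent, grep prints nothing and returns a non-zero exit status, so the helper can be used directly in an if test. Note that this approach only tells you whether (and how many times) the sequence occurs, not on which line it starts.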