I have a log file which is full of entries like the one below:
2017-07-13 11:23:43.717948 [CRIT] mod_dptools.c:1713 SRC=7479569217;7479569217;768733974848304;7479569217;300067;333;-1
I'm trying to print specific values between the ; separators, which are always numeric. For example, I want to print the 1st, 3rd and 5th number between ;.
I tried this pattern:
(?=;).+?(?=;).+?.+?(?=;)
It will print the 2nd and the 3rd. I'm not sure how to print, for example, the 2nd and the 4th without also printing the 3rd.
UPDATE:
Maybe I was not clear enough or the example was not in its best form. So let me add some more info to it:
2017-07-13 11:23:43.717948 [CRIT] mod_dptools.c:1713 SRC=123;1234567890;00000000;2222222;7479569217;87654321;300067;333;-1
My expected output is: 123;00000000;7479569217;300067;333;-1
That means the 1st number, then the 3rd, the 5th, the 6th, the 7th, then the 8th.
Ideally, I would be able to change the selection later, for example to print only the 2nd, 3rd, 4th and 5th entries.
If you trust the data in your logfile and you don't want to validate that your values contain only - and numbers, then you can just use a negated character class containing ; (this will improve pattern efficiency) and only parenthetically wrap the values that you want.
Pattern:
#not captured--vv------------vv
    =([^;]*;)[^;]*;([^;]*;)[^;]*;([^;]*;)([^;]*;)([^;]*;)([^;]*;)(.*)
        $1            $2            $3      $4      $5      $6    $7
Notice that the last capture group ($7) uses a dot instead of a negated character class. This is so the pattern does not try to match on the next line: by default a dot stops at a line break, whereas [^;] does not. I assume this is an important feature because your logfile will have many lines of data in it. (If not, the final capture group can be written like the others before it.)
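As a rough illustration, here is how the pattern could be applied in Python (the question does not name a language, so this is only one possible choice). The helper name select_fields and the second log entry are made up for the example; the first entry is the one from the question.

```python
import re

# The pattern from above: the 2nd and 4th values are matched but not captured.
PATTERN = re.compile(
    r"=([^;]*;)[^;]*;([^;]*;)[^;]*;([^;]*;)([^;]*;)([^;]*;)([^;]*;)(.*)"
)


def select_fields(log_text):
    """Return one string of joined captured values per matching entry."""
    return ["".join(m.groups()) for m in PATTERN.finditer(log_text)]


# First entry is from the question; the second entry is invented for the demo.
log_text = (
    "2017-07-13 11:23:43.717948 [CRIT] mod_dptools.c:1713 "
    "SRC=123;1234567890;00000000;2222222;7479569217;87654321;300067;333;-1\n"
    "2017-07-13 11:23:44.000000 [CRIT] mod_dptools.c:1713 SRC=1;2;3;4;5;6;7;8;9\n"
)

for selected in select_fields(log_text):
    print(selected)
# 123;00000000;7479569217;87654321;300067;333;-1
# 1;3;5;6;7;8;9
```

Because the final group is a dot, each match ends at its own line break. To leave a captured value out of the output, skip its group when joining instead of rewriting the pattern.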
I am using * as a zero-or-more quantifier, in case the logfile can deliver empty values between the semicolons. If the logfile always contains a number for each value, then + can be used as a quantifier.
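The difference between the two quantifiers can be checked with a small Python sketch. The entry with an empty 2nd value is invented for the demonstration, and the + variant is derived here by swapping the quantifier inside each negated character class.

```python
import re

STAR = r"=([^;]*;)[^;]*;([^;]*;)[^;]*;([^;]*;)([^;]*;)([^;]*;)([^;]*;)(.*)"
# Same pattern, but every value must contain at least one character.
PLUS = STAR.replace("[^;]*", "[^;]+")

# Invented entry: the 2nd value is empty (two semicolons in a row).
entry_with_gap = "SRC=123;;00000000;2222222;7479569217;87654321;300067;333;-1"

star_match = re.search(STAR, entry_with_gap)
plus_match = re.search(PLUS, entry_with_gap)

print("".join(star_match.groups()))  # 123;00000000;7479569217;87654321;300067;333;-1
print(plus_match)                    # None
```

With *, the empty value is simply skipped over; with +, the whole entry fails to match, which may or may not be what you want for malformed lines.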
If you need to validate the values, Usagi's pattern is suitable.
Consolidating my capture groups like this: =([^;]*;)[^;]*;([^;]*;)[^;]*;([^;]*;[^;]*;[^;]*;[^;]*;.*) or =([^;]*;)[^;]*;([^;]*;)[^;]*;((?:[^;]*;){4}.*) successfully reduces the total number of capture groups and improves pattern efficiency and brevity, but makes the pattern slightly harder to update in the future. A more verbose pattern will make changing capture groups a snap. It is up to you which pattern to select based on Validation, Efficiency, Brevity, and Maintainability.
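A quick way to convince yourself that the consolidated patterns select the same values as the verbose one is to run all three against the sample entry. This is only a sanity check sketched in Python; the dictionary keys are labels chosen for the example.

```python
import re

PATTERNS = {
    "verbose": r"=([^;]*;)[^;]*;([^;]*;)[^;]*;([^;]*;)([^;]*;)([^;]*;)([^;]*;)(.*)",
    "merged": r"=([^;]*;)[^;]*;([^;]*;)[^;]*;([^;]*;[^;]*;[^;]*;[^;]*;.*)",
    "repeated": r"=([^;]*;)[^;]*;([^;]*;)[^;]*;((?:[^;]*;){4}.*)",
}

entry = ("2017-07-13 11:23:43.717948 [CRIT] mod_dptools.c:1713 "
         "SRC=123;1234567890;00000000;2222222;7479569217;87654321;300067;333;-1")

# Join the captured groups of each pattern and record how many groups it has.
results = {}
for name, pattern in PATTERNS.items():
    match = re.search(pattern, entry)
    results[name] = ("".join(match.groups()), len(match.groups()))

for name, (joined, group_count) in results.items():
    print(f"{name}: {group_count} groups -> {joined}")
```

All three produce the same joined string; only the number of groups differs (7 versus 3), which is the trade-off between easy per-value editing and brevity.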