Tags: shell, awk, sed, grep, cut

Regular expression to capture alphanumeric string only in shell


I'm trying to write a regex to capture the alphanumeric values (the second bracketed field on each line), but it's also capturing the purely numeric bracketed values. What is the correct way to get the desired output?

Code:

grep -Eo '(\[[[:alnum:]]\)\w+' file > output
$ cat file
2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line
2022-04-29 08:45:11,764 [15] [fpes] [547] This is a single line
2022-04-29 08:46:12,454 [143] [mwalkc] [548] This is a single line
2022-04-29 08:49:12,554 [143] [skhat2] [549] This is a single line
2022-04-29 09:40:13,852 [5] [narl12] [550] This is a single line
2022-04-29 09:45:14,754 [1426] [Y23467] [550] This is a single line

Current output:

[14
[Y23467
[546
[15
[fpes
[547
[143
[mwalkc
[548
[143
[skhat2
[549
[5
[narl12
[550
[1426
[Y23467
[550

Expected output:

Y23467
fpes
mwalkc
skhat2
narl12
Y23467

Solution

  • 1st solution: With your shown samples, please try the following awk code. In short, it uses the gsub function to remove [ and ] from the 4th field (with awk's default whitespace splitting, the 4th field is the second bracketed value, e.g. [Y23467]) and then prints that field.

    awk '{gsub(/\[|\]/,"",$4);print $4}' Input_file
    


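The awk solution can be checked end to end with a small self-contained sketch. It recreates the sample lines from the question in a temporary file (created with mktemp; the variable names are arbitrary), runs the answer's gsub command, and also shows a variant, not part of the original answer, that uses [ and ] as field separators so the ID lands in field 4 without any substitution. Both assume every line has the same layout as the sample.

```shell
# Recreate the sample log from the question in a temporary file
sample=$(mktemp)
cat > "$sample" <<'EOF'
2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line
2022-04-29 08:45:11,764 [15] [fpes] [547] This is a single line
2022-04-29 08:46:12,454 [143] [mwalkc] [548] This is a single line
2022-04-29 08:49:12,554 [143] [skhat2] [549] This is a single line
2022-04-29 09:40:13,852 [5] [narl12] [550] This is a single line
2022-04-29 09:45:14,754 [1426] [Y23467] [550] This is a single line
EOF

# The answer's approach: with default whitespace splitting, field 4 is "[Y23467]";
# gsub removes the brackets from that field before it is printed
from_gsub=$(awk '{gsub(/\[|\]/,"",$4); print $4}' "$sample")

# Variant: split fields on [ or ] (a ] placed first inside a bracket expression
# is literal), so field 2 is "14", field 3 is " " and field 4 is "Y23467"
from_fs=$(awk -F'[][]' '{print $4}' "$sample")

printf '%s\n' "$from_gsub"
printf '%s\n' "$from_fs"

rm -f "$sample"
```

Both commands print the six IDs from the expected output. The trade-off: the gsub version depends on the ID being the 4th whitespace-separated field, while the -F version depends on it being the second bracketed value on the line.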
    2nd solution: With GNU grep, please try the following solution. It relies on -P (Perl-compatible regular expressions), which is what makes \K available.

    grep -oP '^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2},[0-9]{1,3} \[[0-9]+\] \[\K[^]]*' Input_file
    

    Explanation: Here is a detailed breakdown of the regex used with GNU grep above.

    ^[0-9]{4}(-[0-9]{2}){2}  ##From the start of the line, match 4 digits followed by 2 occurrences of a dash and 2 digits (the date).
     [0-9]{2}(:[0-9]{2}){2}  ##Match a space, then 2 digits followed by 2 occurrences of a colon and 2 digits (the time).
    ,[0-9]{1,3}              ##Match a comma followed by 1 to 3 digits (the milliseconds).
     \[[0-9]+\] \[\K         ##Match a space, then [ followed by 1 or more digits and ], then a space and [;
                             ##then use \K to discard everything matched so far from the reported match.
    [^]]*                    ##Match everything up to (not including) the first ] to get the actual value.
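If GNU grep with -P is not available (some non-GNU grep implementations lack it), here is a sketch of two alternatives that stick to extended regular expressions; neither is part of the original answer. The sed command mirrors the grep -P idea with a capture group instead of \K. The grep -E command fixes the question's original pattern by requiring at least one letter inside the brackets, which assumes an ID is never purely numeric. The sample file and variable names are illustrative only.

```shell
# Recreate the sample log from the question in a temporary file
sample=$(mktemp)
cat > "$sample" <<'EOF'
2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line
2022-04-29 08:45:11,764 [15] [fpes] [547] This is a single line
2022-04-29 08:46:12,454 [143] [mwalkc] [548] This is a single line
2022-04-29 08:49:12,554 [143] [skhat2] [549] This is a single line
2022-04-29 09:40:13,852 [5] [narl12] [550] This is a single line
2022-04-29 09:45:14,754 [1426] [Y23467] [550] This is a single line
EOF

# sed: skip everything before the first "[", match "[digits] [", capture
# everything up to the next "]", and print only lines where the substitution
# succeeded (-n suppresses default output, the p flag prints on success)
from_sed=$(sed -nE 's/^[^[]*\[[0-9]+] \[([^]]*)].*/\1/p' "$sample")

# grep -E: match a bracketed token containing at least one letter, so purely
# numeric tokens such as [14] or [546] are skipped; then strip the brackets
from_grep=$(grep -oE '\[[[:alnum:]]*[[:alpha:]][[:alnum:]]*]' "$sample" | sed 's/[][]//g')

printf '%s\n' "$from_sed"
printf '%s\n' "$from_grep"

rm -f "$sample"
```

The letter-based pattern is closer to what the question's regex was aiming for, but it would silently skip a purely numeric user ID, so the position-based sed command (or the grep -P answer) is the safer choice if such IDs can occur.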