
How to use grep to find specific numbers and write them to a new text file


I am new to Linux and grep, and I am looking for some direction on how to use grep. I am trying to extract two specific numbers from a text file. I will need to do this for thousands of files, so I believe using grep or some equivalent will be best for my mental health.

The text file I am working with looks as follows:

*Average spectrum energy:    0.00100 MeV
 Average sampled energy :    0.00100 MeV [ -0.0000%]
 K/phi    = <E*mu_tr/rho>         = 6.529719E+02 10^-12 Gy cm^2 [ 0.0008%]
 Kcol/phi = <E*mu_tr/rho>*(1-<g>) = 6.529719E+02 10^-12 Gy cm^2 [ 0.0008%]
 <g>                              =   1.0000E-15 [  0.4264%]
 1-<g>                            =     1.000000 [  0.0000%]
<mu_tr/rho> = <E*mu_tr/rho>/Eave         = 4.075530E+03   cm^2/g [ 0.0008%]
<mu_en/rho> = <E*mu_tr/rho>*(1-<g>)/Eave = 4.075530E+03   cm^2/g [ 0.0008%]
<E*mu_en/rho>                            = 4.075530E+00   MeV cm^2/g

The values I am looking to extract from this are "0.00100" and "4.075530E+00".

At the moment I am using grep -iE "Average spectrum energy|<E*mu_en/rho>" *, which shows me the full matching lines, but I am not sure how to refine the search to show only the numbers instead of the whole line. Is this possible using grep?

As for writing the numbers into a new file, I believe I can redirect the output with > newdata.txt. My question is: when using this with grep, can I change how the data is written to the new text file? I would like the numbers formatted like this:

0.00100001    3.4877754595352117
0.00100367    3.4665273232204363
0.00100735    3.4453747056004884
0.00101104    3.4243696230289187
0.00101474    3.4035147003587718

Again, is that possible using grep > newdata.txt?

I really appreciate any help or direction people can give me. Thank you.


Solution

  • "I'm not quite sure why it was giving the 4.075530E+03 value."

    That's because * has a special meaning: it repeats the previous item any number of times (including zero). So the pattern <E*mu_en/rho> does not match the text <E*mu_en/rho>; it matches < followed by any number of E's followed by mu_en/rho>, and in particular <mu_en/rho>, which starts the <mu_en/rho> = ... = 4.075530E+03 cm^2/g line. To turn off this special meaning and match a literal *, prepend a backslash: <E\*mu_en/rho>.
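    The difference is easy to see on two lines copied from the question's file. This is a minimal POSIX sh sketch; the variable names sample, unescaped and escaped exist only for the demo:

```shell
#!/bin/sh
# Two lines copied from the question's file.
sample='<mu_en/rho> = <E*mu_tr/rho>*(1-<g>)/Eave = 4.075530E+03   cm^2/g [ 0.0008%]
<E*mu_en/rho>                            = 4.075530E+00   MeV cm^2/g'

# Unescaped: E* means "zero or more E", so the pattern also matches the text
# <mu_en/rho>, and only the 4.075530E+03 line is printed.
unescaped=$(printf '%s\n' "$sample" | grep -E '<E*mu_en/rho>')
echo "unescaped: $unescaped"

# Escaped: \* matches a literal asterisk, so only the <E*mu_en/rho> line is printed.
escaped=$(printf '%s\n' "$sample" | grep -E '<E\*mu_en/rho>')
echo "escaped:   $escaped"
```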

    "I am not quite sure how to refine the search to only show me the numbers instead of just the whole line. Is this possible using grep?"

    It is, if your grep supports PCRE (grep -P), as GNU grep usually does. To show only (-o) the numbers, we can use \K, which resets the start of the reported match: the text matched before \K must still be present, but it is not printed. Your modified grep command is then:

    grep -hioP "(Average spectrum energy: *|<E\*mu_en/rho> *= )\K\S*" *
    

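    (Option -h suppresses the file-name prefix that grep adds when searching several files; the pattern item \S matches any non-whitespace character, so \S* picks up the number up to the next space.)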
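    Because -o prints each number on its own line, one way to get the two-column layout from the question is to pipe the output through paste - -, which joins every two lines with a tab. This sketch assumes GNU grep with PCRE support and assumes the energy line comes before the <E*mu_en/rho> line in every file; the temporary directory and the name sample1.txt are only demo scaffolding. For a grep without -P, it also shows a sed substitution with a capture group doing the same extraction (a swapped-in technique, not part of the answer above):

```shell
#!/bin/sh
# Demo input: lines from the question's file in a throwaway directory.
dir=$(mktemp -d)
mkdir "$dir/in"
printf '%s\n' \
  '*Average spectrum energy:    0.00100 MeV' \
  ' Average sampled energy :    0.00100 MeV [ -0.0000%]' \
  '<mu_en/rho> = <E*mu_tr/rho>*(1-<g>)/Eave = 4.075530E+03   cm^2/g [ 0.0008%]' \
  '<E*mu_en/rho>                            = 4.075530E+00   MeV cm^2/g' \
  > "$dir/in/sample1.txt"

# grep -o prints one number per line (energy first, assuming that order in
# every file); "paste - -" joins every two lines into one tab-separated row.
grep -hoP '(Average spectrum energy: *|<E\*mu_en/rho> *= )\K\S+' "$dir"/in/*.txt \
  | paste - - > "$dir/newdata.txt"

# Same extraction without PCRE: sed keeps only capture group \1 (the number)
# and -n plus the p flag prints just the lines where a substitution happened.
sed -n \
  -e 's/.*Average spectrum energy: *\([^ ]*\).*/\1/p' \
  -e 's/.*<E\*mu_en\/rho> *= *\([^ ]*\).*/\1/p' \
  "$dir"/in/*.txt | paste - - > "$dir/newdata_sed.txt"

cat "$dir/newdata.txt" "$dir/newdata_sed.txt"
```

    paste separates the columns with a tab rather than the four spaces shown in the question; if the exact spacing or number format matters, awk gives full control over the output.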

    "When using this with grep can you change how it writes the data to the new text file?"

    grep by itself cannot change the format of numbers (at most it can cut digits off by matching fewer characters). For that we need another tool, and since we need one anyway, I'd use a tool that can do the whole job, e.g. awk:

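    awk '
    # "Average spectrum energy" line: 4th field, 8 decimals, row not finished yet
    /Average spectrum energy/ { printf "%.8f    ", $4 }
    # <E*mu_en/rho> line (literal * and / escaped): 3rd field, 16 decimals, end of row
    /<E\*mu_en\/rho>/         { printf "%.16f\n", $3 }
    ' * >newdata.txt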
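    One caveat with this awk program: it prints each number as soon as it sees it, so a file that is missing one of the two lines produces a malformed row. A minimal sketch of a more defensive variant, assuming each file holds at most one measurement, remembers the energy and prints only once both numbers from the same file are known. The demo files complete.txt and broken.txt are hypothetical:

```shell
#!/bin/sh
# Demo input in a throwaway directory: one complete file (lines from the
# question) and one hypothetical file that lacks the <E*mu_en/rho> line.
dir=$(mktemp -d)
mkdir "$dir/in"
printf '%s\n' \
  '*Average spectrum energy:    0.00100 MeV' \
  '<mu_en/rho> = <E*mu_tr/rho>*(1-<g>)/Eave = 4.075530E+03   cm^2/g [ 0.0008%]' \
  '<E*mu_en/rho>                            = 4.075530E+00   MeV cm^2/g' \
  > "$dir/in/complete.txt"
printf '%s\n' '*Average spectrum energy:    0.00100 MeV' > "$dir/in/broken.txt"

awk '
FNR == 1                  { energy = "" }  # new file: drop any leftover energy
/Average spectrum energy/ { energy = $4 }  # 4th field, e.g. 0.00100
/<E\*mu_en\/rho>/ && energy != "" {
    # Both numbers found in this file: print one complete row.
    printf "%.8f    %.16f\n", energy, $3
    energy = ""
}
' "$dir"/in/*.txt > "$dir/newdata.txt"

cat "$dir/newdata.txt"
```

    Incomplete files are skipped silently here, so it is worth comparing the number of rows in newdata.txt with the number of input files.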