
Extract values after the 2nd occurrence of _ (underscore) and before dot (.) in bash


I have a file in a temp directory, and from each line I want to extract the value between the second underscore and the dot (.). For example, from File_Name_2696553.txt I want 2696553.

Here is a sample of the contents of filesample.txt:

--rwxr-x---                    235 2016-08-24 05:13 File_Name_2696553.txt
--rwxr-x---                   1274 2016-09-14 04:44 File_Name_2852659.xls
--rwxr-x---                   1802 2016-09-14 05:04 File_Name_2852992.pdf

What I have done is the following:

cat ${tmp}filesample.txt | cut -b64- | awk -F"." '{ print $1 }'

This gives me the desired output, but I think a better solution would look for the text between the second underscore and the dot.

That way, if the 7-digit number at the end of each line grows to 8 or more digits, I don't have to come back and adjust my script, since cut -b64- depends on a fixed byte offset (it keeps everything from the 64th byte onward).

This is probably a basic question; I'm new to bash scripting.


Solution

  • You can use awk to split the last field ($NF) on underscores and dots, then print the third piece:

    awk '{split($NF, a, "[_.]"); print a[3]}' file
    

    Test:

    $ awk '{split($NF, a, "[_.]"); print a[3]}' file
    2696553
    2852659
    2852992
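
If you would rather not call awk, the same extraction can probably be done in pure bash with parameter expansion. The following is only a minimal sketch. It assumes every line ends with a name of the form Prefix_Part_VALUE.ext and that the name contains no spaces. extract_id is a hypothetical helper name, and the second sample line uses a made-up 8-digit number to show that the length of the value does not matter.

```shell
#!/usr/bin/env bash

# extract_id: hypothetical helper, not part of the original answer.
# Prints the text between the 2nd underscore and the first following dot
# in the last space-separated field of the given line.
extract_id() {
    local line=$1
    local name=${line##* }       # keep only the last field, e.g. File_Name_2696553.txt
    local rest=${name#*_*_}      # drop everything through the 2nd underscore
    printf '%s\n' "${rest%%.*}"  # drop everything from the first dot onward
}

# Sample input; the 8-digit value on the second line is invented for illustration.
while IFS= read -r line; do
    extract_id "$line"
done <<'EOF'
--rwxr-x---                    235 2016-08-24 05:13 File_Name_2696553.txt
--rwxr-x---                   1274 2016-09-14 04:44 File_Name_28526591.xls
EOF
```

To run it against the real file, replace the here-document with a redirect such as `done < "${tmp}filesample.txt"`. Like the awk version, this keys on the delimiters rather than on a byte position, so it should keep working if the columns shift or the number gets longer.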