
Extracting Substring from File Name


I have a list of files with the following file name format:

[some unknown amount of characters][_d][yyyymmdd][some unknown amount of characters]

I want to extract the substring that contains the date (yyyymmdd), which I know will always be preceded by "_d". So basically I want to extract the first 8 characters after the "_d".

What is the best way to go about doing this?


Solution

  • I would use sed:

    $ echo "asdfasd_d20150616asdasd" | sed -r 's/^.*_d(.{8}).*$/\1/'
    20150616
    

    The substitution matches the whole line, captures the 8 characters that follow _d, and replaces the line with just that captured group.

    • sed -r enables extended regular expressions (GNU sed; -E is the more portable spelling), so groups can be written with just () instead of \(\).
    • ^.*_d(.{8}).*$
      • ^ beginning of line
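      • .* any number of characters (even 0 of them). It is greedy, so if _d occurs more than once, the last occurrence that still has 8 characters after it is the one used.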
      • _d literal _d you want to match
      • (.{8}) since . matches any character, .{8} matches 8 characters. With () we capture them so that they can be reused later on.
      • .*$ any number of characters up to the end of the line.
    • \1 print back the captured group.
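
The breakdown above could be wrapped into a small helper for use on file names. This is only a sketch under a few assumptions: extract_date is a hypothetical function name, the pattern is tightened from .{8} to [0-9]{8} on the assumption that the date is always eight digits, and -n with the p flag is added so that names without a match print nothing instead of being echoed back unchanged.

```shell
#!/bin/sh
# extract_date: hypothetical helper, not part of the original answer.
# Prints the 8 digits that follow "_d" in the given name.
# Assumption: the date is always 8 digits, hence [0-9]{8} instead of .{8}.
extract_date() {
    # -n suppresses default output; the trailing p prints only when the
    # substitution succeeded, so non-matching names produce no output.
    printf '%s\n' "$1" | sed -nE 's/^.*_d([0-9]{8}).*$/\1/p'
}

extract_date "asdfasd_d20150616asdasd"
```

Requiring digits also sidesteps the greedy-match pitfall: for a name such as x_d20150616_draft_final.txt, the .{8} version would pick up the text after the later _d, while the digit-only version still returns the date. To process a directory, something like `for f in *; do extract_date "$f"; done` should work.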