
Regex POSIX - How can I check whether the start of a line contains a word that appears later in the same line?


I have a UNIX passwd file, and I need to use egrep to find whether the first 7 characters of the GECOS field appear in the username. For example, I want to check whether the username (jkennedy) contains the word "kennedy" from the GECOS field.

I was planning to use back-references, but the username comes before the GECOS field, so I don't know how to implement it.

For example, the passwd file contains this line:

jkennedy:x:2473:1067:kennedy john:/root:/bin/bash
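For reference, a passwd line has seven colon-separated fields (name, password, UID, GID, GECOS, home directory, shell), so the GECOS field comes after the username and four colons. The minimal POSIX shell sketch below splits the example line on ':' to make those positions explicit; the variable names are hypothetical labels, not part of any standard tool.

```shell
#!/bin/sh
# Split the example passwd line on ':' to show where each field sits.
# Variable names are hypothetical labels for the seven passwd fields.
line='jkennedy:x:2473:1067:kennedy john:/root:/bin/bash'

# IFS=: applies only to this read; spaces inside GECOS are preserved.
IFS=: read -r user pass uid gid gecos home shell <<EOF
$line
EOF

printf 'username: %s\nGECOS:    %s\n' "$user" "$gecos"
```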


Solution

  • As per my original comment, the regex below works for me.

    A demo version of this regex (not reproduced here) differs slightly, as it is geared toward display purposes. The regex below is its POSIX equivalent: it removes the non-capture groups and the unneeded capture group around the backreference.

    ^[^:]*([^:]{7})([^:]*:){4}\1.*$
    
    • ^ assert position at the start of the line
    • [^:]* match any character except : any number of times
    • ([^:]{7}) capture exactly seven characters, none of which is :
    • ([^:]*:){4} match the following exactly four times
      • [^:]*: match any character except : any number of times, followed by : literally
    • \1 match the backreference; this matches whatever was previously matched by the first capture group
    • .* match any character (except newline characters) any number of times
    • $ assert position at the end of the line
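A minimal sketch of running the regex against passwd-format input, assuming GNU grep, which accepts the \1 back-reference together with -E (the same as egrep). The helper name gecos_in_username and the sample lines are hypothetical.

```shell
#!/bin/sh
# Sketch: print passwd entries whose GECOS field (field 5) begins with
# seven characters that also occur in the username (field 1).
# Assumes GNU grep, which supports \1 back-references with -E (egrep).

# Hypothetical helper: reads passwd-format lines on stdin, prints matches.
gecos_in_username() {
    grep -E '^[^:]*([^:]{7})([^:]*:){4}\1.*$'
}

# Hypothetical sample input; on a real system: gecos_in_username < /etc/passwd
printf '%s\n' \
    'jkennedy:x:2473:1067:kennedy john:/root:/bin/bash' \
    'asmith:x:2474:1067:jones alice:/home/asmith:/bin/bash' |
    gecos_in_username
```

Only the jkennedy line is printed. The asmith username has fewer than seven characters, so ([^:]{7}) cannot match inside it. Because every [^:] stays within one field, the capture is confined to the username and \1 is tested exactly at the start of the GECOS field, four colons later.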