How to conditionally replace words with sed?


My file is in the form:

EMPLOYEE
  FIRST NAME: JOHN
  LAST NAME: DOE
  POSITION: ACCOUNT MANAGER
  
EMPLOYEE
  FIRST NAME: BIG
  LAST NAME: BOSS
  POSITION: CEO

The real file is a bit more complex than that, but a solution for this form is enough.

I am trying to convert the values to title case while keeping the alignment and the field names unchanged:

EMPLOYEE
  FIRST NAME: John
  LAST NAME: Doe
  POSITION: Account Manager
  
EMPLOYEE
  FIRST NAME: Big
  LAST NAME: Boss
  POSITION: CEO

I have used this so far:

sed -E '/^\s{0,}(FIRST NAME|LAST NAME|POSITION)/ { s/((^\s{0,})(FIRST NAME|LAST NAME|POSITION))/\1/; T; s/(\b[A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g; }' employees.list

But it does not stop the field names (FIRST NAME, LAST NAME, POSITION) from being re-cased, so the output becomes:

EMPLOYEE
  First Name: John
  Last Name: Doe
  Position: Account Manager
  
EMPLOYEE
  First Name: Big
  Last Name: Boss
  Position: Ceo 

(I have not yet dealt with content like CEO.)

Is this achievable with sed? If so, how?


Solution

  • Why {0,}? That is just *.

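    The hard part is that you want to apply the "first letter uppercase, rest lowercase" regex to only part of the line; in your attempt the second s///g still runs over the whole line, field names included. What I usually do is save a copy of the line in hold space, with a newline marking the split point, and then remove the part I don't want to touch from pattern space. Then I can work on the interesting part, and finally append the hold space and reshuffle the output.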

    sed -E '
        /: CEO/{p;d}
        /^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)/{
            # empty s// reuses last regex
            # add a newline between <this>: <and this>
            s//\1\n/
            # hold current line with the newline
            h
            # Remove the first part.
            # `\s*` in regex above super nicely "catches" newline added above.
            s///
            # capitalize
            s/\b([A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g
            # append a newline and the hold space to pattern space
            G
            # put the <prefix:> part back in front of the capitalized part
            s/([^\n]*)\n([^\n]*).*/\2\1/
        }
    ' employees.list
    

    Outputs:

    EMPLOYEE
      FIRST NAME: John
      LAST NAME: Doe
      POSITION: Account Manager
      
    EMPLOYEE
      FIRST NAME: Big
      LAST NAME: Boss
      POSITION: CEO
    

    Overall, consider a real programming language for this, such as awk or Python.
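
    For comparison, here is a minimal Python sketch of the same transformation. It is only one way to do it, not a drop-in replacement: the names FIELDS, KEEP_UPPER, fix_line and fix_text are made up for illustration, and KEEP_UPPER is an assumed list of values (just CEO here) that should be left alone.

```python
import re

# Field labels whose values get title-cased (taken from the sample in the question).
FIELDS = ("FIRST NAME", "LAST NAME", "POSITION")
# Hypothetical set of words to leave untouched; extend as needed.
KEEP_UPPER = {"CEO"}

# Group 1: indentation, label, colon and the spaces after it. Group 2: the value.
LINE_RE = re.compile(r"^(\s*(?:%s):\s*)(.*)$" % "|".join(FIELDS))


def title_word(match):
    """Return the word with first letter uppercase, rest lowercase, unless exempt."""
    word = match.group(0)
    return word if word in KEEP_UPPER else word.capitalize()


def fix_line(line):
    """Title-case only the value part of a known field line."""
    m = LINE_RE.match(line)
    if m is None:
        return line  # EMPLOYEE headers, blank lines etc. pass through unchanged
    prefix, value = m.groups()
    return prefix + re.sub(r"\b[A-Za-z]+\b", title_word, value)


def fix_text(text):
    """Apply fix_line to every line of a multi-line string."""
    return "\n".join(fix_line(line) for line in text.split("\n"))


sample = """EMPLOYEE
  FIRST NAME: JOHN
  LAST NAME: DOE
  POSITION: ACCOUNT MANAGER

EMPLOYEE
  FIRST NAME: BIG
  LAST NAME: BOSS
  POSITION: CEO"""

print(fix_text(sample))
```

    Because the prefix is captured separately and never passed to the capitalizing step, there is nothing to undo afterwards, which is the part that takes the hold-space juggling in sed.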


    Actually, you can also capitalize all words and then just re-uppercase the first part; you only have to make sure the EMPLOYEE line is excluded. So you can just do this:

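    sed -E '
        # leave CEO lines untouched
        /: CEO/{p;d}
        /^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)(.*)/{
            # title-case every word on the line, field name included
            s/\b([A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g
            # match the prefix case-insensitively and uppercase it again
            s/^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)(.*)/\U\1\E\3/i
        }
    ' employees.list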