Tags: unix, awk, sed, cut

Keep part of string after hyphen for specific column


In column 2 of my input files, I want to keep only the part after the hyphen. I tried a cut command, but I don't know how to apply it to the second column only:

echo TCCCATATGGTCTAGCGGTTAGGATTCCT   1-230823 | cut -d - -f 2
230823

Input:

TCCCATATGGTCTAGCGGTTAGGATTCCT   1-230823
GCATTGGTGGTTCAGTGGTAGAATTCTC    2-172580

Desired output:

TCCCATATGGTCTAGCGGTTAGGATTCCT   230823
GCATTGGTGGTTCAGTGGTAGAATTCTC    172580

Solution

  • You can use the following sed command:

    sed -E 's/^([^[:space:]]+[[:blank:]]+)[0-9]+-/\1/' file
    

    See the online sed demo:

    s='TCCCATATGGTCTAGCGGTTAGGATTCCT   1-230823
    GCATTGGTGGTTCAGTGGTAGAATTCTC    2-172580'
    sed -E 's/^([^[:space:]]+[[:blank:]]+)[0-9]+-/\1/' <<< "$s"
    # TCCCATATGGTCTAGCGGTTAGGATTCCT   230823
    # GCATTGGTGGTTCAGTGGTAGAATTCTC    172580
    

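    The regex uses POSIX ERE syntax (enabled by the -E option) and matches:

    • ^ - start of the line
    • ([^[:space:]]+[[:blank:]]+) - Group 1 (\1 refers to this group's value): one or more non-whitespace chars followed by one or more horizontal whitespace chars (spaces or tabs)
    • [0-9]+- - one or more digits and a -.

    The replacement \1 puts Group 1 back unchanged, so only the digits and the hyphen at the start of column 2 are removed.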
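
Since the question is also tagged awk, here is a minimal awk sketch of the same idea. It is an alternative to the sed command above, not a drop-in equivalent. The function name strip_col2_prefix is hypothetical, and the sketch assumes whitespace-separated columns. Unlike the sed version, it removes everything up to the first hyphen in column 2 (not only digits), and awk rebuilds the line with a single space between fields, so the original spacing is not preserved.

```shell
# Hypothetical helper (name is illustrative): remove everything up to and
# including the first hyphen in column 2 of whitespace-separated input.
# Reads the files given as arguments, or standard input if none are given.
strip_col2_prefix() {
  # sub() edits field 2 in place; changing a field makes awk rebuild the
  # record using the output field separator (a single space by default).
  awk '{ sub(/^[^-]*-/, "", $2); print }' "$@"
}

printf '%s\n' \
  'TCCCATATGGTCTAGCGGTTAGGATTCCT   1-230823' \
  'GCATTGGTGGTTCAGTGGTAGAATTCTC    2-172580' |
  strip_col2_prefix
# TCCCATATGGTCTAGCGGTTAGGATTCCT 230823
# GCATTGGTGGTTCAGTGGTAGAATTCTC 172580
```

If the exact spacing between columns matters, the sed command is the safer choice, because it leaves the whitespace captured in Group 1 untouched.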