bash awk sed

Replacement of numbers between parentheses


I want to convert this string:

PLK4(-110), RAB6B(-38), PSD2(-71), DMC1(-181), GRIN2A(-92), PTGFRN(-62,-51), KIAA0040(-307), NFIB(-552,-39), CHST2(-508),

to a data frame like this:

PLK4
RAB6B
PSD2
DMC1
GRIN2A
PTGFRN
KIAA0040
NFIB
CHST2

I need to remove each parenthesized group of numbers, such as (-110) or (-62,-51), and then convert the string to a data frame. For the first part, I tried this sed command:

sed 's/([0-9],)/ /g' file.txt > file2.txt

but I wasn't successful.
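
The command above matches nothing in this input. In sed's default basic regular expressions `(` and `)` are literal and `[0-9]` matches exactly one digit. The pattern also makes no allowance for the minus sign, and it requires a comma just before the closing parenthesis, so it only matches text such as `(5,)`. Below is a minimal sketch of one way to get there with sed plus tr, assuming the input is in file.txt as in the question. The tr step is an addition of this sketch, not part of the original attempt.

```shell
# Recreate the sample input from the question (file names as used there).
printf '%s\n' 'PLK4(-110), RAB6B(-38), PSD2(-71), DMC1(-181), GRIN2A(-92), PTGFRN(-62,-51), KIAA0040(-307), NFIB(-552,-39), CHST2(-508),' > file.txt

# 1. sed: delete every "(...)" group; [^)]* matches everything up to the
#    closing parenthesis, including minus signs and inner commas.
# 2. tr:  map commas, spaces and newlines to newlines and squeeze (-s)
#    each run of them into a single newline, giving one name per line.
sed 's/([^)]*)//g' file.txt | tr -s ', \n' '\n\n\n' > file2.txt

cat file2.txt
```

Deleting the groups first matters: the commas inside `(-62,-51)` are gone before tr splits on the remaining commas.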


Solution

  • With GNU awk for multi-char RS and \s:

    $ awk -v RS='[(][^)]*),\\s*' '1' file
    PLK4
    RAB6B
    PSD2
    DMC1
    GRIN2A
    PTGFRN
    KIAA0040
    NFIB
    CHST2
    

    With any awk:

    $ awk -F'[(][^)]*), *' '{for (i=1; i<NF; i++) print $i}' file
    PLK4
    RAB6B
    PSD2
    DMC1
    GRIN2A
    PTGFRN
    KIAA0040
    NFIB
    CHST2
    

    or:

    $ awk -v ORS= '{gsub(/[(][^)]*), */,RS)} 1' file
    PLK4
    RAB6B
    PSD2
    DMC1
    GRIN2A
    PTGFRN
    KIAA0040
    NFIB
    CHST2
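
For readers who want the field-splitting variant spelled out, here is a commented sketch of the same idea. It assumes the input is in file.txt and writes to file2.txt (names borrowed from the question). It differs from the commands above in two small ways that are choices of this sketch: the closing parenthesis is written as `[)]`, and empty fields are skipped explicitly rather than relying on `i<NF`.

```shell
# Recreate the sample input from the question.
printf '%s\n' 'PLK4(-110), RAB6B(-38), PSD2(-71), DMC1(-181), GRIN2A(-92), PTGFRN(-62,-51), KIAA0040(-307), NFIB(-552,-39), CHST2(-508),' > file.txt

# The field separator is a regex: an opening parenthesis, anything that is
# not a closing parenthesis, the closing parenthesis, a comma, and optional
# spaces. Each name therefore ends up in its own field.
awk -F'[(][^)]*[)], *' '
{
    for (i = 1; i <= NF; i++)
        if ($i != "")      # the trailing separator leaves one empty last field
            print $i
}' file.txt > file2.txt

cat file2.txt
```

Writing both parentheses as bracket expressions avoids depending on how a particular awk treats an unmatched `)` in a regular expression.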