I want to convert this string:
PLK4(-110), RAB6B(-38), PSD2(-71), DMC1(-181), GRIN2A(-92), PTGFRN(-62,-51), KIAA0040(-307), NFIB(-552,-39), CHST2(-508),
to a data frame like this:
PLK4
RAB6B
PSD2
DMC1
GRIN2A
PTGFRN
KIAA0040
NFIB
CHST2
I need to remove each (-number,...) group and then convert the string to a data frame. For the first part, I tried this sed
command:
sed 's/([0-9],)/ /g' file.txt > file2.txt
but it didn't work.
With GNU awk for multi-char RS and \s:
$ awk -v RS='[(][^)]*),\\s*' '1' file
PLK4
RAB6B
PSD2
DMC1
GRIN2A
PTGFRN
KIAA0040
NFIB
CHST2
With any awk:
$ awk -F'[(][^)]*), *' '{for (i=1; i<NF; i++) print $i}' file
PLK4
RAB6B
PSD2
DMC1
GRIN2A
PTGFRN
KIAA0040
NFIB
CHST2
or:
$ awk -v ORS= '{gsub(/[(][^)]*), */,RS)} 1' file
PLK4
RAB6B
PSD2
DMC1
GRIN2A
PTGFRN
KIAA0040
NFIB
CHST2
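For comparison, here is a sed-based sketch of the same idea. The attempted `([0-9],)` pattern only matches a parenthesised single digit followed by a comma, such as `(5,)`, because `[0-9]` matches exactly one character and nothing in the pattern allows for the `-` sign or multi-digit numbers. Matching "everything up to the closing parenthesis" with `[^)]*` avoids that. The function name `strip_offsets` is hypothetical, and the sketch assumes the names themselves contain no commas or spaces:

```shell
# Hypothetical helper: reads the gene list on stdin, prints one name per line.
strip_offsets() {
  # Step 1: delete every parenthesised group, e.g. "(-110)" or "(-62,-51)".
  #         In a basic regular expression, "(" and ")" are literal characters.
  # Step 2: turn the remaining commas and spaces into newlines and squeeze
  #         runs of newlines down to one.
  sed 's/([^)]*)//g' | tr -s ', ' '\n\n'
}

# Usage on a shortened sample of the input:
printf '%s\n' 'PLK4(-110), RAB6B(-38), PTGFRN(-62,-51), CHST2(-508),' | strip_offsets
```

To run it on the file from the question, something like `strip_offsets < file.txt > file2.txt` should work.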