I have a table file such as:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore strand
scaffold EOG091B09QV:/path/path/Z xx a 1:8830-20153 74.3 144 0 1
scaffold EOG091B09QV:/path/path/A x a 1:8830-20153 100.0 93 0 0
scaffold EOG091B09QV:/path/path/Q x a 1:8830-20153 41.3 189 49 3
scaffold EOG091B09QV:/path/path/U x a 1:8830-20153 87.5 48 6 0
scaffold EOG091B09QV:/path/path/K x a 1:8830-20153 100.0 60 0 0
The idea is simply to remove, in the sseqid column, the colon (:) and everything after it,
and get:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore strand
scaffold EOG091B09QV xx a 1:8830-20153 74.3 144 0 1
scaffold EOG091B09QV x a 1:8830-20153 100.0 93 0 0
scaffold EOG091B09QV x a 1:8830-20153 41.3 189 49 3
scaffold EOG091B09QV x a 1:8830-20153 87.5 48 6 0
scaffold EOG091B09QV x a 1:8830-20153 100.0 60 0 0
I know that cut -f 1 -d ":" matches_species_strand_H.m8
can work, but it does not act on one specific column.
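To see why cut is not column-specific, here is a minimal sketch that feeds one row of the example table to the same cut command. cut splits the whole line on the first colon and keeps only what comes before it, so every column after sseqid is discarded along with the path.

```shell
#!/bin/sh
# One data row copied from the example table.
line='scaffold EOG091B09QV:/path/path/Z xx a 1:8830-20153 74.3 144 0 1'

# cut treats ":" as the delimiter for the WHOLE line and keeps field 1,
# so everything after the first colon (all later columns too) is lost.
cut_out=$(printf '%s\n' "$line" | cut -d ':' -f 1)
printf '%s\n' "$cut_out"
```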
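awk is a good choice for column-based text:
awk 'sub(/:.*/,"",$2)+7' file
will do the job: it removes :.* (the first colon and everything after it) from the 2nd column only.
sub() returns the number of substitutions it made (0 or 1), so adding 7 makes the pattern non-zero on every line, and awk therefore prints every line, including the header where nothing is replaced.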
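A minimal, self-contained sketch: it recreates the first rows of the example table in a temporary file (the name sample.m8 is hypothetical) and runs the one-liner next to the more common equivalent spelling awk '{ sub(/:.*/, "", $2) } 1', which runs sub() as an action and uses the always-true pattern 1 to print every line.

```shell
#!/bin/sh
# Work in a throwaway directory; sample.m8 is a hypothetical file name.
tmp=$(mktemp -d)

# First rows of the example table.
cat > "$tmp/sample.m8" <<'EOF'
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore strand
scaffold EOG091B09QV:/path/path/Z xx a 1:8830-20153 74.3 144 0 1
scaffold EOG091B09QV:/path/path/A x a 1:8830-20153 100.0 93 0 0
EOF

# The answer's one-liner: sub() returns 0 or 1, "+7" keeps the pattern
# non-zero, so every record (header included) is printed.
awk 'sub(/:.*/,"",$2)+7' "$tmp/sample.m8" > "$tmp/out1.m8"

# Equivalent form: substitute in an action, then "1" prints each line.
awk '{ sub(/:.*/, "", $2) } 1' "$tmp/sample.m8" > "$tmp/out2.m8"

cat "$tmp/out1.m8"
```

Note that assigning to $2 makes awk rebuild the line with its output separator (a single space by default), so runs of spaces or tabs between columns are normalised; for a tab-separated file, add -F'\t' -v OFS='\t'.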