I have a file that consists of many columns which look like this:
0/0:7,0:7:21:0,21,245 0/0:9,0:9:27:0,27,339 0/0:13,0:13:39:0,39,524
I want to remove everything after the first colon in each column, so that the output looks like this:
0/0 0/0 0/0
There are far too many columns to apply a solution like awk by hand, where I would have to type $1, $2, ... for every column.
I have tried a number of solutions in R, none of which gave the results I am looking for. They all split the column instead of just retaining the first entry. This is a VCF file, and I have tried using vcf2tsv, but I cannot get its dependencies to work.
For example, I tried this code:
test<-sub('(:<=\\:).*$', '', x, perl=TRUE)
This gave me the following:
"c(\"0/0:8,0:8:24:0,24,305\", \"0/0:6,0:6:18:0,18,242\", \"0/0:5,0:5:15:0,15,200\",
Clearly I do not understand the code. Any help is appreciated.
With the sample input in the question, you can use
sed 's#:[^ ]*##g' inputfile
to get the output
0/0 0/0 0/0
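To try the one-liner without creating a file, the sample line can be piped into sed directly, so that sed reads standard input instead of inputfile. The variable name sample is only for illustration.

```shell
#!/bin/sh
# The sample line from the question (space-separated columns).
sample='0/0:7,0:7:21:0,21,245 0/0:9,0:9:27:0,27,339 0/0:13,0:13:39:0,39,524'

# Same sed expression as above, reading stdin instead of a file.
printf '%s\n' "$sample" | sed 's#:[^ ]*##g'
# prints: 0/0 0/0 0/0
```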
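The sed script replaces everything starting with a colon (:) followed by any number of characters other than a space ([^ ]*) with an empty string, for all occurrences on the line (g). The # characters are simply the delimiter of the s command, used instead of the usual /. Because each match stops at the next space, this is applied to every space-separated column.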
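One caveat, assuming the real file is a standard VCF: VCF columns are separated by tabs, not spaces, and [^ ] also matches a tab. On a tab-separated line the expression above therefore deletes everything from the first colon to the end of the line, leaving only 0/0. A minimal sketch of two tab-safe variants follows; the function names strip_sed and strip_awk are hypothetical. The awk version loops over NF, so no column has to be named by hand, and it passes header lines starting with # through unchanged.

```shell
#!/bin/sh
# Hypothetical helpers; both assume tab-separated columns as in a VCF file.

# sed variant: [^[:space:]] stops at tabs as well as spaces.
strip_sed() {
    sed 's/:[^[:space:]]*//g'
}

# awk variant: visits every field via NF, so $1, $2, ... are never typed.
# Header lines (starting with "#") are printed unchanged.
strip_awk() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         { for (i = 1; i <= NF; i++) sub(/:.*/, "", $i); print }'
}

# The sample line from the question, with tabs instead of spaces.
sample=$(printf '0/0:7,0:7:21:0,21,245\t0/0:9,0:9:27:0,27,339\t0/0:13,0:13:39:0,39,524')

printf '%s\n' "$sample" | strip_sed   # 0/0, 0/0, 0/0 separated by tabs
printf '%s\n' "$sample" | strip_awk   # same result
```

Note that both variants also shorten the FORMAT column (for example GT:AD:DP:GQ:PL becomes GT) and any other field that contains a colon.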