Tags: linux, bash, awk, sed, data-manipulation

How to use sed or awk to add a space before a negative or positive decimal number in Linux


I have a file that looks like:

GENERSID1RSID2VALUE
ENSG00000242220rs2826052rs28260520.20961262553802
ENSG00000242220rs2826052rs798932040.00583452893352463
ENSG00000242220rs2826052rs117256228-0.003012912482066

I want to add a space between the values so that it looks like:

GENE RSID1 RSID2 VALUE
ENSG00000242220 rs2826052 rs2826052 0.20961262553802
ENSG00000242220 rs2826052 rs79893204 0.00583452893352463
ENSG00000242220 rs2826052 rs117256228 -0.003012912482066

I used these two sed commands and got this far:

sed "s/rs/ &/g" Model_training_chr21_covariances.txt > Model_training_chr21_covariances1.txt

sed "s/-0/ &/" Model_training_chr21_covariances1.txt > Model_training_chr21_covariances2.txt


ENSG00000242220 rs2826052 rs28260520.20961262553802
ENSG00000242220 rs2826052 rs798932040.00583452893352463
ENSG00000242220 rs2826052 rs117256228 -0.003012912482066

Basically, the negative value -0.003 is now separated, but 0.209 and 0.0058 are not. I am only able to add a space before a value starting with -0. and not before one starting with 0. Is there any way to solve this? Thank you.


Solution

  • This might work for you (GNU sed):

    sed -E 's/v|rs|-?.\./ &/ig' file
    

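    Turn on extended regular expressions with the -E option.

    Using alternation, look for either a v, or rs, or an optional - followed by any single character and then a literal period, and insert a space before each match (the & in the replacement stands for the matched text). The g flag applies this to every match on each line of the file. Note that the unescaped . matches any character, not only a digit, and that only one character before the period is taken, so a value such as 12.5 would be split as 1 2.5.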

    N.B. The i flag in the substitution command (a GNU sed extension) makes the LHS of the command match either case. The v caters for VALUE in the header, as does the rs for RSID1 and RSID2, by chance. Of course this is based on the data given, so to be more robust, perhaps:

    sed -E '1s/V|RS/ &/g;1!s/rs|-?.\./ &/g' file
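As a quick check against the sample from the question, the sketch below wraps the second command in a shell function and adds an awk equivalent, since the question asks for either tool. The names sample, add_spaces_sed and add_spaces_awk are hypothetical. The pattern uses [0-9] instead of . before the period, which is a tightening not present in the answer above. Both versions assume every value has exactly one digit before the decimal point; with the fields fused together there is no other way to tell where the rs ID ends and the value begins.

```shell
#!/bin/sh
# Hypothetical helper names; only the sed expression comes from the answer,
# with [0-9] swapped in for the any-character dot (an assumption).

# Print three of the fused lines shown in the question.
sample() {
  printf '%s\n' \
    'GENERSID1RSID2VALUE' \
    'ENSG00000242220rs2826052rs28260520.20961262553802' \
    'ENSG00000242220rs2826052rs117256228-0.003012912482066'
}

# sed: on line 1 put a space before V and RS; on all other lines put a
# space before each "rs" and before an optional minus, one digit, a period.
add_spaces_sed() {
  sed -E '1s/V|RS/ &/g;1!s/rs|-?[0-9]\./ &/g' "$@"
}

# awk: same rules; in gsub's replacement string, & is the matched text.
add_spaces_awk() {
  awk 'NR == 1 { gsub(/V|RS/, " &") }
       NR > 1  { gsub(/rs|-?[0-9]\./, " &") }
       { print }' "$@"
}

sample | add_spaces_sed
```

Both functions read standard input when called without arguments, or accept a file name such as Model_training_chr21_covariances.txt.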