bash, bioinformatics, fasta, genome, vcf-variant-call-format

Change characters in a string based on VCF table data


I have a file (string.txt) containing one long string (abcdefghijklmnop) and a VCF table (file.vcf) that looks like this:

position 2 4 6 10 n...
name1 a b c d
name2 x y z a
namen...

The table also contains the values "mis" and "het"; in those cases the character should not be replaced.

I want to change the characters at the listed positions and store all the resulting strings in a new file that looks like this:

>name1
aacbecghidklmnop
>name2
axcyezghiaklmnop

Is there a way to do this in a bash loop?


Solution

  • Would you please try the following:

    mapfile -t string < <(fold -w1 "string.txt")
    # set string to an array of single characters: ("a" "b" "c" "d" ..)
    
    while read -ra ary; do
        if [[ ${ary[0]} = "position" ]]; then
            # 1st line of file.vcf
            declare -a pos=("${ary[@]:1}")
            # now the array pos holds: (2 4 6 10 ..)
        else
            # 2nd line of file.vcf and after
            declare -a new=("${string[@]}")
            # make a copy of string to modify
            for ((i=0; i<${#pos[@]}; i++ )); do
                repl="${ary[$i+1]}"    # replacement
                if [[ $repl != "mis" && $repl != "het" ]]; then
                    new[${pos[$i]}-1]="$repl"
                    # modify the position with the replacement
                fi
            done
            echo ">${ary[0]}"
            (IFS=""; echo "${new[*]}")
            # print the modified array as a concatenated string
        fi
    done < "file.vcf"
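
The script leans on two idioms: splitting a string into an array of single characters with `fold -w1` and `mapfile`, and joining the array back into one string by setting `IFS` to empty inside a subshell. A minimal sketch for trying them in isolation follows; the literal "abcdef" and the names `chars` and `joined` are only for illustration and are not part of the script above.

```shell
#!/usr/bin/env bash
# Illustration only: "abcdef", chars and joined are made-up names/values.

# Split a string into one-character array elements.
# printf adds the trailing newline so fold emits one character per line.
mapfile -t chars < <(printf '%s\n' "abcdef" | fold -w1)

# Bash arrays are 0-based, so index 1 is the 2nd character (position 2).
chars[1]="X"

# Join the elements with no separator; the IFS change stays inside $( ).
joined=$(IFS=""; echo "${chars[*]}")
echo "$joined"
```

Because positions in the table are 1-based and bash arrays are 0-based, the script subtracts 1 from each position before assigning.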
    

    string.txt:

    abcdefghijklmnop
    

    file.vcf:

    position 2 4 6 10
    name1 a b c d
    name2 x y z a
    name3 i mis k l
    

    Output:

    >name1
    aacbecghidklmnop
    >name2
    axcyezghiaklmnop
    >name3
    aicdekghilklmnop
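
The question asks for the results to be stored in a new file, while the script above prints to standard output. One possible variant, sketched below, wraps the same logic in a function so the output can be redirected. It uses bash substring expansion instead of a character array. The function name `apply_variants` is hypothetical, and the sketch assumes every replacement is a single character and treats a missing column like "mis".

```shell
#!/usr/bin/env bash
# apply_variants (hypothetical name): print one ">name" record per table row.
#   $1 = reference string, $2 = path to the table (first row starts with "position")
# Assumptions: single-character replacements; a missing column counts as "mis".
apply_variants() {
    local ref=$1 table=$2
    local -a pos=() ary=()
    local seq repl p i
    # The "|| (( ... ))" part also handles a last line without a trailing newline.
    while read -ra ary || (( ${#ary[@]} )); do
        (( ${#ary[@]} )) || continue            # skip blank lines
        if [[ ${ary[0]} == "position" ]]; then
            pos=("${ary[@]:1}")                 # header row: remember the positions
            continue
        fi
        seq=$ref                                # start each row from the reference
        for (( i = 0; i < ${#pos[@]}; i++ )); do
            repl=${ary[i+1]:-mis}
            p=${pos[i]}
            [[ $repl == "mis" || $repl == "het" ]] && continue
            # Keep the first p-1 characters, insert the replacement,
            # then append everything after 1-based position p.
            seq=${seq:0:p-1}${repl}${seq:p}
        done
        printf '>%s\n%s\n' "${ary[0]}" "$seq"
    done < "$table"
}
```

With the sample files, `apply_variants "$(< string.txt)" file.vcf > output.fa` would write the records to `output.fa` (an illustrative file name).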
    

    I have tried to embed explanations as comments in the script above, but if you still have a question, please feel free to ask.

    Hope this helps.