
Bash: Merging 4 lines in 4 files into one single file


I'm looking for a way to merge 4 lines of DNA probing results into one line.

The catch is that I don't want to append the lines to each other, but to combine them position by position.

The 4 lines of DNA probing:

A----A----------A----A-A--AAAA-

-CC----CCCC-C-----CCC-C-------C

------G----G--G--G------G------

---TT--------T-T---------T-----

I need these to become one line, not just appended but interleaved, with the dashes dropped.

First characters of the result:

 ACCTTAGCCCCGC...

This seems to be a fairly general problem, so the language chosen to solve it doesn't matter.


Solution

  • For fun, here is one way in pure bash:

    lines=(
        A----A----------A----A-A--AAAA-
        -CC----CCCC-C-----CCC-C-------C
        ------G----G--G--G------G------
        ---TT--------T-T---------T-----
    )
    
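    result=""
    # Walk every column; all lines are assumed to be as long as the first one.
    for ((i=0;i<${#lines[0]};i++)) ;do
        chr=- c=()    # chr: merged character, c: indexes of lines with a letter here
        for ((l=0;l<${#lines[@]};l++)) ;do
            [ "${lines[l]:i:1}" != "-" ] &&
                chr="${lines[l]:i:1}" &&
                c+=($l)
        done
        [ ${#c[@]} -eq 0 ] && printf 'Char #%d not replaced.\n' $i
        [ ${#c[@]} -gt 1 ] && c="${c[*]}" && chr="*" &&
             printf "Conflict at char #%d (lines: %s).\n" $i "${c// /, }"
        result+=$chr
    done
    # Quoted, so that a "*" conflict marker is never expanded as a glob.
    echo "$result"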
    

    With the provided input there is no conflict and every character is replaced, so the output is:

    ACCTTAGCCCCGCTGTAGCCCACAGTAAAAC
    

    Note: the question mentions 4 different files, so the lines= assignment could be:

    lines=($(cat file1 file2 file3 file4))
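
The unquoted `$(cat ...)` above relies on word splitting, which works here because the data contains no whitespace or glob characters. A more defensive sketch, assuming each file holds its sequence on its first line, reads the files one by one. The temporary directory and the sample data exist only to make the example self-contained; `file1` to `file4` are the names used above.

```shell
#!/usr/bin/env bash
# Sample data, written to a throwaway directory (an assumption for this demo only).
tmp=$(mktemp -d)
printf '%s\n' 'A----A----------A----A-A--AAAA-' > "$tmp/file1"
printf '%s\n' '-CC----CCCC-C-----CCC-C-------C' > "$tmp/file2"
printf '%s\n' '------G----G--G--G------G------' > "$tmp/file3"
printf '%s\n' '---TT--------T-T---------T-----' > "$tmp/file4"

# Read the first line of each file into the array, without word splitting or globbing.
lines=()
for f in "$tmp/file1" "$tmp/file2" "$tmp/file3" "$tmp/file4"; do
    IFS= read -r line < "$f" && lines+=("$line")
done

echo "${#lines[@]} lines of ${#lines[0]} characters"   # 4 lines of 31 characters
rm -rf "$tmp"
```

The resulting `lines` array can be fed to the loop above unchanged.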
    

    But with faulty input:

    lines=(
        A----A---A-----A-----A-A--AAAA-
        -CC----CCCC-C-----CCC-C-------C
        ------G----G---G-G------G------
        ---TT--------T-T---------T-----
    )
    

    the output would be:

    Conflict at char #9 (lines: 0, 1).
    Char #14 not replaced.
    Conflict at char #15 (lines: 0, 2, 3).
    Char #16 not replaced.
    

    and

    echo "$result"
    ACCTTAGCC*CGCT-*-GCCCACAGTAAAAC
    

    Small perl filter

    If the input does not need to be verified, this little perl filter does the job (thanks to @jm666 for the }{ syntax):

    perl -nlE 'y+-+\0+;$,|=$_}{say$,' <(cat file1 file2 file3 file4)
    

    where

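    -n          loop over all input lines without printing them
    -l          strip the trailing newline from each input line (and add one on output)
    -E          like -e, but with optional features such as say enabled
    y+lhs+rhs+  translate (replace) characters from 'lhs' with those from 'rhs'
    \0          is the *null* character, binary 0
    $,          is a variable (the output field separator, used here as an accumulator)
    |=          bitwise OR between the variable itself and the current line ($_)
    }{          closes the implicit -n loop, so the code after it runs once, after all lines are processed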
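
The reason the OR works can be checked with plain shell arithmetic: once every dash has become a null byte (code 0), OR-ing it with a letter leaves the letter untouched. The sketch below uses a hypothetical helper, `or_codes`, that is not part of the answer; it ORs two ASCII codes and prints the resulting character. It also shows the price of skipping verification: two letters in the same column are silently combined into some other byte instead of being reported.

```shell
# or_codes is a hypothetical helper for illustration only:
# it ORs two character codes and prints the resulting character.
or_codes() {
    printf "\\$(printf '%03o' "$(( $1 | $2 ))")\n"
}

or_codes 0x41 0x00   # A | NUL -> A  (a dash turned into NUL never disturbs a letter)
or_codes 0x41 0x43   # A | C   -> C  (conflict, silently resolved to C)
or_codes 0x41 0x54   # A | T   -> U  (conflict, yields a letter that was in neither line)
```

So the perl filter is a good fit only when the input is already known to be free of conflicts.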