I'm looking for a way to merge 4 lines of DNA probing results into one line.
The problem is that I don't want to append the lines,
but to combine them position by position.
The 4 lines of DNA probing:
A----A----------A----A-A--AAAA-
-CC----CCCC-C-----CCC-C-------C
------G----G--G--G------G------
---TT--------T-T---------T-----
I need these to be 1 line, not just appended but intermixed without the dashes.
First characters of the result:
ACCTTAGCCCCGC...
This seems to be a fairly general problem, so the language chosen to solve it doesn't matter.
lines=(
A----A----------A----A-A--AAAA-
-CC----CCCC-C-----CCC-C-------C
------G----G--G--G------G------
---TT--------T-T---------T-----
)
result=""
for ((i=0;i<${#lines[0]};i++)) ;do    # walk every column of the first line
    chr=- c=()
    for ((l=0;l<${#lines[@]};l++)) ;do
        [ "${lines[l]:i:1}" != "-" ] &&
            chr="${lines[l]:i:1}" &&
            c+=($l)                    # remember which lines fill this column
    done
    [ ${#c[@]} -eq 0 ] && printf 'Char #%d not replaced.\n' $i
    [ ${#c[@]} -gt 1 ] && c="${c[*]}" && chr="*" &&
        printf "Conflict at char #%d (lines: %s).\n" $i "${c// /, }"
    result+=$chr
done
echo "$result"
With the provided input, there is no conflict and all characters are replaced. So the output is:
ACCTTAGCCCCGCTGTAGCCCACAGTAAAAC
Note: The question mentions 4 different files, so the lines= assignment could be:
lines=($(cat file1 file2 file3 file4))
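An unquoted $(cat ...) is subject to word splitting and globbing, which is harmless for this input but worth knowing about. A possible alternative, as a minimal sketch assuming bash 4 or later (for mapfile) and one probing line per file; the function name load_lines is hypothetical:

```shell
#!/usr/bin/env bash
# load_lines is a hypothetical helper, not part of the original answer.
# It reads every line of the given files into the global array "lines".
# mapfile -t (bash 4+) strips the trailing newline of each line and
# performs no word splitting or globbing on the content.
load_lines() {
    mapfile -t lines < <(cat -- "$@")
}
```

It would be called as load_lines file1 file2 file3 file4 before running the loop above.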
But with invalid input:
lines=(
A----A---A-----A-----A-A--AAAA-
-CC----CCCC-C-----CCC-C-------C
------G----G---G-G------G------
---TT--------T-T---------T-----
)
the output would be:
Conflict at char #9 (lines: 0, 1).
Char #14 not replaced.
Conflict at char #15 (lines: 0, 2, 3).
Char #16 not replaced.
and
echo "$result"
ACCTTAGCC*CGCT-*-GCCCACAGTAAAAC
But if the input does not need to be verified, this little perl filter could do the job:
(Thanks to @jm666 for the }{ syntax.)
perl -nlE 'y+-+\0+;$,|=$_}{say$,' <(cat file1 file2 file3 file4)
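where
-n process all input lines without printing them automatically
-l strip the trailing newline from each input line (and add one back on output)
y+lhs+rhs+ translate (replace) characters from 'lhs' to 'rhs'; here every - becomes \0
\0 is the *null* character, binary 0.
$, is a built-in variable (the output field separator), used here as an accumulator
|= bitwise OR between the variable itself and the current line ($_); OR-ing with \0 leaves the other character unchanged
}{ closes the implicit read loop, so the code after it runs once at END, when all lines are processed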
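To make the OR trick more concrete, here is a rough bash sketch of the same idea: each "-" counts as 0, every other character counts as its ASCII code, and the codes in one column are OR-ed together. The function name merge_or is hypothetical, and unlike the perl filter it prints "-" (instead of a null byte) for a column that no line fills. As with the perl filter, conflicts are not reported: two different letters in one column are silently OR-ed into a single byte.

```shell
#!/usr/bin/env bash
# merge_or is a hypothetical name, not part of the original answer.
# It ORs the byte values of all lines column by column, treating "-"
# as 0, which is what the perl filter does after y+-+\0+.
merge_or() {
    local line char code oct ch out="" i
    local -a acc=()
    for line in "$@"; do
        for ((i = 0; i < ${#line}; i++)); do
            char=${line:i:1}
            if [[ $char == "-" ]]; then
                code=0
            else
                printf -v code '%d' "'$char"   # ASCII code of the character
            fi
            acc[i]=$(( ${acc[i]:-0} | code ))
        done
    done
    for code in "${acc[@]}"; do
        if (( code == 0 )); then
            out+="-"                           # no line filled this column
        else
            printf -v oct '%03o' "$code"
            printf -v ch "\\$oct"              # octal escape back to a character
            out+=$ch
        fi
    done
    printf '%s\n' "$out"
}
```

With the lines array from above, merge_or "${lines[@]}" prints the merged line.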