I have a bunch of tab separated text files like the following one
"gene_id" "Pattern1" "Pattern2" "Pattern3" "Pattern4" "Pattern5" "MAP" "PPDE"
"ENSG00000119771.13" 3.11528786599051e-18 2.52650109640992e-13 6.25109524320237e-09 0.345846257420197 0.654153736328455 "Pattern5" 1
"ENSG00000123700.4" 1.75016991626305e-36 3.98804090894939e-19 0.63423772228367 3.8159144080782e-21 0.36576227771633 "Pattern3" 1
"ENSG00000128567.15" 1.10722918612618e-23 7.62691311068806e-07 5.77031364194955e-06 5.13675840911147e-21 0.999993466995047 "Pattern5" 1
"ENSG00000130182.6" 9.75717082221716e-22 1.27675651077242e-12 0.469972541094369 1.13677117238758e-12 0.530027458903217 "Pattern5" 1
"ENSG00000131914.9" 3.1627489688037e-41 1.00274706758683e-22 0.0578584524816503 6.98718794692175e-22 0.94214154751835 "Pattern5" 1
Now I want to join them into one file in the so that I get
"gene_id" "Pattern5" "Pattern5" "Pattern5" "Pattern5" "Pattern5"
where each Pattern5
column comes from one file.
I tried some stuff with
cut -f 6 <file>
and
paste <file1> <file2> ...
but I couldn't combine it correctly.
Thanks for you help!
UPDATE: I try to give you a testable example as input here:
<file1>
gene_id Pattern1 Pattern2 Pattern3 Pattern4 Pattern5
ENSG00000119771 1 2 3 4 5
ENSG00000123700 1 2 3 4 5
ENSG00000128567 1 2 3 4 5
ENSG00000130182 1 2 3 4 5
ENSG00000131914 1 2 3 4 5
<file2>
gene_id Pattern1 Pattern2 Pattern3 Pattern4 Pattern5
ENSG00000119771 6 7 8 9 10
ENSG00000123700 6 7 8 9 10
ENSG00000128567 6 7 8 9 10
ENSG00000130182 6 7 8 9 10
ENSG00000131914 6 7 8 9 10
<file3>
gene_id Pattern1 Pattern2 Pattern3 Pattern4 Pattern5
ENSG00000119771 11 12 13 14 15
ENSG00000123700 11 12 13 14 15
ENSG00000128567 11 12 13 14 15
ENSG00000130182 11 12 13 14 15
ENSG00000131914 11 12 13 14 15
and the desired output would be
gene_id Pattern5_file1 Pattern5_file2 Pattern5_file3
ENSG00000119771 5 10 15
ENSG00000123700 5 10 15
ENSG00000128567 5 10 15
ENSG00000130182 5 10 15
ENSG00000131914 5 10 15
UPDATE2: I tried the approach of Ed Morton:
awk '
BEGIN { FS=OFS="\t" } FNR==1{ARGIND++}
{ genes[$1]; val[$1,ARGIND] = $5 }
END {
for (gene in genes) {
printf "%s%s", gene, OFS
for (file=1; file<=ARGIND; file++) {
printf "%s%s", val[gene,file], (file<ARGIND?OFS:ORS)
}
}
} ' $files
but the output is not in the right format:
ENSG00000128567 4 9 14
ENSG00000130182 4 9 14
ENSG00000119771 4 9 14
gene_id Pattern4 Pattern4 Pattern4
ENSG00000131914 4 9 14
ENSG00000123700 4 9 14
try this one
#!/bin/bash
paste file1 file2 file3 | awk -v patternIdx=6 '
function printPattern(idx, isFirstLine) {
for (i = 1; i <= NF; ++i) {
if (i == 1)
printf "%s ", $i;
else if (isFirstLine && i % patternIdx == 0)
printf "%s_file%d ", $i, i / patternIdx;
else if (i % patternIdx == 0)
printf "%d ", $i;
}
printf "\n"
}
{
if (NR == 1)
printPattern(patternIdx, 1);
else
printPattern(patternIdx, 0);
}'
patternIdx is the column index of Pattern5