data.file
Istanbul;J;TK;13;OK
London;C;EN;28;OK
London;K;EN;32;OK
Paris;A;FR;30;OK
Paris;B;FR;40;OK
Zurich;G;DE;99;OK
Zurich;H;DE;33;OK
Zurich;G;DE;82;OK
Expected output:
Istanbul;J;TK;13;OK
London;C-K;EN;28-32;OK
Paris;A-B;FR;30-40;OK
Zurich;G-H;DE;33-82-99;OK
First field of every row is the condition, if this field is duplicated then merge fields 2 and 4, in field 5 just use the first occurrence.
Update: another condition is that fields 2 and 4 have to be sorted and repeated data deleted, as in the case of field 2 of Zurich.
The code I have so far is that in fields 2 and 4 data has to be sorted and repetitions deleted, as for Zurich...
awk -F';' -v OFS=';' '{getline nx; j=split (nx, Ax); for (i=1;i<=j;i++) $i=$i Ax[i]}1' data.file
Which obviously is not working as expected, this is what awfully gives back....
ParisParis;AB;FRFR;3040;OKOK
LondonLondon;CK;ENEN;2832;OKOK
IstanbulZurich;JZ;TKDE;1382;OKOK
ZurichZurich;GH;DEDE;9933;OKOK
With GNU awk for sorted_in
:
$ cat tst.awk
BEGIN { FS=OFS=";" }
$1 != prev {
if ( NR>1 ) {
prt()
}
prev = $1
delete vals
}
{
for ( fldNr=1; fldNr<=NF; fldNr++ ) {
vals[fldNr][$fldNr]
}
}
END { prt() }
function prt( fldNr,val,sep) {
for ( fldNr=1; fldNr<=NF; fldNr++ ) {
PROCINFO["sorted_in"] = "@ind_" (fldNr==4 ? "num" : "str") "_asc"
sep = ""
for ( val in vals[fldNr] ) {
printf "%s%s", sep, val
sep = "-"
}
printf "%s", (fldNr<NF ? OFS : ORS)
}
}
$ awk -f tst.awk data.file
Istanbul;J;TK;13;OK
London;C-K;EN;28-32;OK
Paris;A-B;FR;30-40;OK
Zurich;G-H;DE;33-82-99;OK