Data:
id | name | city | language | area_code |
---|---|---|---|---|
01 | Juan | Cali | ES | 44 |
01 | José | Cali | ES | 44 |
01 | Pedro | Cali | ES | 44 |
02 | Albert | Edinburgh | | 19 |
02 | Mark | | En | 19 |
03 | Raisa | Hellsinki | FI | 22 |
03 | Lisa | Hellsinki | | |
04 | Gian | Roma | IT | 33 |
05 | Loris | Sicilia | | |
05 | Vera | Sicilia | | 31 |
The file containing this data has the following format:
01;Juan;Cali;ES;44
01;José;Cali;ES;44
01;Pedro;Cali;ES;44
02;Albert;Edinburgh;;19
02;Mark;;En;19
03;Raisa;Hellsinki;FI;22
03;Lisa;Hellsinki;;
04;Gian;Roma;IT;33
05;Loris;Sicilia;;
05;Vera;Sicilia;;31
In this data, the ids 02, 03 and 05 each appear exactly twice. Regardless of what the other fields contain, I need to select only the rows whose id appears exactly twice, so the expected result would be:
02;Albert;Edinburgh;;19
02;Mark;;En;19
03;Raisa;Hellsinki;FI;22
03;Lisa;Hellsinki;;
05;Loris;Sicilia;;
05;Vera;Sicilia;;31
So far I have only found a way to select rows whose id is repeated any number of times:
awk -F';' -v OFS=';' 'a[$1]++{print $0}' data.file
But I haven't been able to figure out how to get only the lines whose id appears exactly twice...
Update: like U2, I still haven't found what I'm looking for, but I have a new awk command that I think is closer:
awk -F';' -v OFS=';' '{a[$1]++; if (a[$1] == 2) {print $0}}' data.file
It correctly leaves out the row with id 04, but it still prints a row with id 01, which is repeated three times rather than exactly two...
In 2 passes:
$ awk -F';' 'NR==FNR{cnt[$1]++; next} cnt[$1]==2' file file
02;Albert;Edinburgh;;19
02;Mark;;En;19
03;Raisa;Hellsinki;FI;22
03;Lisa;Hellsinki;;
05;Loris;Sicilia;;
05;Vera;Sicilia;;31
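If the input can only be read once (for example, because it arrives on a pipe), one variation on the same counting idea is to buffer the lines in memory and filter them in the END block. This is only a sketch: the temporary file `data_file` and the variable `result` are names made up for the demonstration, and it assumes the whole input fits in memory.

```shell
# Hypothetical demo file holding the sample data from the question.
data_file=$(mktemp)
cat > "$data_file" <<'EOF'
01;Juan;Cali;ES;44
01;José;Cali;ES;44
01;Pedro;Cali;ES;44
02;Albert;Edinburgh;;19
02;Mark;;En;19
03;Raisa;Hellsinki;FI;22
03;Lisa;Hellsinki;;
04;Gian;Roma;IT;33
05;Loris;Sicilia;;
05;Vera;Sicilia;;31
EOF

# Single read: count each id while remembering every line and its id,
# then print, in input order, the lines whose id was seen exactly twice.
result=$(awk -F';' '
    { cnt[$1]++; line[NR] = $0; id[NR] = $1 }
    END {
        for (i = 1; i <= NR; i++)
            if (cnt[id[i]] == 2)
                print line[i]
    }
' "$data_file")

printf '%s\n' "$result"
rm -f "$data_file"
```

Unlike the grouped one-pass approach, this does not need the rows for an id to be adjacent, at the cost of holding every line in memory.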
or in 1 pass if your input is grouped by the first field as shown in your example (you can always sort
it if not):
$ awk -F';' '
$1 != prev { if (cnt == 2) print buf; prev=$1; buf=$0; cnt=1; next }
{ buf=buf ORS $0; cnt++ }
END { if (cnt == 2) print buf }
' file
02;Albert;Edinburgh;;19
02;Mark;;En;19
03;Raisa;Hellsinki;FI;22
03;Lisa;Hellsinki;;
05;Loris;Sicilia;;
05;Vera;Sicilia;;31
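If the rows for an id are not adjacent, sorting on the first field before the one-pass script might look like the sketch below. The shuffled sample and the names `data_file` and `result` are made up for the demonstration. The `-s` (stable) flag keeps rows with the same id in their original relative order; it is supported by GNU and BSD sort but is not required by POSIX, so treat it as an assumption about your system.

```shell
# Hypothetical demo file: a shuffled subset of the question's data.
data_file=$(mktemp)
cat > "$data_file" <<'EOF'
02;Albert;Edinburgh;;19
01;Juan;Cali;ES;44
03;Raisa;Hellsinki;FI;22
02;Mark;;En;19
01;José;Cali;ES;44
04;Gian;Roma;IT;33
03;Lisa;Hellsinki;;
01;Pedro;Cali;ES;44
EOF

# Group the rows by id with a stable sort on field 1 only, then buffer
# each group and print it when the id changes if it held exactly 2 rows.
result=$(sort -s -t';' -k1,1 "$data_file" | awk -F';' '
    $1 != prev { if (cnt == 2) print buf; prev = $1; buf = $0; cnt = 1; next }
    { buf = buf ORS $0; cnt++ }
    END { if (cnt == 2) print buf }
')

printf '%s\n' "$result"
rm -f "$data_file"
```

Note that the output then comes out in sorted id order rather than in the original file order.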