I have nearly 200 files and I want to find the lines that are common to all of them. The lines look like this:
HISEQ1:105:C0A57ACXX:2:1101:10000:105587/1
HISEQ1:105:C0A57ACXX:2:1101:10000:105587/2
HISEQ1:105:C0A57ACXX:2:1101:10000:121322/1
HISEQ1:105:C0A57ACXX:2:1101:10000:121322/2
HISEQ1:105:C0A57ACXX:2:1101:10000:12798/1
HISEQ1:105:C0A57ACXX:2:1101:10000:12798/2
Is there a way to do this for all the files in one batch?
awk '(NR==FNR){a[$0]=1;next}
(FNR==1){ for(i in a) if(a[i]) {a[i]=0} else {delete a[i]} }
($0 in a) { a[$0]=1 }
END{for (i in a) if (a[i]) print i}' file1 file2 file3 ... file200
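To try the command before running it on the real data, here is a small self-contained sketch. It writes three made-up files (f1, f2, f3 are hypothetical stand-ins for the ~200 real files) built from the sample read IDs and runs the same awk program on them. Because for (i in a) visits keys in no particular order, the result is piped through sort.

```shell
#!/bin/sh
# Sketch: run the awk intersection on three small hypothetical files.
dir=$(mktemp -d)

printf '%s\n' \
  'HISEQ1:105:C0A57ACXX:2:1101:10000:105587/1' \
  'HISEQ1:105:C0A57ACXX:2:1101:10000:105587/2' \
  'HISEQ1:105:C0A57ACXX:2:1101:10000:121322/1' > "$dir/f1"
printf '%s\n' \
  'HISEQ1:105:C0A57ACXX:2:1101:10000:105587/2' \
  'HISEQ1:105:C0A57ACXX:2:1101:10000:121322/1' \
  'HISEQ1:105:C0A57ACXX:2:1101:10000:12798/1' > "$dir/f2"
printf '%s\n' \
  'HISEQ1:105:C0A57ACXX:2:1101:10000:121322/1' \
  'HISEQ1:105:C0A57ACXX:2:1101:10000:105587/2' \
  'HISEQ1:105:C0A57ACXX:2:1101:10000:12798/2' > "$dir/f3"

# Same program as in the answer; for (i in a) has no defined order,
# so sort the result (in the C locale) for stable output.
common=$(awk '(NR==FNR){a[$0]=1;next}
(FNR==1){ for(i in a) if(a[i]) {a[i]=0} else {delete a[i]} }
($0 in a) { a[$0]=1 }
END{for (i in a) if (a[i]) print i}' "$dir/f1" "$dir/f2" "$dir/f3" | LC_ALL=C sort)

printf '%s\n' "$common"
rm -r "$dir"
```

Only 105587/2 and 121322/1 appear in all three files, so those two IDs are printed.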
This method processes each file line by line. The idea is to keep track of which lines have been seen in the current file by using an associative array a[line]. Only lines that occurred in every file so far are kept in the array: a value of 1 means the line has also been seen in the current file, 0 means it has not been seen in the current file (yet).
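To see the 0/1 flags in action, the sketch below runs the same rules on three tiny hypothetical files (one-letter lines, made up for illustration). The extra debug print of every array entry at each file boundary (before cleanup) and at the end is an addition for illustration only, not part of the original command.

```shell
#!/bin/sh
# Sketch: trace the "seen" flags of a[] on three tiny hypothetical files.
dir=$(mktemp -d)
printf '%s\n' A B C > "$dir/f1"
printf '%s\n' B C D > "$dir/f2"
printf '%s\n' C B E > "$dir/f3"

# Same rules as the answer, plus debug prints; n counts file boundaries,
# so the first boundary seen is the start of file2.
trace=$(awk '(NR==FNR){a[$0]=1;next}
(FNR==1){ n++
          for(i in a) print "before file" (n+1) ": " i "=" a[i]
          for(i in a) if(a[i]) {a[i]=0} else {delete a[i]} }
($0 in a) { a[$0]=1 }
END{for (i in a) print "at end: " i "=" a[i]}' "$dir/f1" "$dir/f2" "$dir/f3" | LC_ALL=C sort)

printf '%s\n' "$trace"
rm -r "$dir"
```

Before file3 starts, A still has flag 0 because it was not seen in file2, so it is deleted; D and E are never added because they were missing from f1. At the end only B and C remain, both with flag 1.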
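(NR==FNR){a[$0]=1;next}: store the lines of the first file in the array, indexed by the line, and mark each one as seen. (NR==FNR) is the condition that checks for the first file: the overall record number NR equals the per-file record number FNR only while the first file is being read.

(FNR==1){for(i in a) if(a[i]) {a[i]=0} else {delete a[i]} }: when the first line of a new file is read, check which lines were seen in the previous file. If a line in the array was not seen, delete it; if it was seen, reset it to not-seen (0). This way, memory is cleaned up as we go, and because the flags are reset per file instead of being counted, duplicate lines within a single file are handled correctly.

($0 in a) { a[$0]=1 }: for each line, check whether it is a key in the array; if it is, mark it as seen (1). Lines that are not in the array are ignored, since they were already missing from an earlier file.

END{for (i in a) if (a[i]) print i}: when all lines have been processed, print the lines still marked as seen, i.e. those present in the last file as well as in every earlier one.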
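The remark about duplicate lines is easiest to see with a file in which a line repeats. The sketch below uses hypothetical files again and compares the flag-based program with a hypothetical naive alternative that just counts occurrences and keeps lines whose total equals the number of files; the naive version is there only to show the failure mode the flags avoid.

```shell
#!/bin/sh
# Sketch: why per-file "seen" flags matter when a line repeats in one file.
dir=$(mktemp -d)
printf '%s\n' X X Y > "$dir/f1"
printf '%s\n' X X   > "$dir/f2"
printf '%s\n' Y Y X > "$dir/f3"

# Flag-based program from the answer: Y is missing from f2, so only X survives.
flags=$(awk '(NR==FNR){a[$0]=1;next}
(FNR==1){ for(i in a) if(a[i]) {a[i]=0} else {delete a[i]} }
($0 in a) { a[$0]=1 }
END{for (i in a) if (a[i]) print i}' "$dir/f1" "$dir/f2" "$dir/f3")

# Hypothetical naive counter: Y occurs 1+0+2 = 3 times and is wrongly
# reported, while X (2+2+1 = 5 occurrences) is missed.
naive=$(awk '{c[$0]++} END{for (i in c) if (c[i]==3) print i}' "$dir/f1" "$dir/f2" "$dir/f3")

printf 'flags: %s\nnaive: %s\n' "$flags" "$naive"
rm -r "$dir"
```

Setting a flag to 1 a second time changes nothing, which is why repeated lines inside one file do not distort the result.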