I'm currently working on a project where I need to send an email to a large number of email addresses. As such, I am attempting to avoid any "temporary" glitches such as service providers throttling email.
My plan is to take the initial list of email addresses and chop it up into smaller ("chopped") lists so that they can be scheduled in a staggered manner. Given the sensitive nature of sending email, I want to ensure that no duplicate email addresses exist across any of the chopped lists. Is there a way to do this via bash?
Side note: I am 100% certain that all email addresses in the master list are unique, due to the nature of the query used to compile it. I would just like to ensure that the script which chopped the master list does not have a defect that creates duplicate email addresses across the chopped lists.
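For reference, a minimal sketch of the kind of chopping script in question (the file name master_list.txt, the list_part_ prefix, and the chunk size of 500 are placeholders, not my actual values):

# Split the master list into chunks of 500 addresses each,
# producing files named list_part_aa, list_part_ab, ...
split --lines=500 master_list.txt list_part_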
You can put the chopped files together (temporarily) via cat and use sort --unique to remove duplicates, then check whether the result has as many lines as the original file:
wc -l < original_list
and
cat list_part* | sort --unique | wc -l
If the results are the same, there are no duplicates.
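If the counts differ and you want to see the offending addresses directly, you can print only the lines that occur more than once (assuming the chopped files match the list_part* glob):

# Print each duplicated address once, rather than just counting lines
sort list_part* | uniq --repeated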