I have one desktop application receiving data from a webservice and storing it inside a local postgresql database (while the webservice retrieves data from a SQL Server database). At the end of the process there will be a minimum of 2.5 million entries inside a table in my local database but this will be received from de webservice in batches of about 300 rows at time and within a time frame of about 15 days.
What I need is a way to make sure that my local database has the exact same information the server's database has.
I'm thinking of creating some sort of checksum for each batch received and then, after all batches were received, another checksum of the entire table but I don't know if this is the best solution and, if is, I don't know where to start to create it.
PS: TCP already handles integrity check so I don't even know if this is needed, but it is critical that the data are the same.
I can see how a checksum could possibly be useful, but the amount of transformation you're doing would probably make it impractical. You'd have to derive the checksum on either the original form of the data or on the transformed form; it wouldn't be valid on both.
You have some strange constraints (been there myself), so it's kind of hard to come up with a clear strategy without knowing all the details. Maybe one of the following suggestions would work.
A simple count(*) on the SQL Server side and on the PostgreSQL side after the migration is complete.
Dump out a list of keys from the SQL Server side and from the PostgreSQL side after the migration is complete, and then sort and compare those files.
If 1 and 2 aren't possible because of limited access to SQL Server, maybe dump out the results of the web service calls to a single file location as you go along, and then extract the same data from PostgreSQL at the end, and compare those files.
There are numerous tools available for comparing files if you choose options 2 or 3.