Tags: java, memory, spring-integration, spring-batch, file-comparison

How to compare two big unsorted CSV files using Spring Batch?


I have a task of comparing two big CSV files and writing the comparison result to a new file. File 1 has 200K rows, and file 2 could also have 200K rows or fewer. Both have 200 columns. The files are not sorted and can be in any order. I am using Java 8 and Spring 4.

Question

I am using Spring Batch in my project. Is there any way I can achieve this with a custom Spring Batch ItemReader and ItemWriter, or should I use a tasklet with plain Java code to compare the files? I also want to do it as fast as possible. The data volume will be large, possibly 2-4 GB, so I don't want to load it into memory. The files look something like this:

File1:
regn_nbr,name,address1,countrycode,regn_date
2345,John,4332 JFK Boulevard,US,02-12-2011
2347,mark,4332 Maryland Avenue,US,04-27-2015
2348,Smith,4332 JFK road,US,07-30-2011
2302,Andy,4332 JFK lane,US,06-01-2010

File2:
regn_nbr,name,address1,countrycode,regn_date
2345,John,4332 JFK Boulevard,US,02-12-2011
2302,Andy,4332 JFK lane,US,06-01-2010
2911,Peter,12 candle drive,MX,01-01-2010
2348,Smith,4332 JFK road,US,07-30-2011
2347,mark,4332 Maryland Avenue,US,04-27-2015

Your suggestions, different approaches, strategies and expertise are most welcome.
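For the "tasklet plus plain Java" option, one memory-friendly idea is to avoid holding whole 200-column rows at all. The sketch below is a minimal illustration, not Spring Batch API and not from the original post: it assumes `regn_nbr` (the first column) is a unique key and that fields never contain quoted commas. It keeps only a map of key to a 64-bit fingerprint of each file-1 row, then streams file 2 against that map, so memory grows roughly with the number of rows rather than with the 2-4 GB file size. `CsvKeyHashDiff`, `Result`, and `fingerprint` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: streaming diff that keeps only key -> row fingerprint for file 1. */
final class CsvKeyHashDiff {

    /** Keys found only in one file, or present in both with different row content. */
    static final class Result {
        final Set<String> onlyInFirst = new TreeSet<>();
        final Set<String> onlyInSecond = new TreeSet<>();
        final Set<String> changed = new TreeSet<>();
    }

    // Assumption: the key (regn_nbr) is the first column and never contains a quoted comma.
    static String keyOf(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    // 64-bit FNV-1a-style hash over the line's chars; a collision could hide a change, but is very unlikely.
    static long fingerprint(String line) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < line.length(); i++) {
            h ^= line.charAt(i);
            h *= 0x100000001b3L;
        }
        return h;
    }

    static Result compare(BufferedReader first, BufferedReader second) throws IOException {
        // Pass 1: remember one fingerprint per key of file 1 (assumes keys are unique).
        Map<String, Long> pending = new HashMap<>();
        first.readLine(); // skip header
        String line;
        while ((line = first.readLine()) != null) {
            if (!line.isEmpty()) pending.put(keyOf(line), fingerprint(line));
        }
        // Pass 2: stream file 2 and classify each row against the map.
        Result result = new Result();
        second.readLine(); // skip header
        while ((line = second.readLine()) != null) {
            if (line.isEmpty()) continue;
            String key = keyOf(line);
            Long expected = pending.remove(key);
            if (expected == null) {
                result.onlyInSecond.add(key);
            } else if (expected != fingerprint(line)) {
                result.changed.add(key);
            }
        }
        // Whatever is left in the map was never matched by file 2.
        result.onlyInFirst.addAll(pending.keySet());
        return result;
    }
}
```

With real files, the readers would come from `Files.newBufferedReader(path)`, and the call could sit inside a Tasklet's `execute` method. Because only keys are collected, writing the full changed rows to the result file would take one more streaming pass over the inputs, filtering by the collected keys.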


Solution

  • Are you sure you need a special program for that?

    I would try it with

    If memory really is your primary concern, all it takes is a simple Java main class, some Java NIO, and plain JDBC (java.sql).
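As a rough illustration of the plain-main-class route, the sketch below uses java.nio for file I/O but swaps the SQL step for hash partitioning (the idea behind a grace hash join), so no database is needed. This is a different technique from the one the answer names, offered only as a sketch. It assumes the key is the first column with no quoted commas; `PartitionedCsvDiff`, the `a`/`b` bucket file names, the bucket count, and the `ONLY_IN_FILE1`/`ONLY_IN_FILE2`/`CHANGED` output labels are all hypothetical. Rows with the same key always land in the same bucket pair, so only one bucket of file 1 is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: partition both CSVs by key hash, then diff bucket pairs. */
final class PartitionedCsvDiff {

    // Assumption: the key is the first column and never contains a quoted comma.
    static String keyOf(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    /** Splits a CSV (header skipped) into `buckets` files by the hash of each row's key. */
    static List<Path> partition(Path csv, Path dir, String prefix, int buckets) throws IOException {
        List<Path> paths = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            for (int i = 0; i < buckets; i++) {
                Path p = dir.resolve(prefix + i + ".csv");
                paths.add(p);
                writers.add(Files.newBufferedWriter(p, StandardCharsets.UTF_8));
            }
            in.readLine(); // skip header
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) continue;
                BufferedWriter w = writers.get(Math.floorMod(keyOf(line).hashCode(), buckets));
                w.write(line);
                w.newLine();
            }
        } finally {
            for (BufferedWriter w : writers) w.close();
        }
        return paths;
    }

    /** Writes "STATUS,row" lines to `out`; only one bucket of file 1 is in memory at a time. */
    static void diff(Path first, Path second, Path out, Path workDir, int buckets) throws IOException {
        List<Path> a = partition(first, workDir, "a", buckets);
        List<Path> b = partition(second, workDir, "b", buckets);
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (int i = 0; i < buckets; i++) {
                Map<String, String> rows = new HashMap<>();
                try (BufferedReader r = Files.newBufferedReader(a.get(i), StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = r.readLine()) != null) rows.put(keyOf(line), line);
                }
                try (BufferedReader r = Files.newBufferedReader(b.get(i), StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = r.readLine()) != null) {
                        String old = rows.remove(keyOf(line));
                        if (old == null) write(w, "ONLY_IN_FILE2", line);
                        else if (!old.equals(line)) write(w, "CHANGED", line);
                    }
                }
                // Rows of file 1 that file 2 never matched.
                for (String line : rows.values()) write(w, "ONLY_IN_FILE1", line);
            }
        }
    }

    private static void write(BufferedWriter w, String status, String line) throws IOException {
        w.write(status);
        w.write(',');
        w.write(line);
        w.newLine();
    }
}
```

The bucket count trades memory against open file handles: with N buckets, each in-memory map holds roughly 1/N of file 1, while 2N small bucket files are written to `workDir`. Output order is not sorted, since each bucket's leftovers come from a `HashMap`.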