I have an application that receives DBF files containing a flat table, which it normalizes, converts, and inserts into a MySQL database. The files grow to between 250,000 and 270,000 records, and the process now consumes about 1.5 - 2 GB of memory, so I have to repeat it in parts.
During the process I check whether a record already exists, using HashMaps that hold only the unique keys of the records so I don't have to keep all of their data in memory. The problem is that as more DBF files are loaded into the database, the total number of records grows, and each import becomes slower and consumes much more memory: one table easily reaches 770,000 records and another goes past 2,000,000.
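Roughly, the key-only check looks like this (a simplified sketch; the class, key, and column names are just illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: keep only the natural key of each record already in
// MySQL (here the survey_code), mapped to its generated id, instead of
// holding the whole record in memory.
public class SurveyKeyCache {

    // key = survey_code from the DBF, value = Survey.id in MySQL
    private final Map<String, Long> idsByCode = new HashMap<>();

    /** Returns the MySQL id, or null if this survey has not been inserted yet. */
    public Long lookup(String surveyCode) {
        return idsByCode.get(surveyCode);
    }

    /** Remember a survey right after inserting it, so later rows skip it. */
    public void remember(String surveyCode, long generatedId) {
        idsByCode.put(surveyCode, generatedId);
    }
}
```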
For example: the DBF file has 91 fields and the normalized database has 5 main tables, so I have to re-read the DBF file 5 times.
Just migrating one DBF file of 250,000 records can take me about 1 hour.
How can I minimize the time and the RAM consumption, or at least the RAM consumption, since I'm getting several `Exception in thread "main" java.lang.OutOfMemoryError: Java heap space` errors?
The DBF file has a lot of redundant data; its structure looks like this (columns):
| survey_code | survey_columns | group_survey | more_data | person_details |
|:------------|:---------------|:----------------|:----------|:---------------|
| 1 | ... | 1 | ... |1-oscar |
| 1 | ... | 1 | ... |2-juan |
| 1 | ... | 2 | ... |1-catalina |
| 2 | ... | 1 | ... |7-john |
Note: as you can see, there is a lot of data redundancy.
The structure of the MySQL database is like this:

- Survey (table)
  - id
  - survey_data
- Group_in_survey (table)
  - survey_id
  - group_data
- Person (table)
  - group_id
  - person_details
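To make the target structure concrete, this is a JDBC sketch of that schema; the connection URL, the column types, and the surrogate `id` columns on `Group_in_survey` and `Person` are assumptions on my part:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of the normalized schema as described above; types and extra id
// columns are assumed, only the table/column names come from the question.
public class CreateSchema {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/surveys", "user", "password");
             Statement st = con.createStatement()) {

            st.executeUpdate("CREATE TABLE IF NOT EXISTS Survey ("
                    + " id BIGINT AUTO_INCREMENT PRIMARY KEY,"
                    + " survey_data VARCHAR(255))");

            st.executeUpdate("CREATE TABLE IF NOT EXISTS Group_in_survey ("
                    + " id BIGINT AUTO_INCREMENT PRIMARY KEY,"   // assumed surrogate key
                    + " survey_id BIGINT NOT NULL,"
                    + " group_data VARCHAR(255),"
                    + " FOREIGN KEY (survey_id) REFERENCES Survey(id))");

            st.executeUpdate("CREATE TABLE IF NOT EXISTS Person ("
                    + " id BIGINT AUTO_INCREMENT PRIMARY KEY,"   // assumed surrogate key
                    + " group_id BIGINT NOT NULL,"
                    + " person_details VARCHAR(255),"
                    + " FOREIGN KEY (group_id) REFERENCES Group_in_survey(id))");
        }
    }
}
```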
The import process is divided by table, in the order defined by their dependencies, and for each table it runs through all the records in the DBF file.
Before starting each pass I query the database and load into a HashMap the information needed to check whether a record already exists before adding it to MySQL.
Also, every time I add a record to the MySQL database I add it to the HashMap as well, since part of the cleanup is removing duplicates. This is done in each of the passes.
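Put together, one per-table pass looks roughly like this. It's a simplified sketch: I'm assuming `Survey` also stores the original `survey_code` from the DBF so existing records can be recognized, and the DBF rows are shown as plain maps because the actual reader code is omitted:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

// Sketch of one per-table pass as described above.
public class SurveyPass {

    /** Pass 1: import the Survey table, skipping codes already in MySQL. */
    static Map<String, Long> importSurveys(Connection con, Iterable<Map<String, Object>> dbfRows)
            throws Exception {
        // 1) Pre-load the keys of every survey already in the database.
        Map<String, Long> known = new HashMap<>();
        try (Statement st = con.createStatement();
             // assumes Survey keeps the original survey_code from the DBF
             ResultSet rs = st.executeQuery("SELECT survey_code, id FROM Survey")) {
            while (rs.next()) {
                known.put(rs.getString(1), rs.getLong(2));
            }
        }

        // 2) Walk every DBF record; insert only unseen survey codes.
        String sql = "INSERT INTO Survey (survey_code, survey_data) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
            for (Map<String, Object> row : dbfRows) {
                String code = String.valueOf(row.get("survey_code"));
                if (known.containsKey(code)) {
                    continue;                         // duplicate in the file or already in MySQL
                }
                ps.setString(1, code);
                ps.setString(2, String.valueOf(row.get("survey_columns")));
                ps.executeUpdate();
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    keys.next();
                    known.put(code, keys.getLong(1)); // 3) remember it right after inserting
                }
            }
        }
        return known;                                 // reused by the Group_in_survey pass
    }
}
```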
From what I understand, you are inserting your records into the HashMap first to get rid of duplicates and then you copy them to the MySQL database. If that is the case, you can try the HugeCollections library and its HugeMap instead of a simple HashMap.
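For reference, the pattern I mean looks roughly like this with a plain `HashMap`; a map implementation like HugeMap would take the place of that `HashMap` (I'm not showing its API here), and the row layout, table, and column names are only illustrative:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.HashMap;
import java.util.Map;

// Sketch of "dedupe in a map, then copy to MySQL". A lower-overhead map
// implementation would replace the HashMap below; the rest stays the same.
public class DedupeThenCopy {

    public static void copyUnique(Connection con, long groupId, Iterable<String[]> dbfRows)
            throws Exception {
        // Keep only one entry per person code; later duplicates simply overwrite.
        Map<String, String> uniquePeople = new HashMap<>();
        for (String[] row : dbfRows) {
            uniquePeople.put(row[0], row[1]);   // row[0] = person code, row[1] = details
        }

        // Copy the de-duplicated entries into MySQL in a single batch.
        String sql = "INSERT INTO Person (group_id, person_details) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (String details : uniquePeople.values()) {
                ps.setLong(1, groupId);
                ps.setString(2, details);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```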