I am using Derby In-Memory DB. I need to perform some data loading from csv files in the beginning. For now, it takes about 25 seconds to load all the csv files into their tables. I hope the time can be reduced. Due to the data files are not very large actually.
What I have done is using the built-in procedure from derby.
{CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (?,?,?,',','"','UTF-8',1 )} or
{CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (?,?,?,',','"','UTF-8',0 )}
The only special thing is sometimes the data in one tables is splitted into many small csv files. So I have to load them one by one.And I have tested if I can combine them together, it will only take 16 seconds. However I cannot remove this feature because it is needed by the user.
Is there anything I can do to reduce the time of loading data? Should I disable log or write some user-defined function/procedure or any other tune can be done? Any advice will be fine.
Thanks!
Use H2 instead of Derby, and use the CSVREAD feature. If that's still too slow, see the fast import optimization, or use the CSV tool directly (without using a database). Disclaimer: I wrote the CSV support for H2.