I'm trying to read data generated by another application and stored in a Microsoft Office Access .MDB file. The number of records in some tables varies from a few thousand to over 10 million, depending on the size of the model (in the other application). Opening the whole table in one query can cause an Out Of Memory exception with large files, so I split the table on some criterion and read each part in a separate query. The problem is with medium-sized files, which could be read significantly faster in one single query without any exception.
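For reference, here is a minimal sketch of the "N queries" approach, assuming the table has a reasonably dense numeric column to split on (the RecID column and the ResultTable name below are hypothetical placeholders):

    uses ADODB;

    procedure ReadInChunks(Conn: TADOConnection; ChunkSize: Integer);
    var
      Qry: TADOQuery;
      StartID: Integer;
      Done: Boolean;
    begin
      Qry := TADOQuery.Create(nil);
      try
        Qry.Connection := Conn;
        Qry.SQL.Text :=
          'SELECT * FROM ResultTable ' +                  // hypothetical table name
          'WHERE RecID >= :FromID AND RecID < :ToID';     // hypothetical numeric column
        StartID := 0;
        repeat
          Qry.Close;
          Qry.Parameters.ParamByName('FromID').Value := StartID;
          Qry.Parameters.ParamByName('ToID').Value := StartID + ChunkSize;
          Qry.Open;
          // Stop once a chunk comes back empty; this assumes the IDs have no
          // gaps larger than ChunkSize, otherwise query MAX(RecID) first.
          Done := Qry.IsEmpty;
          while not Qry.Eof do
          begin
            // copy the current record into my own in-memory structure here
            Qry.Next;
          end;
          Inc(StartID, ChunkSize);
        until Done;
      finally
        Qry.Free;
      end;
    end;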
So, am I on the right track? Can I solve the OutOfMemory problem another way? Is it OK to choose one of the mentioned strategies (1 query or N queries) based on the number of records?
By the way, I'm using Delphi XE5 and Delphi's standard ADO components. I need the whole data of the table, and no joins to other tables are needed. I'm creating the ADO components in code and they are not connected to any visual controls.
Edit:
Well, it seems that my question is not clear enough. Here are some more details, which are actually answers to questions or suggestions posed in comments:
This .mdb file is not holding a real database; it's just structured data, so there's no writing of new data, no transactions, no user interaction, no server, nothing. A third-party application uses Access files to export its calculation results. The total size of these files is usually a few hundred MB, but they can grow up to 2 GB. Now I need to load this data into a Delphi data structure before starting my own calculations, since there's no room for waiting on I/O during those calculations.
I can't compile this project for x64; it's extremely dependent on some old DLLs that share the same memory manager with the main executable, and their authors will never release x64 versions. The company hasn't decided to replace them yet, and that won't change in the near future.
And, you know, the support guys prefer to tell us "fix this" rather than ask two thousand customers to "buy more memory". So I have to be really stingy about memory usage.
Now my question is: does TADODataSet provide any better memory management for fetching such an amount of data? Is there any property that prevents the DataSet from fetching all the data at once?
When I call ADOTable1.Open, it starts allocating memory and waits to fetch the entire table, just as expected. But reading all those records in a for loop takes a while, and there's no need to hold all that data at once; on the other hand, there's no need to keep a record in memory after it has been read, since there's no seeking back through the rows. That's why I split the table with several queries. Now I want to know whether TADODataSet can handle this, or whether what I'm doing is the only solution.
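To make that concrete, the single-pass read looks roughly like this (a sketch; the field access in the comment is hypothetical, and the dataset is never attached to visual controls in my case anyway):

    uses ADODB;

    procedure ScanWholeTable(Table: TADOTable);
    begin
      Table.DisableControls;   // skip dataset/control notification overhead
      try
        Table.Open;            // with default settings, this is where the big allocation happens
        while not Table.Eof do
        begin
          // e.g. read Table.FieldByName('SomeField').AsFloat into my own structure;
          // the record is never revisited afterwards
          Table.Next;
        end;
      finally
        Table.Close;
        Table.EnableControls;
      end;
    end;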
I did some trial and error and improved the performance of reading the data, in both memory usage and elapsed time. My test case is a table with more than 5,000,000 records. Each record has 3 string fields and 8 doubles. No index, no primary key. I used the GetProcessMemoryInfo API to measure memory usage.
Initial State
Table.Open: 33.0 s | 1,254,584 kB
Scrolling : +INF s | I don't know. But allocated memory doesn't increase in Task Manager.
Sum : - | -
DataSet.DisableControls;
Table.Open: 33.0 s | 1,254,584 kB
Scrolling : 13.7 s | 0 kB
Sum : 46.7 s | 1,254,584 kB
DataSet.CursorLocation := clUseServer;
Table.Open: 0.0 s | -136 kB
Scrolling : 19.4 s | 56 kB
Sum : 19.4 s | -80 kB
DataSet.LockType := ltReadOnly;
Table.Open: 0.0 s | -144 kB
Scrolling : 18.4 s | 0 kB
Sum : 18.5 s | -144 kB
DataSet.CacheSize := 100;
Table.Open: 0.0 s | 432 kB
Scrolling : 11.4 s | 0 kB
Sum : 11.5 s | 432 kB
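For completeness, this is roughly the setup that produced the last set of numbers above, with all settings applied before Open (a sketch; ResultTable stands in for the real table name):

    uses ADODB;

    procedure ReadTuned(Conn: TADOConnection);
    var
      Table: TADOTable;
    begin
      Table := TADOTable.Create(nil);
      try
        Table.Connection := Conn;
        Table.DisableControls;
        Table.CursorLocation := clUseServer;   // don't pull the whole table into a client-side cursor
        Table.LockType := ltReadOnly;          // read-only cursor, no editing overhead
        Table.CacheSize := 100;                // fetch rows in blocks of 100
        Table.TableName := 'ResultTable';      // hypothetical table name
        Table.Open;
        while not Table.Eof do
        begin
          // copy the current record into my own structure
          Table.Next;
        end;
      finally
        Table.Free;
      end;
    end;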
I also checked Connection.CursorLocation, Connection.IsolationLevel, Connection.Mode, DataSet.CursorType and DataSet.BlockReadSize, but they made no appreciable difference.
I also tried TADOTable, TADOQuery and TADODataSet, and unlike what Jerry said here in the comments, both TADOTable and TADOQuery performed better than TADODataSet.
The value assigned to CacheSize should be decided for each case; larger values do not necessarily lead to better results.