Tags: java, memory, collections, crud, flat-file

Record manipulation in flat files in Java


This is my first post here on StackOverflow. I am not a newbie to Java, but I am not an expert or a professional programmer either. I have had some ideas in my head for quite some time, and I do not know how to implement them properly.

In general, I am writing software (actually many individual applications) to manipulate a list of things (say, a list of phone numbers). My application has to be contained within a single file (a jar) or a single folder/directory. It has to be an out-of-the-box, click-and-run application. Creating a GUI in Java is not an issue.

My problem is the storage of data. Please do not suggest a third-party database server or application; I am going to store the data in a flat file, either in a plain-text format or as XML.

Currently two ideas come to mind for performing CRUD (Create, Read, Update, Delete) on the data in the file. They are as follows:

1 Do not use a collection.

  • create - append record to file
  • read - read complete file
  • update - copy all records to a new file, writing the new data in place of the record being replaced; delete old file; rename new file to old file
  • delete - copy all records to new file except the one that needs to be deleted; delete old file; rename new file to old file.
  • Advantage : Less memory requirement.
  • Disadvantage : Lots of file IO.
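
A minimal sketch of method 1, assuming one record per line and records addressed by their 0-based line position. The class and method names (`FlatFileStore`, `create`, `rewrite`) are hypothetical, and update and delete share one copy-and-rename routine:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: one record per line, no collection kept in memory.
class FlatFileStore {

    // create: append one record to the end of the file
    static void create(Path file, String record) throws IOException {
        byte[] bytes = (record + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // update: pass the new record as replacement; delete: pass null
    static void rewrite(Path file, int index, String replacement) throws IOException {
        // The temp file lives next to the original so the final move stays on one file system.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "records", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            int i = 0;
            while ((line = in.readLine()) != null) {
                String toWrite = (i == index) ? replacement : line;
                if (toWrite != null) { // null means "skip this record", i.e. delete
                    out.write(toWrite);
                    out.newLine();
                }
                i++;
            }
        }
        // Replaces the old file with the new one in a single step.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Only one line is held in memory at a time, but every update or delete copies the whole file, which is the file IO cost listed above.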

2 Use a collection

  • Application Start - load all records into a collection from a file
  • Application Stop - save all records from collection into a file
  • create - add record to collection
  • read - read all elements of the collection
  • update - update record directly in the collection
  • delete - delete record from collection
  • Advantage : Very little file IO.
  • Disadvantage : Heavy memory requirements. The application DOES crash if it runs out of memory while loading all records.
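
A minimal sketch of method 2, again assuming one record per line. The class name `InMemoryStore` and its methods are hypothetical; the constructor plays the role of "Application Start" and `save()` the role of "Application Stop":

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: all records live in a List between load and save.
class InMemoryStore {
    private final Path file;
    private final List<String> records = new ArrayList<>();

    // Application start: load every record into the collection
    InMemoryStore(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            records.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }

    void create(String record) { records.add(record); }

    List<String> readAll() { return new ArrayList<>(records); } // defensive copy

    void update(int index, String record) { records.set(index, record); }

    void delete(int index) { records.remove(index); }

    // Application stop: write the whole collection back, replacing the old file content
    void save() throws IOException {
        Files.write(file, records, StandardCharsets.UTF_8);
    }
}
```

Nothing reaches the disk until `save()` is called, so changes made since the last save are lost if the application dies first.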

Both methods have their pros and cons. Is there some other way? Or is there a way in between these 2 ways? I desperately need some guidance here. Been on this problem for a very long time. Any theories, suggestions or pointers are welcome!


Would the following approach be fine? Would it be harmful or bad in any way? What would its disadvantages be?

Note: r --> Record. Each record is on a new line. The fields within each record are separated by some delimiter, say '::'. So I would use BufferedReader to get each line easily. The sizes are hypothetical, just to give you a picture.

File ={r1 r2 r3 r4 r5 ... r500} //the file has 500 records
Collection cPrev,cCurrent,cNext //3 collection objects holding consecutive records; each holding (say) 30 records

So in the beginning
cPrev = {}
cCurrent = {r1 r2 r3 ... r30} //filled by main thread
cNext = {r31 r32 r33 ... r60} //filled by child thread while user viewing cCurrent

cCurrent is viewable by the user. The user can scroll up and down (or whatever direction) and view all 30 records. Now user wants to see the next set of records. So
cPrev = cCurrent //main thread
cCurrent = cNext //main thread
Therefore
cPrev = {r1 r2 r3 ... r30}
cCurrent = {r31 r32 r33 ... r60}
cNext = {r61 r62 r63 ... r90} //filled by child thread while user viewing cCurrent

Consider the following state
cPrev = {r121 r122 r123 ... r150}
cCurrent = {r151 r152 r153 ... r180}
cNext = {r181 r182 r183 ... r210}

If user wants to see records before r151 then
cNext = cCurrent //main thread
cCurrent = cPrev //main thread

So cPrev = {r91 r92 r93 ... r120} //filled by child thread while user viewing cCurrent
cCurrent = {r121 r122 r123 ... r150}
cNext = {r151 r152 r153 ... r180}

Obviously, the next and previous can be performed as long as there are records after and before in the file. Performing the "next" operation is easy and simple. I would not need to close the file connection, and just start reading from where I left off.

But what about the "previous" operation? The only solution that comes to my mind is to [1] close the current file connection, [2] open a new file connection, [3] read from the start of the file until the required record is reached and [4] assign that set of records to the collection. What is wrong with this approach? Is there a better way or algorithm here? Please keep it simple but not compressed. I am no Java guru.
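
For illustration, one possible alternative to re-reading from the start on every "previous" is to scan the file once, remember the byte offset at which each page of records begins, and then seek straight to any page. This is only a sketch under assumptions: the class `PagedRecordFile` and its methods are hypothetical, the file is UTF-8 text with one record per line, and the file does not change while it is open.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: random access to fixed-size pages of line-based records.
class PagedRecordFile implements AutoCloseable {
    private final RandomAccessFile raf;
    private final int pageSize;
    private final List<Long> pageOffsets = new ArrayList<>(); // byte offset of each page start

    PagedRecordFile(String path, int pageSize) throws IOException {
        this.raf = new RandomAccessFile(path, "r");
        this.pageSize = pageSize;
        // Single pass over the file: note where every pageSize-th record starts.
        int count = 0;
        while (true) {
            long offset = raf.getFilePointer();
            if (raf.readLine() == null) break;
            if (count % pageSize == 0) pageOffsets.add(offset);
            count++;
        }
    }

    int pageCount() { return pageOffsets.size(); }

    // Reads page number `page` (0-based); works equally for "next" and "previous".
    List<String> readPage(int page) throws IOException {
        raf.seek(pageOffsets.get(page));
        List<String> records = new ArrayList<>();
        String raw;
        while (records.size() < pageSize && (raw = raf.readLine()) != null) {
            // readLine() maps each byte to one char, so re-decode the bytes as UTF-8.
            records.add(new String(raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8));
        }
        return records;
    }

    @Override
    public void close() throws IOException { raf.close(); }
}
```

With 30 records per page, `readPage(4)` would return r121 to r150 and `readPage(3)` would return r91 to r120, without closing and reopening the file. `RandomAccessFile.readLine()` is unbuffered, so the initial scan may be slow on a large file.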


Solution

  • You are managing a list of telephone numbers. I suppose you are not really concerned with application performance at this point; you will not be making difficult queries through tons of data.

    Then why not use Hibernate/JPA together with an embedded database? This way, you can do CRUD on simple data but easily scale out to a relational model if needed. The embedded database manages caching, transactions, locking... The disadvantage is a steep learning curve.

    So if you want to avoid the steep learning curve, I suggest you use the collections method. You are concerned about your application crashing if it runs out of memory. Is this a real problem, or just a theoretical one? Can't you slice up your data into parts, loading only a single part into memory at any given time and serialising the rest to disk? Something like:

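    import java.io.*;
    import java.util.ArrayList;

    // The owning class keeps all slices: private List<DataSlice> slices;
    // Each slice holds one part of the data; only loaded slices occupy memory.
    public class DataSlice {
      private ArrayList<Object> data;   // null while the slice is not loaded
      private final File backingFile;
      private boolean dirty;            // true if data changed since the last save

      public DataSlice(File backingFile) {
        this.backingFile = backingFile;
      }

      // Reads the slice back from disk; the file must have been written by release().
      @SuppressWarnings("unchecked")
      public void load() throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(backingFile))) {
          data = (ArrayList<Object>) in.readObject();
        }
      }

      // Saves the slice if it changed, then drops it from memory.
      public void release() throws IOException {
        if (dirty) {
          try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(backingFile))) {
            out.writeObject(data);
          }
          dirty = false;
        }
        data = null; // eligible for garbage collection, unless the objects are still referenced elsewhere
      }

      // Example CRUD operation (requires a loaded slice); every change marks the slice dirty.
      public void add(Object record) {
        data.add(record);
        dirty = true;
      }
    }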
    

    This is apparently already implemented in "vanilla-java" (the HugeCollections package): http://code.google.com/p/vanilla-java/

    Be warned, though: it is probably better to go with an embedded DB in the long run. You will need to learn about entity classes and Java persistence (JPA), but you will be able to use that knowledge for years to come.