Search code examples
csortingbinaryfilesbinary-search

Store strings in binary file while keeping them alphabetically sorted


So, supposing I have a C program that stores strings into a binary file. And in another binary file, the position of the first byte of every string is saved.

eg. words.bin contains: alphacardelta

and pos.bin contains: 0 5 8

I want to insert "beta", and for it to go between alpha and car. While I have figured out how to insert it after alpha, I cannot figure out an automated way that will insert it in the correct position and will not result in "disappearance" of any other words in a possibly long list. Any help?

Another way that I guess could work (possibly better) is inserting beta after delta and then sorting all over again, but my attempts at this have failed too.

Any assistance appreciated.


Solution

  • I want to insert "beta", and for it to go between alpha and car.

    To do this in a file you need to write the data into the file again.

    There are two methods here.

    1. Read all the data from the file into memory (RAM). You can use arrays or structures.
    2. Add the additional data in corrrect location into memory.
    3. Write the entire file again. As an optimization you can write from the changed location to the end.

    Alternatively you can

    1. Read all the data from the file into memory (RAM). Here you can read the entire data or in suitable chunks to reduce memory usage.
    2. create a new file with the additional data
    3. write to this new file.
    4. At the end you can delete the old file and rename the new file.

    This is a very inefficient operation and the it would be better if you try to find a way to only add new data to the end of the file (append). You can keep some sort of index of the data in the beginning to know the location of the string and if it is deleleted or not.