Tags: c, caching, file-io, fseek

Efficient random access within a file? [C]


I have a text file I use to hold an index of files and words (with their frequencies) that appear in them. I need to read the file into memory and store the words so they can be searched. The file is formatted as follows:

<files> 169
    0:file0.txt
    1:file1.txt
    2:file2.txt
    3:file3.txt
    ... etc ...
</files>
<list> word 2
    9: 10
    1: 2
</list>
<list> word2 4
    3: 19
    5: 12
    0: 2
    8: 2
</list>
... etc ...

The problem is that this index file can become extremely large and won't all fit into memory at once. My solution is to keep only a handful of words in a hash table at any one time. When I need the data for another word, I would evict an old word and then parse the new word's data from the file.

How can I efficiently accomplish this in C? I was thinking I would have to do something with fseek and rewinding once I got to certain points.
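To make the fseek idea concrete, here is a minimal sketch of one possible approach: scan the file once, remember the ftell() position of each <list> header, and later fseek() straight to a word's block. The names word_offset, build_offsets, and load_word are hypothetical, as are the 63-character word limit and the 255-character line limit. Summing the frequencies stands in for whatever parsing the hash table entry really needs.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64   /* assumed maximum word length, including the NUL */

/* Hypothetical record: where a word's <list> block starts in the file. */
struct word_offset {
    char word[MAX_WORD];
    long offset;      /* ftell() value taken just before the <list> line */
};

/* Scan the index once and record the offset of every <list> header.
   Returns the number of entries stored (at most cap). */
size_t build_offsets(FILE *f, struct word_offset *out, size_t cap)
{
    char line[256];   /* assumes no line is longer than 255 characters */
    size_t n = 0;
    long pos;

    rewind(f);
    while (n < cap && (pos = ftell(f)) >= 0 && fgets(line, sizeof line, f)) {
        /* Only lines that start with "<list> " match this format. */
        if (sscanf(line, "<list> %63s", out[n].word) == 1) {
            out[n].offset = pos;
            n++;
        }
    }
    return n;
}

/* Seek to a recorded offset and read that word's postings.
   Stores the sum of the frequencies in *total and returns the number
   of "file: frequency" lines read, or -1 on error. */
int load_word(FILE *f, const struct word_offset *e, long *total)
{
    char line[256];
    int file_id, freq, count = 0;

    if (fseek(f, e->offset, SEEK_SET) != 0)
        return -1;
    if (!fgets(line, sizeof line, f))      /* skip the <list> header */
        return -1;

    *total = 0;
    while (fgets(line, sizeof line, f) && strncmp(line, "</list>", 7) != 0) {
        if (sscanf(line, " %d: %d", &file_id, &freq) == 2) {
            *total += freq;
            count++;
        }
    }
    return count;
}
```

With this layout only the word and one long per entry stay in memory, so the offset table is far smaller than the full index. Whether even that table fits depends on the number of distinct words.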

Thanks,
Mike


Solution

  • It turned out that the best way to do this (for my needs) was to keep track of the current position in the file and then call rewind(f) when I reached the end.
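
The answer does not include code, so the following is only a sketch of how such a wrap-around scan might look. The function name seek_word and the buffer sizes are hypothetical. It also assumes that ftell() values can be compared as byte offsets (true on POSIX systems, or when the file is opened in binary mode) and that the stream is positioned at the start of a line when the function is called.

```c
#include <stdio.h>
#include <string.h>

/* Find the <list> header for `word`, scanning forward from the current
   position and wrapping to the start with rewind() at most once.
   Returns 1 with the stream positioned just after the header line,
   or 0 if the word does not appear in the file. */
int seek_word(FILE *f, const char *word)
{
    char line[256], found[64];
    long start = ftell(f);   /* where this search began */
    int wrapped = 0;

    if (start < 0)
        return 0;

    for (;;) {
        long pos = ftell(f);
        if (wrapped && pos >= start)
            return 0;                    /* back where we began: not found */

        if (!fgets(line, sizeof line, f)) {
            if (wrapped)
                return 0;                /* second end-of-file: give up */
            rewind(f);                   /* first end-of-file: wrap around */
            wrapped = 1;
            continue;
        }

        /* Only lines that start with "<list> " match this format. */
        if (sscanf(line, "<list> %63s", found) == 1 && strcmp(found, word) == 0)
            return 1;
    }
}
```

After a successful call the caller can read the "file: frequency" lines with fgets() until it sees </list>. Lookups that arrive in roughly file order stay cheap with this scheme, while a word located just behind the current position costs nearly a full pass over the file.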