I need to implement a search engine. I have a dictionary, which is a hash table containing words. I also have some texts; I need to go over all of them and write to the posting file the text number and the position of each word in the texts.
Each time I encounter an occurrence of a word that already exists in the posting file, I need to add another occurrence for it, meaning I have to update the line for that word in the posting file. But because the posting file looks something like this:
word1: 1(2,4,5) 4(66,42,21)
word2: 1(3,66) 6(12,19)
I can't write anything new into line 1, because that would affect line 2, as I understand it.
So the question is: how can I do it? Instead of just writing strings into the file, can I write some data structure, like a hash table? Then each word would have a hash table in the posting file, and if I see that the word already exists there, I would read its hash table, update it, and rewrite it into the file.
Or is there something better?
Thanks in advance,
Greg
Have you thought about using XML to do this? A simple structure like:
<searchkeys>
<key name="word1">
<text id="1">2,4,5</text>
<text id="4">66,42,21</text>
</key>
<key name="word2">
<text id="1">3,66</text>
<text id="6">12,19</text>
</key>
</searchkeys>
You can use the XmlDocument, XmlReader, XmlWriter, etc. classes to manipulate the files and get fancier from there.
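As a rough illustration of updating that structure, here is a minimal sketch in Python, using the standard xml.etree.ElementTree module rather than the .NET classes named above. The function name add_occurrence and the sample data are assumptions made up for this example; the element and attribute names follow the XML shown earlier.

```python
import xml.etree.ElementTree as ET


def add_occurrence(root, word, text_id, position):
    """Record that `word` occurs at `position` in text `text_id` (hypothetical helper)."""
    # Find the <key> element for this word, or create it if the word is new.
    key = next((k for k in root.findall("key") if k.get("name") == word), None)
    if key is None:
        key = ET.SubElement(root, "key", name=word)

    # Find the <text> element for this text number, or create it.
    text = next((t for t in key.findall("text") if t.get("id") == str(text_id)), None)
    if text is None:
        text = ET.SubElement(key, "text", id=str(text_id))

    # Append the position to the comma-separated list.
    text.text = f"{text.text},{position}" if text.text else str(position)


# Sample data, assumed for the sketch.
root = ET.fromstring(
    '<searchkeys><key name="word1"><text id="1">2,4,5</text></key></searchkeys>'
)
add_occurrence(root, "word1", 1, 9)   # existing word, existing text
add_occurrence(root, "word2", 6, 12)  # new word, new text
xml_out = ET.tostring(root, encoding="unicode")
```

Note that this still rewrites the whole file when it is saved; the XML only makes the in-memory update easy, so it suits small indexes best.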
If this is going to contain a lot of data, you might consider using a database for this (Access, MS SQL (Express or Standard), SQLite, MySQL, etc.).
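A database sidesteps the "updating line 1 affects line 2" problem, because each occurrence becomes its own row. Below is a minimal sketch with SQLite through Python's standard sqlite3 module. The table name postings, its columns, and the helper names add_posting and lookup are assumptions chosen for this example, not a fixed schema.

```python
import sqlite3

# In-memory database for the sketch; pass a file path to keep the index on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postings ("
    " word TEXT NOT NULL,"
    " text_id INTEGER NOT NULL,"
    " position INTEGER NOT NULL,"
    " PRIMARY KEY (word, text_id, position))"
)


def add_posting(conn, word, text_id, position):
    """Store one occurrence; a duplicate occurrence is silently ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO postings (word, text_id, position) VALUES (?, ?, ?)",
        (word, text_id, position),
    )


def lookup(conn, word):
    """Return {text_id: [positions]} for a word, or an empty dict if unknown."""
    rows = conn.execute(
        "SELECT text_id, position FROM postings WHERE word = ? "
        "ORDER BY text_id, position",
        (word,),
    )
    result = {}
    for text_id, position in rows:
        result.setdefault(text_id, []).append(position)
    return result


# Sample data, assumed for the sketch.
for pos in (2, 4, 5):
    add_posting(conn, "word1", 1, pos)
add_posting(conn, "word1", 4, 66)
add_posting(conn, "word2", 1, 3)
conn.commit()
```

Adding a new occurrence is then a single insert, and the composite primary key doubles as the index used to look a word up.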