Search code examples
pythonfilesearchdictionarystring-search

Searching a string from a file - python


I have a human dictionary file that looks like this in eng.dic (image that there is close to a billion words in that list). And I have to run different word queries quite often.

apple
pear
foo
bar
foo bar
dictionary
sentence

I have a string let's say "foo-bar", is there a better (more efficient way) of searching through that file to see whether it exist, if it return exist, if it doesnt exist, append the dictionary file

dic_file = open('en_dic', 'ra', 'utf8')
query = "foo-bar"
wordlist = list(dic_file.readlines().replace(" ","-"))
en_dic = map(str.strip, wordlist)

if query in en_dic:
    return 1
else:
    print>>dic_file, query

Is there any in-built search functions in python? or any libraries that i can import to run such searches without much overheads?


Solution

  • As I already mentioned, going through the whole file when its size is significant, is not a good idea. Instead you should use established solutions and:

    1. index the words in the document,
    2. store the results of indexing in appropriate form (I suggest database),
    3. check if the word exists in the file (by checking the database),
    4. if it does not exist, add it to file and database,

    Storing data in database is really a lot more efficient than trying to reinvent the wheel. If you will use SQLite, the database will be also a file, so the setup procedure is minimal.

    So again, I am proposing storing words in SQLite database and querying when you want to check if the word exists in the file, then updating it if you are adding it.

    To read more on the solution see answers to this question:

    The most efficient way to index words in a document