
Is shelve really slow and taking a lot of memory or am I doing something wrong?


I'm trying to write a program that uses a shelve database with sorted letters as keys and lists of the words that can be formed from them as values. e.g.:

db['mnoo'] = ['moon', 'mono']
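For context, a mapping with that layout can be built by sorting each word's letters to get its key; a minimal sketch (the helper name `build_anagram_index` is mine, for illustration):

```python
from collections import defaultdict

def build_anagram_index(words):
    # Group words under their sorted letters: 'moon' and 'mono' both map to 'mnoo'.
    index = defaultdict(list)
    for word in words:
        index[''.join(sorted(word))].append(word)
    return dict(index)
```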

So I wrote a function that takes a filename and loads it into a shelve. The first part, which turns the file into a dictionary with the same layout as the shelve, works fine, but the shelve part takes really long.

I'm trying it with a dictionary of ~100k entries, each value being a list. It seems to take 15-20 seconds for each 1000 entries, and each entry seems to take ~1 kB of space. Is this normal?
The code:

import shelve

def save_to_db(filename, shelve_in='anagram_db'):
    dct = anagrams_from_list(process_file(filename))

    with shelve.open(shelve_in, 'c') as db:
        for key, wordlist in dct.items():
            if key not in db:
                db[key] = wordlist
            else:
                db[key].extend(wordlist)

Edit: just a quick clarification: each list in the dict is about 1-3 words long, so it shouldn't be too large.


Solution

  • First -- yes, shelve's default pickle backend is slow and inefficient, and your best choice is to use something different.

    Second -- you're making it worse by editing entries once they're there, rather than getting them into their final state in-memory before serializing them only once.

    dct = anagrams_from_list(process_file(filename))

    content = {}
    for key, wordlist in dct.items():
        if key not in content:
            content[key] = wordlist
        else:
            content[key].extend(wordlist)

    with shelve.open(shelve_in, 'c') as db:
        for k, v in content.items():
            db[k] = v
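Worth noting too (an aside, not about speed): with `shelve`'s default `writeback=False`, the original `db[key].extend(wordlist)` mutates a transient copy that is never written back, so those merges are silently lost. A quick demonstration:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo_db')

with shelve.open(path, 'c') as db:
    db['mnoo'] = ['moon']
    db['mnoo'].extend(['mono'])        # mutates a copy; nothing is stored
    assert db['mnoo'] == ['moon']      # the extend was lost

with shelve.open(path, 'c', writeback=True) as db:
    db['mnoo'].extend(['mono'])        # cached in memory, flushed on close

with shelve.open(path) as db:
    assert db['mnoo'] == ['moon', 'mono']
```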
    

    If you want an efficient database, I'd look elsewhere: Tokyo Cabinet, Kyoto Cabinet, SQLite, BDB; the options are numerous.
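If you go the SQLite route, a minimal sketch of the same layout (the table name and JSON encoding of the word lists are my choices, not from the original):

```python
import json
import sqlite3

def save_anagrams(dct, db_path):
    # One row per key; word lists serialized as JSON text.
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS anagrams (key TEXT PRIMARY KEY, words TEXT)')
    with conn:  # one transaction; commits on success
        conn.executemany(
            'INSERT OR REPLACE INTO anagrams VALUES (?, ?)',
            ((k, json.dumps(v)) for k, v in dct.items()),
        )
    conn.close()

def load_anagrams(key, db_path):
    conn = sqlite3.connect(db_path)
    row = conn.execute('SELECT words FROM anagrams WHERE key = ?', (key,)).fetchone()
    conn.close()
    return json.loads(row[0]) if row else []
```

A single `executemany` inside one transaction avoids the per-entry commit overhead that makes naive key-by-key writes slow.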