I am creating a dictionary from a large file.
def make_dic():
    big_dic = {}
    for foo in open(bar):  # bar is the path to the large file
        key, value = do_something(foo)
        big_dic[key] = value
    return big_dic

def main():
    big_dic = make_dic()  # this takes time
I have to access this dictionary many times, but from completely different programs. It takes a lot of time to read this file and build the dictionary. Is it possible to make a dictionary that remains in memory even after the program that created it exits, so that I can build it once but use it again and again from different programs?
This won't work for all situations that fit your description, but cPickle should help with speed.

The only problem I can think of is that combining data persistence with IPC is tough. So if these different programs are modifying the dictionary at the same time, pickle won't help. Another approach might be to use a database...
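For reference, a minimal sketch of this caching approach (the cache.pkl file name is just a placeholder, and it assumes make_dic() from the question is changed to return the dictionary):

import cPickle

# First run: build the dictionary once and cache it to disk.
big_dic = make_dic()
with open('cache.pkl', 'wb') as f:
    cPickle.dump(big_dic, f, cPickle.HIGHEST_PROTOCOL)

# Every later run (possibly a different program): load the cached copy,
# which skips re-reading and re-parsing the original large file.
with open('cache.pkl', 'rb') as f:
    big_dic = cPickle.load(f)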
I like Sven Marnach's suggestion, but there are some tradeoffs worth considering. Some setup...
>>> import anydbm
>>> import cPickle
>>> pickle_file = open('pickle_foo', 'w')
>>> anydbm_file = anydbm.open('anydbm_foo', 'c')
>>> d = dict((str(i), str(j)) for i, j in zip(range(999999, -1, -1), range(0, 1000000)))
Obviously, populating the anydbm_file will be pretty slow:
>>> %timeit for k, v in d.iteritems(): anydbm_file[k] = v
1 loops, best of 3: 5.14 s per loop
The time is comparable to the time it takes to dump and load a pickle file:
>>> %timeit cPickle.dump(d, pickle_file)
1 loops, best of 3: 3.79 s per loop
>>> pickle_file.close()
>>> pickle_file = open('pickle_foo', 'r')
>>> %timeit d = cPickle.load(pickle_file)
1 loops, best of 3: 2.03 s per loop
But you only have to create the anydbm_file once; after that, reopening it is nigh-instantaneous:
>>> %timeit anydbm_file = anydbm.open('anydbm_foo', 'r')
10000 loops, best of 3: 74.3 us per loop
So anydbm has the advantage there. On the other hand,
>>> %timeit for i in range(1, 1000): x = anydbm_file[str(i)]
100 loops, best of 3: 3.15 ms per loop
>>> %timeit for i in range(1, 1000): x = d[str(i)]
1000 loops, best of 3: 374 us per loop
Reading a key from anydbm_file takes roughly ten times longer than reading a key from a dictionary in memory. You'd have to do a lot of lookups for this difference to outweigh the five seconds needed for a pickle dump/load cycle; but even if you don't, the difference in read times could lead to sluggish performance, depending on what you're doing.
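For what it's worth, here is a minimal sketch of what the read side might look like in a completely separate program, reusing the anydbm_foo file created above:

import anydbm

# Reopen the same file read-only and use it like a dictionary,
# with no rebuild step; keys and values are both strings in anydbm.
db = anydbm.open('anydbm_foo', 'r')
print db['42']
db.close()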
Other options are SQLite3 or, for a separate database server process that allows connections from multiple processes running concurrently, psycopg2 + PostgreSQL.
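If you go the SQLite3 route, a rough sketch might look like this (the dic.sqlite file name and the table layout are just placeholders):

import sqlite3

# Store the dictionary once as a simple key-value table on disk.
conn = sqlite3.connect('dic.sqlite')
conn.execute('CREATE TABLE IF NOT EXISTS dic (key TEXT PRIMARY KEY, value TEXT)')
conn.executemany('INSERT OR REPLACE INTO dic VALUES (?, ?)', d.iteritems())
conn.commit()

# Any other process can open the same file and look up individual keys;
# SQLite handles the locking needed for concurrent access.
row = conn.execute('SELECT value FROM dic WHERE key = ?', ('42',)).fetchone()
print row[0] if row else None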