Tags: python, machine-learning, caffe, pycaffe, lmdb

How to append data to an existing LMDB?


I have around 1 million images to put into this dataset, appending them to the set 10,000 at a time.

I'm sure the map_size is wrong, going by this article.

I used this line to create the set:

env = lmdb.open(Path+'mylmdb', map_size=int(1e12))
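For a rough check of `map_size`: it is the maximum size in bytes that the LMDB environment may grow to, and writes past it fail with `lmdb.MapFullError`. A common rule of thumb is the raw data size times a safety margin. The sketch below assumes uint8 images of a hypothetical shape 3×256×256 and a hypothetical margin of 2; substitute your own `X.shape` (for a NumPy batch, `X.nbytes` gives the raw byte count directly). The helper name `estimate_map_size` is illustrative only.

```python
def estimate_map_size(n_images, channels, height, width, itemsize=1, safety_factor=2.0):
    """Rough map_size in bytes: raw pixel bytes times a safety margin.

    The margin is meant to cover Datum/protobuf framing and LMDB's own
    page overhead; the default of 2.0 is an arbitrary assumption.
    """
    raw_bytes = n_images * channels * height * width * itemsize
    return int(raw_bytes * safety_factor)


# Hypothetical: 1 million uint8 images of shape 3x256x256.
size = estimate_map_size(1_000_000, 3, 256, 256)
print(size)               # 393216000000
print(size <= int(1e12))  # True: under these assumptions 1e12 leaves headroom
```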

I use this line every 10,000 samples to write data to the file, where X and Y are placeholders for the data to be put in the LMDB:

env = create(env, X[:counter,:,:,:],Y,counter)


import caffe  # pycaffe, provides caffe.proto.caffe_pb2.Datum


def create(env, X, Y, N):
    """Write the first N samples of X (shape N, C, H, W) with labels Y into env."""
    with env.begin(write=True) as txn:
        # txn is a Transaction object
        for i in range(N):
            datum = caffe.proto.caffe_pb2.Datum()
            datum.channels = X.shape[1]
            datum.height = X.shape[2]
            datum.width = X.shape[3]
            datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
            datum.label = int(Y[i])
            str_id = '{:08}'.format(i)

            # The encode is only essential in Python 3
            txn.put(str_id.encode('ascii'), datum.SerializeToString())
    return env

How can I edit this code so that new data is appended to this LMDB instead of replacing what is already there? The present method overwrites the entries at the same positions; I have checked the number of entries after generation with env.stat().
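The overwriting described above can be reproduced without LMDB at all. The sketch below is pure Python with hypothetical sizes (25,000 samples in batches of 10,000) and a hypothetical helper `iter_batches`. It compares keys built from the per-call loop index `i`, which restarts at 0 on every call, with keys built from a global running index.

```python
def iter_batches(n_total, batch_size=10_000):
    """Yield (start, stop) index pairs that cover range(n_total) in chunks."""
    for start in range(0, n_total, batch_size):
        yield start, min(start + batch_size, n_total)


# Hypothetical sizes: 25,000 samples written 10,000 at a time.
batches = list(iter_batches(25_000))
print(batches)  # [(0, 10000), (10000, 20000), (20000, 25000)]

# Keys from the global index never repeat across batches ...
global_keys = ['{:08}'.format(j) for s, e in batches for j in range(s, e)]
# ... while keys from the per-batch index i restart at '00000000' on every call.
local_keys = ['{:08}'.format(i) for s, e in batches for i in range(e - s)]
print(len(set(global_keys)), len(set(local_keys)))  # 25000 10000
```

Only 10,000 distinct keys exist in the second case, which matches an entry count that stops growing after the first batch.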


Solution

  • Let me expand on my comment above.

    All entries in an LMDB are stored under unique keys, and your database already contains keys for i = 0, 1, 2, .... Writing to a key that already exists replaces its value, so you need a new, unused key for each i. The simplest way to do that is to find the largest key in the existing DB and keep counting up from it.

    Assuming that the existing keys are consecutive and start at 0, the largest key is one less than the number of entries:

    max_key = env.stat()["entries"] - 1
    

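    Otherwise, a more thorough approach is to iterate over all keys (check this). Cursors are opened from a transaction, and keys come back as bytes, so convert each one to an int before comparing:

    max_key = -1  # stays -1 if the database is empty
    with env.begin() as txn:
        for key, value in txn.cursor():
            max_key = max(max_key, int(key))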
    

    Finally, simply replace line 7 of your for loop,

    str_id = '{:08}'.format(i)
    

    by

    str_id = '{:08}'.format(max_key + 1 + i)
    

    to append to the existing database.
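The steps above can also be folded into one helper. This is a sketch assuming the py-lmdb package and the 8-digit zero-padded keys used in the question; the helper names `make_key`, `largest_key` and `append_records` are hypothetical. Instead of scanning every key, it uses `Cursor.last()`: LMDB keeps keys sorted byte-wise by default, and zero-padded keys of equal width sort in numeric order, so the last key is also the largest. It also passes `overwrite=False` to `txn.put()`, which makes `put()` return `False` instead of silently replacing an existing entry.

```python
def make_key(i, width=8):
    """Fixed-width, zero-padded key, same format as '{:08}'.format(i)."""
    return '{:0{w}}'.format(i, w=width).encode('ascii')


def largest_key(txn):
    """Return the largest integer key in the database, or -1 if it is empty.

    Assumes every key was written by make_key() with the same width, so
    byte-wise order (LMDB's default) equals numeric order.
    """
    cursor = txn.cursor()
    if not cursor.last():  # last() returns False on an empty database
        return -1
    return int(cursor.key())  # int() parses bytes such as b'00000042'


def append_records(env, values):
    """Append serialized values (e.g. datum.SerializeToString()) after the last key.

    Leaving the with-block commits the transaction; an exception aborts it,
    so a failed batch leaves the database unchanged.
    """
    with env.begin(write=True) as txn:
        start = largest_key(txn) + 1
        for offset, value in enumerate(values):
            key = make_key(start + offset)
            # overwrite=False: put() returns False rather than replacing a key
            if not txn.put(key, value, overwrite=False):
                raise KeyError('key {!r} already exists'.format(key))


# Why the padding matters: byte-wise sorting of unpadded keys is not numeric.
print(sorted([make_key(i) for i in (10, 2, 1)]))  # [b'00000001', b'00000002', b'00000010']
print(sorted([b'10', b'2', b'1']))                # [b'1', b'10', b'2']
```

A call might look like `append_records(env, serialized_batch)`, where `serialized_batch` is a hypothetical list of `datum.SerializeToString()` results for one batch of 10,000 samples. With a width of 8 the scheme holds up to 99,999,999 keys, which covers 1 million images.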