Search code examples
databasewinapiextensible-storage-engine

how to quickly add many records with some duplicates to Extensible Storage Engine


I need to add a few million data records to an ESE database. Among other values, each record has a unique string value. This value can be thought of as a key.

Interesting to the records is that there may be multiple identical instances of the same record within the input set. Once entered I only want one record with each of the unique strings.

My question is how to do this - how can I quickly filter out duplicates?

Right now I'm adding each record only after doing a search for the key, if the entry already exists I skip it. If it's not in the database I add the record and progress. The big cost here is doing the search on each entry.

any ideas on making this very fast? is there anyway to key the value such that adding a duplicate would fail?


Solution

  • Your can just create a unique index on the string column by passing JET_bitIndexUnique into JetCreateIndex:

    JetCreateIndex(sesid, tableid, "myindex", JET_bitIndexUnique, "+string_col\0", 13, 100));
    

    An insertion of a duplicate value with fail with JET_errKeyDuplicate.

    This approach is best if your strings are short. If your strings are long you should use a hash of the string to test for uniqueness.