I am testing BerkeleyDB Java Edition to understand whether I can use it in my project.
I've created a very simple program which works with an object of class com.sleepycat.je.Database:
writes N records of 5-15 KB each, with keys generated like Long.toString(random.nextLong());
reads these records back with Database#get, in the same order they were created;
reads the same number of records with Database#get, in random order.
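(For context on tests 1 and 2: instead of keeping the generated data in memory, I reseed the RNG for the read pass and regenerate the same keys and values. A minimal standalone illustration of that trick, with no BDB involved:)

```java
import java.util.Arrays;
import java.util.Random;

// Two Random instances with the same seed produce identical sequences,
// so the fetch pass can regenerate exactly what the store pass wrote.
public class SeedDemo {
    public static void main(String[] args) {
        Random writer = new Random(113);
        Random reader = new Random(113);
        boolean match = true;
        for (int j = 0; j < 100; j++) {
            // "store" side: key and payload from the writer stream
            String k1 = Long.toString(writer.nextLong());
            byte[] d1 = new byte[5000 + writer.nextInt(10000)];
            writer.nextBytes(d1);

            // "fetch" side: regenerated from the identically seeded reader stream
            String k2 = Long.toString(reader.nextLong());
            byte[] d2 = new byte[5000 + reader.nextInt(10000)];
            reader.nextBytes(d2);

            match &= k1.equals(k2) && Arrays.equals(d1, d2);
        }
        System.out.println("reproducible: " + match);
    }
}
```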
And now I see something strange: execution time for the third test grows very non-linearly as the number of records increases.
(I've run the tests several times, of course.)
I suppose I am doing something quite wrong. Here is the source for reference (sorry, it is a bit long); the methods are called in the order listed:
import java.io.File;
import java.util.*;
import com.sleepycat.je.*;

private Environment env;
private Database db;
private Random random = new Random();
private List<String> keys = new ArrayList<String>();
private int seed = 113;

public boolean dbOpen() {
    EnvironmentConfig ec = new EnvironmentConfig();
    DatabaseConfig dc = new DatabaseConfig();
    ec.setAllowCreate(true);
    dc.setAllowCreate(true);
    env = new Environment(new File("mydbenv"), ec);
    db = env.openDatabase(null, "moe", dc);
    return true;
}
public int storeRecords(int i) {
    int j;
    long size = 0;
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry val = new DatabaseEntry();
    random.setSeed(seed);
    for (j = 0; j < i; j++) {
        String k = Long.toString(random.nextLong());
        byte[] data = new byte[5000 + random.nextInt(10000)];
        keys.add(k);
        size += data.length;
        random.nextBytes(data);
        key.setData(k.getBytes());
        val.setData(data);
        db.put(null, key, val);
    }
    System.out.println("GENERATED SIZE: " + size);
    return j;
}

public int fetchRecords(int i) {
    int j, res;
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry val = new DatabaseEntry();
    random.setSeed(seed);
    res = 0;
    for (j = 0; j < i; j++) {
        String k = Long.toString(random.nextLong());
        byte[] data = new byte[5000 + random.nextInt(10000)];
        random.nextBytes(data);
        key.setData(k.getBytes());
        db.get(null, key, val, null);
        if (Arrays.equals(data, val.getData())) {
            res++;
        } else {
            System.err.println("FETCH differs: " + j);
            System.err.println(data.length + " " + val.getData().length);
        }
    }
    return res;
}

public int fetchRandom(int i) {
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry val = new DatabaseEntry();
    for (int j = 0; j < i; j++) {
        String k = keys.get(random.nextInt(keys.size()));
        key.setData(k.getBytes());
        db.get(null, key, val, null);
    }
    return i;
}
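Each test is timed by bracketing the call with System.currentTimeMillis(). A stripped-down sketch of that wrapper (the phase body below is just a placeholder workload, not the real BDB calls):

```java
// Minimal phase timer: run a phase, print and return its elapsed milliseconds.
public class PhaseTimer {
    static long time(String label, Runnable phase) {
        long start = System.currentTimeMillis();
        phase.run();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(label + " duration: " + elapsed);
        return elapsed;
    }

    public static void main(String[] args) {
        // Placeholder workload; the real test would call storeRecords(n), etc.
        long elapsed = time("store", () -> {
            long sum = 0;
            for (int i = 0; i < 1000000; i++) sum += i;
        });
        System.out.println("nonNegative: " + (elapsed >= 0));
    }
}
```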
Performance degradation is non-linear for two reasons:

1. The BDB-JE data structure is a B-tree, so a single record fetch costs O(log n); each lookup gets slightly more expensive as the record count grows.
2. More importantly, once the data set outgrows the cache, random reads increasingly miss the cache and go to disk, which is orders of magnitude slower than a cache hit.
Note that you can improve write performance by giving up some durability: ec.setTxnWriteNoSync(true);
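In context, that setting goes on the EnvironmentConfig before the environment is opened (a sketch, reusing the config from the question; with write-no-sync, commits reach the file system but are not fsynced, so the most recent transactions can be lost in an OS or machine crash):

```java
EnvironmentConfig ec = new EnvironmentConfig();
ec.setAllowCreate(true);
// Trade durability for speed: commits are written to the file system
// but not synced to disk, so a system crash can lose recent transactions.
ec.setTxnWriteNoSync(true);
Environment env = new Environment(new File("mydbenv"), ec);
```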
You might also want to try Tupl, an open source BerkeleyDB replacement I've been working on. It's still in the alpha stage, but you can find it on SourceForge.
For a fair comparison between BDB-JE and Tupl, I set the cache size to 500M and an explicit checkpoint is performed at the end of the store method.
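Concretely, the JE side of that setup looks roughly like this (a sketch; setCacheSize and a forced checkpoint via CheckpointConfig are standard JE API, and the 500M figure matches the minCacheSize used in the Tupl config):

```java
EnvironmentConfig ec = new EnvironmentConfig();
ec.setAllowCreate(true);
ec.setCacheSize(500000000L);   // 500M cache, same as the Tupl test

Environment env = new Environment(new File("mydbenv"), ec);
// ... open the database and run storeRecords(n) ...

CheckpointConfig cc = new CheckpointConfig();
cc.setForce(true);             // checkpoint even if JE thinks none is needed
env.checkpoint(cc);
```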
With BDB-JE:
With Tupl:
BDB-JE is faster at writing entries, because of its log-based format. Tupl is faster at reading, however. Here's the source to the Tupl test:
import java.io.*;
import java.util.*;
import org.cojen.tupl.*;
public class TuplTest {
    public static void main(final String[] args) throws Exception {
        final TuplTest rt = new TuplTest();
        rt.dbOpen(args[0]);

        {
            long start = System.currentTimeMillis();
            rt.storeRecords(Integer.parseInt(args[1]));
            long end = System.currentTimeMillis();
            System.out.println("store duration: " + (end - start));
        }

        {
            long start = System.currentTimeMillis();
            rt.fetchRecords(Integer.parseInt(args[1]));
            long end = System.currentTimeMillis();
            System.out.println("fetch duration: " + (end - start));
        }
    }
    private Database db;
    private Index ix;
    private Random random = new Random();
    private List<String> keys = new ArrayList<String>();
    private int seed = 113;

    public boolean dbOpen(String home) throws Exception {
        DatabaseConfig config = new DatabaseConfig();
        config.baseFile(new File(home));
        config.durabilityMode(DurabilityMode.NO_FLUSH);
        config.minCacheSize(500000000);
        db = Database.open(config);
        ix = db.openIndex("moe");
        return true;
    }

    public int storeRecords(int i) throws Exception {
        int j;
        long size = 0;
        random.setSeed(seed);
        for (j = 0; j < i; j++) {
            String k = Long.toString(random.nextLong());
            byte[] data = new byte[5000 + random.nextInt(10000)];
            keys.add(k);
            size += data.length;
            random.nextBytes(data);
            ix.store(null, k.getBytes(), data);
        }
        System.out.println("GENERATED SIZE: " + size);
        db.checkpoint();
        return j;
    }

    public int fetchRecords(int i) throws Exception {
        int j, res;
        random.setSeed(seed);
        res = 0;
        for (j = 0; j < i; j++) {
            String k = Long.toString(random.nextLong());
            byte[] data = new byte[5000 + random.nextInt(10000)];
            random.nextBytes(data);
            byte[] val = ix.load(null, k.getBytes());
            if (Arrays.equals(data, val)) {
                res++;
            } else {
                System.err.println("FETCH differs: " + j);
                System.err.println(data.length + " " + val.length);
            }
        }
        return res;
    }

    public int fetchRandom(int i) throws Exception {
        for (int j = 0; j < i; j++) {
            String k = keys.get(random.nextInt(keys.size()));
            ix.load(null, k.getBytes());
        }
        return i;
    }
}