Chronicle Map slows down significantly with more than 200m entries


I am using Chronicle Map to temporarily store and look up a very large number of KV pairs (several billion, in fact). I don't need durability or replication, and I'm using memory-mapped files rather than pure off-heap memory. Average key length is 8 bytes.

For smallish data sets - up to 200 million entries - I get a throughput of around 1 million entries per second, i.e. it takes approximately 200 seconds to create the entries, which is stunning. But by 400 million entries the map has slowed down significantly, and it takes 1500 seconds to create them.

I have run tests both on Mac OS X (16 GB, quad core, 500 GB SSD) and on a ProLiant G6 server running Linux (8 cores, 64 GB RAM, 300 GB RAID 1, not SSD). Both platforms exhibit the same behaviour.

If it helps, here's the map setup:

    try {
        f = File.createTempFile(name, ".map");
        catalog = ChronicleMapBuilder
                .of(String.class, Long.class)
                .entries(size)
                .averageKeySize(8)
                .createPersistedTo(f);
    } catch (IOException ioe) {
        // blah
    }

And a simple writer test:

    long now = -System.currentTimeMillis();
    long count = 400_000_000L;

    for (long i = 0; i < count; i++) {
        catalog.put(Long.toString(i), i);
        if ((i % 1_000_000) == 0) {
            System.out.println(i + ": " + (now + System.currentTimeMillis()));
        }
    }
    System.out.println(count + ": " + (now + System.currentTimeMillis()));
    catalog.close();

So my question is: is there some sort of tuning I can do to improve this, e.g. changing the number of segments or using a different key type (e.g. CharSequence), or is this simply an artefact of the OS paging such large files?


Solution

  • Several things might help:

    • Ensure you use the latest available Chronicle Map version (currently 3.3.0-beta; the next, 3.4.0-beta, is due within days).

    • Do use garbage-free techniques; even for a test like this it matters, because garbage collection may kick in:

      • Use CharSequence as the key type and LongValue as the value type.
      • Simple test code could look like

        // Imports added for completeness (not in the original snippet):
        import java.io.File;
        import java.io.IOException;
        import net.openhft.chronicle.core.values.LongValue;
        import net.openhft.chronicle.map.ChronicleMap;
        import net.openhft.chronicle.values.Values;

        public class VinceTest {
            public static void main(String[] args) throws IOException {
                long count = 400_000_000L;
                File f = File.createTempFile("vince", ".map");
                f.deleteOnExit();
                try (ChronicleMap<CharSequence, LongValue> catalog = ChronicleMap
                        .of(CharSequence.class, LongValue.class)
                        .entries(count)
                        .averageKeySize(8.72)
                        .putReturnsNull(true)
                        .createPersistedTo(f)) {

                    long prev = System.currentTimeMillis();

                    // Reused key and value instances keep the insert loop garbage-free
                    StringBuilder key = new StringBuilder();
                    LongValue value = Values.newHeapInstance(LongValue.class);

                    for (long i = 1; i <= count; i++) {
                        key.setLength(0);
                        key.append(i);
                        value.setValue(i);
                        catalog.put(key, value);
                        if ((i % 1_000_000) == 0) {
                            // Elapsed ms per million inserts equals average ns per insert
                            long now = System.currentTimeMillis();
                            System.out.printf("Average ns to insert per mi #%d: %d\n",
                                    (i / 1_000_000), now - prev);
                            prev = now;
                        }
                    }
                    // MEGABYTES and BYTES here are statically imported memory-unit constants
                    System.out.println("file size " + MEGABYTES.convert(f.length(), BYTES) + " MB");
                }
            }
        }
        
      • In the code above, note the use of putReturnsNull(true) to avoid accidental garbage creation from the returned value (not an issue in this particular test, because all keys are unique and put() always returns null anyway, but it may well matter in your production code). A similar garbage-free pattern applies to reads, sketched in the next bullet.
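
      • Reads can be kept garbage-free in the same way. Below is a minimal sketch (not part of the original test) that reuses a caller-supplied LongValue via getUsing(), assuming the catalog map and count from the example above:

        StringBuilder key = new StringBuilder();
        LongValue value = Values.newHeapInstance(LongValue.class);

        for (long i = 1; i <= count; i++) {
            key.setLength(0);
            key.append(i);
            // getUsing() reads the entry into the supplied value instance instead of
            // allocating a new object; it returns null if the key is absent
            LongValue read = catalog.getUsing(key, value);
            if (read == null || read.getValue() != i) {
                throw new AssertionError("lookup failed for key " + key);
            }
        }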

    • Ensure you specify the right averageKeySize(). In this test the average key size is actually closer to 9 bytes than 8 (because most keys are greater than 100 000 000), but it is better to be as precise as possible: it is 8.72 for this particular test with a count of 400 000 000. A quick way to verify that figure is sketched below.
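
      To sanity-check that figure (a standalone sketch, not from the original answer): the keys are the decimal strings "1" through "400000000", so the exact average length in bytes can be computed by counting how many keys have each digit count.

        // Standalone check of the 8.72 value passed to averageKeySize() above.
        public class AverageKeySizeCheck {
            public static void main(String[] args) {
                long count = 400_000_000L;
                long totalChars = 0;
                // Keys with d digits cover the range [10^(d-1), min(10^d - 1, count)]
                for (long lo = 1, digits = 1; lo <= count; lo *= 10, digits++) {
                    long hi = Math.min(lo * 10 - 1, count);
                    totalChars += (hi - lo + 1) * digits;
                }
                // Each key character is a single ASCII digit (one byte), so this
                // prints the average key size in bytes: about 8.72
                System.out.println((double) totalChars / count);
            }
        }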