Tags: java, amazon-s3, microstream

MicroStream + AWS S3 Blob example


I'm trying to develop a simple application using MicroStream with AWS S3 as blob storage. Following the examples on the official page and elsewhere, I can't store and query elements.

The S3 connection itself is working.

My Code

S3Client cli = S3Client.builder()
      .region(Region.US_EAST_1)
      .credentialsProvider(StaticCredentialsProvider.create(
            AwsSessionCredentials.create(accessKey, secret, token)))
      .build();

BlobStoreFileSystem fileSystem = BlobStoreFileSystem.New(
      S3Connector.Caching(cli)
);

final EmbeddedStorageManager storageManager = EmbeddedStorage.start(
      fileSystem.ensureDirectoryPath("s3-folder")
);

HashMap<Integer, Object> database = new HashMap<>();

if (storageManager.root() == null) {
   storageManager.setRoot(database);
   storageManager.storeRoot();
} else {
   database = (HashMap<Integer, Object>) storageManager.root();
}

Storer storage = storageManager.createLazyStorer();

for(int i=0; i < 1_000_000; i++) {
   database.put(i, UUID.randomUUID().toString());

   if (i == 500_000) {
      System.out.println("Value: " + database.get(500_000));
   }
}

storage.storeAll(database);
storage.commit();

System.out.println("*************************");
// The root was stored as HashMap<Integer, Object>, so cast with Integer keys.
System.out.println(((Map<Integer, Object>) storageManager.root()).get(500_000));

storageManager.shutdown();

Output

java.lang.OutOfMemoryError: Java heap space
Exception in thread "Daemon Thread 6" java.lang.OutOfMemoryError: Java heap space

I'm trying to persist objects from my collection to S3 and read them back.


Solution

  • The simplest solution would be to increase the usable Java heap space, for example with the JVM's -Xmx option.
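    To see how much heap the JVM is currently allowed to use, a minimal sketch like the following can help. The class name `HeapInfo` is made up for illustration, and the `-Xmx4g` value in the comment is an arbitrary example, not a recommendation from the answer.

```java
class HeapInfo {
    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Raise the limit at start-up, e.g. `java -Xmx4g HeapInfo`
        // (4g is only an example value).
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

    Running it once without and once with `-Xmx` shows whether the setting actually reached the JVM that runs the application.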

    Here is a short explanation of why the memory requirement of your example is unexpectedly high:

    • Each persisted Java object also requires some additional management data in memory. A HashMap with 1 million entries results in 2 million objects (keys and values) plus 1 (the map itself) to be persisted.
    • When storing data, MicroStream internally collects all necessary data before writing it to the storage target. This can require a lot of memory during the store operation if many objects are stored at once. In your example you are always storing the whole hash map and its contents.
    • MicroStream also caches data.
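    The object count from the first point can be written out as a small calculation. The sketch below is plain Java with no MicroStream calls; the class and method names are hypothetical. The second method rests on an assumption that is not part of the answer above: that the data is split into several equally sized maps and only one changed map is stored at a time.

```java
class StoreEstimate {
    /** Objects persisted for one map: one key and one value per entry, plus the map itself. */
    static long objectsForMap(long entries) {
        return 2L * entries + 1L;
    }

    /** Objects persisted when only one of `segments` equally sized maps is stored (assumption). */
    static long objectsForOneSegment(long entries, long segments) {
        return objectsForMap(entries / segments);
    }

    public static void main(String[] args) {
        System.out.println(objectsForMap(1_000_000));             // 2000001
        System.out.println(objectsForOneSegment(1_000_000, 100)); // 20001
    }
}
```

    The numbers only count objects; the actual memory needed per object depends on the management data and caching mentioned above.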

    If increasing the Java heap size is not an option, you may also have a look at the Lazy-Loading feature of MicroStream.
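    The idea behind lazy loading can be sketched in plain Java. `LazyRef` below is a hypothetical stand-in and not MicroStream's API: the supplier plays the role of reading from storage, and `clear()` drops the in-memory copy so that it can be garbage collected and reloaded later. MicroStream's documentation describes its own lazy reference type and how to use it.

```java
import java.util.function.Supplier;

class LazyDemo {
    /** Hypothetical lazy reference: loads its value on first access only. */
    static final class LazyRef<T> {
        private final Supplier<T> loader; // stands in for a read from storage
        private T value;                  // null while not loaded

        LazyRef(Supplier<T> loader) {
            this.loader = loader;
        }

        /** Loads the value if needed and keeps it until clear() is called. */
        synchronized T get() {
            if (value == null) {
                value = loader.get();
            }
            return value;
        }

        /** Drops the in-memory copy so the garbage collector can reclaim it. */
        synchronized void clear() {
            value = null;
        }

        synchronized boolean isLoaded() {
            return value != null;
        }
    }

    public static void main(String[] args) {
        int[] loads = {0}; // counts how often the "storage" was read
        LazyRef<String> ref = new LazyRef<>(() -> { loads[0]++; return "payload"; });

        System.out.println(ref.isLoaded()); // false
        System.out.println(ref.get());      // payload
        ref.clear();
        System.out.println(ref.isLoaded()); // false
        ref.get();
        System.out.println(loads[0]);       // 2
    }
}
```

    With this pattern, large parts of the object graph stay out of memory until they are actually accessed, which is the memory saving the Lazy-Loading feature aims for.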