java mariadb bulkinsert columnstore

MariaDB ColumnStore BulkInsert


I have some code that I'm writing which uses a bulk insert and looks something like this:

ColumnStoreBulkInsert b = d.createBulkInsert("pst", "events", (short) 0, 0); 
try {
  for (Map<String, Object> record : records) {
    try {
      for (int i = 0; i < schema.length; i++) {
          Object value = record.get(schema[i].toLowerCase());
          String val = value.toString();
          b.setColumn(i, val);
      }
      b.writeRow();
      currentBatchSize++; // count the row so the batch-size check below can trigger
      if (currentBatchSize >= batchSize) {
        b.commit();
        currentBatchSize = 0;
      }
    }
    catch (ColumnStoreException e) {
      b.rollback();
    }
  }
}
catch(Exception e) {
  throw new RuntimeException(e);
}

The issue is that when I run this, I (seemingly) run out of memory because I have to create a new ColumnStoreBulkInsert every time. Have other people run into this, and if so, how can it be avoided? Thanks!


Solution

  • Thanks for your post. You hit two bugs in javamcsapi's memory management that aren't fixed yet. Both bugs result from using SWIG to generate the binding between Java and our base C++ API, mcsapi.

    The first bug is that the automatic garbage collection in Java doesn't collect the C++ ColumnStoreBulkInsert object once the Java wrapper ColumnStoreBulkInsert object is not needed any more. It is documented as MCOL-1407 [1] in our bug tracker Jira.

    Usually you can invoke a manual garbage collection of the corresponding C++ object through its wrapper object's delete() method. Unfortunately, this is also broken for ColumnStoreBulkInsert in version 1.1.5 of javamcsapi. I documented it as MCOL-1588 [2] in our Jira bug tracker and just committed a patch that will be part of our 1.1.6 release.

    Once that patch passes our internal quality assurance mechanisms you could compile a develop build of javamcsapi from our Github repository [3], wait for the 1.1.6 release, or download a compiled version of mcsapi from our nightly repository servers [4].

    Here is an example of how the manual garbage collection through ColumnStoreBulkInsert's delete() method would work.

    import com.mariadb.columnstore.api.*;
    
    public class MCSAPITest {
    
        public static void main(String[] args) {
            ColumnStoreDriver d = new ColumnStoreDriver();
            for (int i = 0; i < Integer.MAX_VALUE; i++) {
                ColumnStoreBulkInsert b = d.createBulkInsert("test", "garbage_test", (short) 0, 0);
                try {
                    b.setColumn(0, i);
                    b.setColumn(1, Integer.MAX_VALUE - i);
                    b.writeRow();
                    b.commit();
                } catch (ColumnStoreException e) {
                    b.rollback();
                    e.printStackTrace();
                } finally {
                    b.delete(); // <-- This is the important part
                }
            }
        }
    }
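
    To relate this to the batched loop in the question, here is a rough sketch of the same create / commit / delete() pattern applied once per batch instead of once per row. It is written against small stand-in interfaces so that it runs without javamcsapi installed: BulkHandle, BulkFactory, CountingFactory and insertInBatches are hypothetical names invented for this illustration and are not part of the API. In real code the handle would presumably be a ColumnStoreBulkInsert obtained from d.createBulkInsert(...), and the delete() call only helps once the MCOL-1588 fix is in your build.

```java
import java.util.Arrays;
import java.util.List;

class BatchedInsertSketch {

    /** Hypothetical stand-in for the few ColumnStoreBulkInsert calls used here. */
    interface BulkHandle {
        void writeRow(String row);
        void commit();
        void rollback();
        void delete();
    }

    /** Hypothetical stand-in for ColumnStoreDriver.createBulkInsert(...). */
    interface BulkFactory {
        BulkHandle create();
    }

    /**
     * Writes the rows in batches. Each batch gets its own handle, which is
     * committed (or rolled back on failure) and then always released via delete().
     * Returns the number of committed batches.
     */
    static int insertInBatches(BulkFactory factory, List<String> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int committed = 0;
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            BulkHandle b = factory.create();
            try {
                for (String row : rows.subList(start, end)) {
                    b.writeRow(row);
                }
                b.commit();
                committed++;
            } catch (RuntimeException e) {
                // Assumption: failures surface as unchecked exceptions.
                b.rollback();
                throw e;
            } finally {
                b.delete(); // release the native object behind this batch's handle
            }
        }
        return committed;
    }

    /** Fake factory that only counts created and still-live handles. */
    static class CountingFactory implements BulkFactory {
        int created = 0;
        int live = 0;

        public BulkHandle create() {
            created++;
            live++;
            return new BulkHandle() {
                public void writeRow(String row) { }
                public void commit() { }
                public void rollback() { }
                public void delete() { live--; }
            };
        }
    }

    public static void main(String[] args) {
        CountingFactory factory = new CountingFactory();
        int batches = insertInBatches(factory, Arrays.asList("a", "b", "c", "d", "e"), 2);
        System.out.println("batches=" + batches
                + " created=" + factory.created
                + " live=" + factory.live);
    }
}
```

    The point of the sketch is that the number of live handles goes back to zero after every batch, so memory use should depend on the batch size rather than on the total number of rows.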
    

    Please do not hesitate to reply if you have any further questions.

    [1] https://jira.mariadb.org/browse/MCOL-1407

    [2] https://jira.mariadb.org/browse/MCOL-1588

    [3] https://github.com/mariadb-corporation/mariadb-columnstore-api/tree/develop-1.1

    [4] http://34.238.186.75/repos/1.1.6-1/nightly/mariadb-columnstore-api/yum/centos/7/x86_64/