
OrientDB keeps locking records indefinitely


I'm running OrientDB (community edition 2.2.9) in distributed mode with multiple nodes.

After a couple of minutes, I start getting the following error on my queries:

com.orientechnologies.orient.server.distributed.task.ODistributedRecordLockedException: Timeout (1500ms) on acquiring lock on record #1010:2651. It is locked by request 3.1000 DB name="MyDatabase"

The query in this instance looks like this:

UPDATE #1010:2651 SET name='foo';

The record remains locked and I can't run the query until I restart the database.

If I don't run the server in distributed mode, I don't get this error, so it must be related to distributed mode.

Here is my default-distributed-db-config.json

{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": 1,
  "executionMode": "asynchronous",
  "readYourWrites": true,
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
    "*": {
     "servers": ["<NEW_NODE>"]
    }
  }
}

I was using the following configuration in my orientdb-server-config.xml:

    ....
    <handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
        <parameters>
            ....
            <parameter value="com.orientechnologies.orient.server.distributed.conflict.ODefaultReplicationConflictResolver" name="conflict.resolver.impl"/>
            ....
        </parameters>
    </handler>
   ...

After I removed the "ODefaultReplicationConflictResolver" parameter from the config, the locking issue happens less frequently.

Why are the records locking up like this and how can I avoid it?


Solution

  • Using asynchronous execution mode may cause this problem. See: Asynchronous replication mode.

    You can try changing the execution mode ("executionMode" in default-distributed-db-config.json) or adding a retry to your query. In Java, you can catch the events of a command during asynchronous replication with the following methods of OCommandSQL:

    • onAsyncReplicationOk(), to catch the event when the asynchronous replication succeeds
    • onAsyncReplicationError(), to catch the event when the asynchronous replication returns an error

    Example retrying up to 3 times in case of concurrent modification exception on creation of edges:

    g.command(new OCommandSQL("create edge Own from (select from User) to (select from Post)")
      .onAsyncReplicationError(new OAsyncReplicationError() {
        @Override
        public ACTION onAsyncReplicationError(Throwable iException, int iRetry) {
          System.err.println("Error, retrying...");
          // Retry only retryable exceptions, and give up after 3 retries
          return iException instanceof ONeedRetryException && iRetry <= 3 ? ACTION.RETRY : ACTION.IGNORE;
        }
      })
      .onAsyncReplicationOk(new OAsyncReplicationOk() {
        @Override
        public void onAsyncReplicationOk() {
          System.out.println("OK");
        }
      })
    ).execute();
    

    Or add a retry to a SQL batch:

    begin
    let upd = UPDATE #1010:2651 SET name='foo'
    commit retry 100
    return $upd
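
    If the retry has to live in client code rather than in the SQL batch, the same idea can be sketched as a small helper. This is only an illustration, not part of the OrientDB API: the class RetryOnLock, the method withRetry, and its parameters are hypothetical names, and the helper deliberately has no OrientDB dependency so it can be tried on its own.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper (not an OrientDB class): re-runs an action while it
// fails with an exception the caller considers retryable.
public final class RetryOnLock {
  private RetryOnLock() {}

  public static <T> T withRetry(Callable<T> action,
                                Predicate<Exception> isRetryable,
                                int maxAttempts,
                                long backoffMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    for (int attempt = 1; ; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        // Rethrow when attempts are exhausted or the error is not retryable
        if (attempt >= maxAttempts || !isRetryable.test(e)) {
          throw e;
        }
        // Linear backoff: wait a little longer before each new attempt
        Thread.sleep(backoffMillis * attempt);
      }
    }
  }
}
```

    With OrientDB, the action would presumably run the UPDATE and the predicate would test for ODistributedRecordLockedException, for example withRetry(() -> runUpdate(), e -> e instanceof ODistributedRecordLockedException, 5, 200), where runUpdate() is a placeholder for your own code. Note that a retry only helps if the lock is eventually released; it will not fix a record that stays locked until a restart.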
    

    Hope it helps.