Search code examples
sqlindexingorientdb

OrientDB: slow query, need help creating index to speed it up


I'm using an SQL query to retrieve money transactions from my OrientDB database (v2.1.16)

The query is running slowly and I'd like to know how to create the index that will speed it up.

The query is:

SELECT timestamp, txId 
FROM MoneyTransaction
WHERE (
    out("MoneyTransactionAccount").in("AccountMoneyProfile")[accountId] = :accountId
    AND moneyType = :moneyType
    AND :registerType IN registerQuantities.keys()    
)    
ORDER BY timestamp DESC, @rid DESC

I also have another variant that resumes the list from a specific point in time:

SELECT timestamp, txId 
FROM MoneyTransaction
WHERE (
    out("MoneyTransactionAccount").in("AccountMoneyProfile")[accountId] = :accountId
    AND moneyType = :moneyType
    AND :registerType IN registerQuantities.keys()    
)
AND timestamp <= :cutoffTimestamp
AND txId NOT IN :cutoffTxIds

ORDER BY timestamp DESC, @rid DESC

The difficulty I have is trying to figure out how to create an index with the more complex fields, namely the accountId field which doesn't reside within the same vertex, and the registerType field which is to be found within an EMBEDDEDMAP field.

Which index would you create to speed up this query? Or how would you rewrite this query?

My structure is as follows:

[Account] --> (1 to 1) AccountMoneyProfile --> [MoneyProfile]
[MoneyTransaction] --> (n to 1) MoneyTransactionAccount --> [MoneyProfile]

Important fields:

Account.accountId STRING
MoneyTransaction.registerQuantities EMBEDDEDMAP
MoneyTransaction.timestamp DATETIME

The account I'm fetching right now has about 500 MoneyTransaction vertices attached to it.


Solution

  • about the index choice, it depends by the amounts of your dataset:

    • If the dataset isn't very large, you could use an SB-TREE index because they maintain sorting and allow range operations;
    • If the dataset instead is very large, you could use an HASH INDEX which is more functional on large numbers and consumes less resources than other indexes, but it doesn't support range operations.

    In your case you could create, for example, an SB-TREE UNIQUE INDEX on the accountId (e.g. Account.accountId) and rewrite your query in a way that the target query directly matches the index and so that it reads fewer records as possible. Example:

    SELECT timestamp, txId
    FROM (
         SELECT expand(out("AccountMoneyProfile").in("MoneyTransactionAccount"))
         FROM Account
         WHERE accountId = :accountId
         )
    WHERE moneyType = :moneyType AND :registerType IN registerQuantities.keys()
    ORDER BY timestamp DESC, @rid DESC
    

    In this way you directly select the Account records you're looking for (by using the index previously created) and then you can retrieve only the connected MoneyTransaction records.

    You can find more detailed information about indexes in the OrientDB official documentation.

    Another way, based on the fact that you specified that MoneyProfile class doesn't contains important data (if I've understood well), could be to change the structure to make the search more direct. E.g.:

    Before:

    enter image description here

    After (I've previously created a new AccountMoneyTransaction edge class):

    enter image description here

    Hope to have been helpful