Search code examples
sqloptimizationsubqueryorientdbnexus

Optimize SQL query in OrientDB with two sub-queries


Assume having the following SQL query:

select count(name)
from asset
where bucket in (
  select @RID
  from bucket
  where repository_name = 'some-release'
) and blob_updated < sysdate() - 17280000000


+----+-----+
|#   |count|
+----+-----+
|0   |90717|
+----+-----+

I find out how many files in repository some-release are older than 200 days. It takes me 17.588 seconds. However the following query, which I want to utilize to convert 200 days to milliseconds, gives me the same output but takes 83.93 seconds:

select count(name)
from asset
let $days = (
  select eval ( "200 * 24 * 60 * 60 * 1000" )
)
where bucket in (
  select @RID
  from bucket
  where repository_name = 'some-release'
) and blob_updated < sysdate() - first($days.eval)

Why does it take so long and how to optimize it?

Repository some-release contains 255196 files.


Solution

  • Have you tried running your query through the EXPLAIN? More on that in OrientDB documentation.

    The LET block is evaluated per each record, so if you've got plenty of assets in your some-release repositories (or have had in the past) this will substantially fine your query evaluation time. To avoid this you can evaluate it directly in your WHERE clause, i.e.:

    select count(name)
    from asset
    where bucket in (
      select @RID
      from bucket
      where repository_name = 'some-release'
    ) and blob_updated < sysdate() - eval('200 * 24 * 60 * 60 * 1000')
    

    May I ask what are you trying to achieve? By any chance, are you trying to get rid of some old assets? you can set up a Cleanup Policy for this purpose.

    You may want to compact your blob store, to reduce the number of assets that you might not need anymore. Please make sure you understand what will happen before you run the Admin - Compact blob store task.