Search code examples
orientdborientdb2.2

Poor performance on high volume data sets


I am having poor performance on queries on classes that have a high record count. I am running some pretty straight forward queries but the performance is unusable. I really like orientdb and this is my 5th development project with odb as a backend. However, this particular project is forcing me to look for alternatives.

Please see below for detailed explanation.

We have a Parent class which does not grow much, the entire dataset has roughly 900 records.

However, the Child class, grows at a rate of 100k records over 15 minutes. It's generating millions of records on a daily basis.

Records are inserted to the child class as an update/add of the parent.

create class Parent extends V create class Child

Parent data model

    {
        "id": string,
        "name": string,
        "rank": integer,
        "price_usd": float,
        "price_cny": float,
        "volume_24h_usd": float
        "market_cap_usd": float,
        "available_supply": integer,
        "total_supply": integer,
        "last_updated": datetime,
        "MarketName": string,
        "Exchange": string,
        "last_refresh": datetime
    }

Child data model

{
        "MarketName": string,
        "High": float,
        "Low": float,
        "Volume": float,
        "Last": float,
        "Bid": float,
        "Ask": float,
        "BaseVolume": float,
        "TimeStamp": datetime
    }

Parent Data Sample

    {
        "@type": "d",
        "@rid": "#506:403225",
        "@version": 1,
        "@class": "Parent",
        "MarketName": "USD-CNY",
        "High": 0.00026815,
        "Low": 0.00023002,
        "Volume": 17005.93615271,
        "Last": 0.00026103,
        "Bid": 0.00026104,
        "Ask": 0.000265,
        "BaseVolume": 4.15293493,
        "TimeStamp": "2017-07-15T18:28:11.857",
        "@fieldTypes": "High=d,Low=d,Volume=d,Last=d,Bid=d,Ask=d,BaseVolume=d"
    }

Child record insert

Update Parent add Child = [
    {  
       "@type":"d",
       "@class":"Child",
       "MarketName":"USD-CNY",
       "High":0.000083,
       "Low":0.00006815,
       "Volume":18688741.88795826,
       "Last":0.00006857,
       "Bid":0.00006857,
       "Ask":0.00006889,
       "BaseVolume":1412.08213181,
       "TimeStamp":"2017-07-19T17:12:59.44"
    }
] where MarketName = "USD-CNY"

Poor performance on following query

    select 
        MarketName, 
        Last, 
        Bid, 
        Ask, 
        High, 
        Low, 
        BaseVolume, 
        TimeStamp, 
        TimeStamp.asDateTime().format('yyyy-MM-dd') as date, 
        TimeStamp.asDateTime().format('hh:mm:ss') as time, 
        if(eval('TimeStamp.asDateTime().format("hh") between 0 and 11'),'PM','AM') as hour12 
    from Child where MarketName = "USD-CNY" order by TimeStamp desc

Additional information.

In the Child class, we have roughly 9 million records WHERE MarketName = "USD-CNY"

The execution of this query, takes roughly 33 - 40 seconds.

I have not done any indexing on the parent or child class, need some guidance on proper indexing techniques for a multimodel database.

Also, I am not expecting a response of the 9 million records. I am okay with paginating my results and just getting the top 100.

Any guidance is appreciated.


Solution

  • Like any other DBMS, OrientDB requires an index on the property you're using as a filter. Try executing this:

    CREATE INDEX Child.MarketName ON Chile( market name ) UNIQUE
    

    And you will see your query should be <300ms now.

    For more information look at https://orientdb.com/docs/2.2/SQL-Create-Index.html.