Search code examples
databaseperformancemonetdb

How to improve query performance on a single node in MonetDB?


I've installed the latest (MonetDB 5 server v11v.27.5 "Jul2017-SP1") on Windows 2012 Server and I'm trying to query large table 1,4 billion rows in a reasonable time 2-3s.

Is this even possible with MonetDB? What could I do to improve the performance?

Detailed description of what I've done so far:

  1. Created table:

    CREATE TABLE t939ba ( id INT, xa INT, xb INT, ya INT, yb INT, a1 TINYINT, a2 TINYINT, a3 TINYINT, a4 TINYINT, a5 TINYINT, a6 TINYINT, a7 TINYINT, a8 TINYINT, a9 TINYINT);
    
  2. Loaded the data:

    COPY 1450000000 OFFSET 2 RECORDS INTO tbl FROM 'D:\\es_export\\file.csv'
    USING DELIMITERS ',' NULL AS '' LOCKED;
    
  3. Run the query:

    SELECT COUNT(DISTINCT id) FROM tbl WHERE a1=22
    AND xb>=143455 AND yb>=90911 AND xa<=143615 AND ya<=91007
    AND a2 IN (2, 3, 4) AND a3 IN (0, 1, 2, 3, 4) AND a4 IN (0, 1, 2)
    AND a5 IN (-1, 1, 2, 3, 4, 5, 6, 7) AND a6 IN (-1, 11, 12, 13, 14);
    

When I run the query for the 1st time it took (14m 52s), 2nd run of the same query took (3m 23s), 3rd consecutive run of the same query took (14s) and a slightly rearranged query took (3m 11s).


Solution

  • Tomas,

    thanks for the plans and traces. I see that you used the revised query with range predicates instead of IN predicates, and that this query now runs in "a mere" ~39 s (compared to ~15 min) --- either because the range predicates are evaluated more efficiently than the IN predicates, or because, as Martin indicated, a later run of the query benefits from indexes that where automatically built by MonetDB when evaluating the first query, or because of both.

    In any case, running a/each query (-version) more than once is a good idea to see the possible effect of automatically built indexes.

    Further, I see that either you indeed have a 34-core machine, or your machine has "only" 2 GB RAM per core --- not too much given that you have a ~42 GB data set, where each column is ~1.5 GB to ~6 GB in size ...

    Hence, the main reason for the query not running faster than ~39 s might be I/O activity due to "lack" of memory.

    Best,

    Stefan

    ps:
    You can check whether for this specific query, reducing (or even avoiding) multi-core parallelism helps to reduce I/O thrashing:
    Try running your query after disabling MonetDB's "mitosis" optimizer using

    set optimizer='no_mitosis_pipe';
    

    You can re-enable full multi-core parallelism using

    set optimzer='default_pipe';
    

    Best,
    Stefan