Tags: performance, postgresql, sql-tuning

Merge two columns into one


I have a performance problem merging two columns into one in a table with 7M rows.

My main goals are:

  • merge open_time and close_time into one 'time' column
  • order by time and account_id in descending order
  • check the op_type column for each account_id over the entries ordered by time and account_id:
    • if op_type changed from 0 to 1 or vice versa between the first and second entries, increment a counter by 1.
    • if op_type changed from 0 to 1 or vice versa between the second and third entries, increment the counter by 1 again.
    • and so on over all entries for each account_id

account_id, open_time and close_time are indexed.
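
For context, the table (tmp_operations / risklive.operations_mt4 in the queries below) looks roughly like this; the DDL is simplified here and the column types are assumed from the samples further down:

-- Approximate table layout (simplified; types assumed from the samples below)
CREATE TABLE risklive.operations_mt4 (
    account_id    integer   NOT NULL,
    op_type       integer   NOT NULL,  -- 0 or 1
    open_time     timestamp NOT NULL,
    close_time    timestamp NOT NULL,
    instrument_id integer
);

CREATE INDEX ON risklive.operations_mt4 (account_id);
CREATE INDEX ON risklive.operations_mt4 (open_time);
CREATE INDEX ON risklive.operations_mt4 (close_time);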

Option #1: Merge the open_time and close_time columns into one 'time' column using a UNION of two SELECT statements:

select account_id, op_type, open_time as time, instrument_id
from tmp_operations
UNION
select account_id, op_type, close_time as time, instrument_id
from tmp_operations

The execution time of the UNION of the two SELECT statements is over 4,000,000 ms and still running.
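
Note that plain UNION also removes duplicate rows, which adds a sort or hash over the ~14M intermediate rows. If duplicates are acceptable, UNION ALL skips that step; a sketch of that variant (not separately timed):

-- Same merge as Option #1, but UNION ALL avoids the duplicate-elimination step
select account_id, op_type, open_time as time, instrument_id
from tmp_operations
UNION ALL
select account_id, op_type, close_time as time, instrument_id
from tmp_operations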

Option #2: Merge the open_time and close_time columns into one 'time' column by unnesting an array:

SELECT
    account_id,
    op_type,
    unnest(ARRAY[open_time, close_time]) as time,
    instrument_id
FROM risklive.operations_mt4 op

The execution time of the array unnest is about 315,000 ms, which is much better. Thanks, Gabriel's Messanger!
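
Another way to expand each row into two rows, without building an array, is a LATERAL join against a two-row VALUES list (available since PostgreSQL 9.3). I haven't compared its timing against unnest, so this is only a sketch:

-- Alternative to unnest: one output row per timestamp via LATERAL VALUES
SELECT
    op.account_id,
    op.op_type,
    t.time,
    op.instrument_id
FROM risklive.operations_mt4 op
CROSS JOIN LATERAL (VALUES (op.open_time), (op.close_time)) AS t(time)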

A sample of what I want to see as the result of the timestamp merging:

      open_time               close_time                       time
"2015-08-19 09:18:24"    "2015-08-19 09:20:40"          "2015-08-19 09:18:24"
"2015-08-19 09:11:54"    "2015-08-19 09:17:16"    -->   "2015-08-19 09:20:40"
"2015-08-19 09:17:46"    "2015-08-19 09:18:22"          "2015-08-19 09:11:54"
                                                        "2015-08-19 09:17:16"
                                                        "2015-08-19 09:17:46"
                                                        "2015-08-19 09:18:22"

And this is how the op_type change counter should behave for the entries of one account_id:

account_id   op_type         time
  63004;        1;    "2015-08-19 09:18:24"
  63004;        1;    "2015-08-19 09:20:40"
  63004;        1;    "2015-08-19 09:11:54"
  63004;        1;    "2015-08-19 09:17:16"   <-- op_type will be changed in next entry
  63004;        0;    "2015-08-19 09:17:46"   <-- counter = 1
  63004;        0;    "2015-08-19 09:18:22"   <-- op_type will be changed in next entry
  63004;        1;    "2015-08-19 09:09:31"   <-- counter = 2
  63004;        1;    "2015-08-19 09:09:31"
  63004;        1;    "2015-08-19 09:31:09"
  63004;        1;    "2015-08-19 09:32:07"   <-- op_type will be changed in next entry
  63004;        0;    "2015-08-19 09:32:09"   <-- counter = 3
  63004;        0;    "2015-08-19 09:57:44"   <-- op_type will be changed in next entry
  63004;        1;    "2015-08-19 09:20:43"   <-- counter = 4
  63004;        1;    "2015-08-19 09:31:51"
  63004;        1;    "2015-08-19 09:20:59"
  63004;        1;    "2015-08-19 09:31:51"

I don't know how to implement the op_type change counter described above at this moment.

How can I tune all of this?


Solution

  • Since you've changed your question a little bit after my answer, I've completely rewritten my post too.

    Disclaimer:

    The operation you need to perform requires merging and ordering the whole table (7M rows), which will always be a bottleneck. It's possible that you won't find a solution that satisfies you without changing your approach entirely. Nonetheless, let me try.

    First problem:

    So your first problem was to "merge" two columns into one for the whole table of 7M rows. You tried UNION, which needs two sequential scans of the table. As I proposed before, a solution is to build an array of the two timestamps and unnest it (which you did):

    SELECT
        account_id,
        op_type,
        unnest(ARRAY[open_time, close_time]) as time,
        instrument_id
    FROM risklive.operations_mt4 op
    

    Second problem:

    The second problem is counting op_type changes for each account_id while the rows are ordered by the "merged" time column. For readability I use a CTE to hold the "merged table".

    We need a subquery: at the inner level we detect op_type changes in the proper order (using the lag() window function, which returns the value from the row one before the current row), and at the outer level we sum up the number of op_type changes.

    WITH merged_table AS (
        SELECT
            account_id,
            op_type,
            unnest(ARRAY[open_time, close_time]) as time,
            instrument_id
        FROM risklive.operations_mt4 op
    )
    SELECT 
        account_id, SUM(abs(zero_if_no_op_type_change)) as counter
    FROM (
        SELECT
             m.account_id,
             -- 0 when op_type is unchanged, +1/-1 when it flips between 0 and 1
             (m.op_type - lag(m.op_type)
                     OVER (PARTITION BY m.account_id ORDER BY time)
             ) as zero_if_no_op_type_change
        FROM merged_table m
    ) sub
    GROUP BY account_id
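
    If you prefer not to rely on op_type being exactly 0 or 1, an equivalent formulation (untested sketch) compares each row's op_type with the previous value and counts the rows where it differs:

    WITH merged_table AS (
        SELECT
            account_id,
            op_type,
            unnest(ARRAY[open_time, close_time]) as time,
            instrument_id
        FROM risklive.operations_mt4 op
    )
    SELECT
        account_id,
        -- count a change whenever op_type differs from the previous row's value
        SUM(CASE WHEN prev_op_type IS NOT NULL
                  AND op_type <> prev_op_type
                 THEN 1 ELSE 0 END) as counter
    FROM (
        SELECT
            m.account_id,
            m.op_type,
            lag(m.op_type) OVER (PARTITION BY m.account_id ORDER BY time) as prev_op_type
        FROM merged_table m
    ) sub
    GROUP BY account_id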
    

    Unfortunately, this may still take too long for your needs. If so, it's hard to do much more to improve it, in my opinion.
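
    If it does, I would first check where the time actually goes by timing the merge step on its own, for example (BUFFERS needs PostgreSQL 9.0 or later):

    -- Shows per-node timings and buffer usage for the merge step alone
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT
        account_id,
        op_type,
        unnest(ARRAY[open_time, close_time]) as time,
        instrument_id
    FROM risklive.operations_mt4 op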