I have a performance problem with merging two columns into one for a table of 7M entries.
My main goal is to merge the open_time and close_time columns into one 'time' column (and then count op_type changes per account_id, as shown below).
account_id, open_time and close_time are indexed.
Option #1: Merge open_time and close_time columns into one column 'time' using the union of two select statements:
select account_id, op_type, open_time as time, instrument_id
from tmp_operations
UNION
select account_id, 0 as op_type, close_time as time, instrument_id
from tmp_operations
Execution time of the UNION of the two select statements is over 4,000,000 ms and still running.
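One note here: plain UNION also deduplicates the combined rows, which forces a sort/unique pass over roughly 14M rows on top of the two sequential scans. If duplicates don't matter for you, UNION ALL skips that pass; a sketch under that assumption:
select account_id, op_type, open_time as time, instrument_id
from tmp_operations
UNION ALL
select account_id, 0 as op_type, close_time as time, instrument_id
from tmp_operations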
Option #2: Merge open_time and close_time columns into one column 'time' by unnesting an array:
SELECT
    account_id,
    op_type,
    unnest(ARRAY[open_time, close_time]) AS time,
    instrument_id
FROM risklive.operations_mt4 op
Execution time of unnesting the array is about 315,000 ms, which is much better. Thanks, Gabriel's Messanger!
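For completeness, a LATERAL join against VALUES is a common alternative to unnest for this kind of columns-to-rows merge; a sketch, assuming PostgreSQL 9.3 or newer:
SELECT op.account_id, op.op_type, v.time, op.instrument_id
FROM risklive.operations_mt4 op
CROSS JOIN LATERAL (VALUES (op.open_time), (op.close_time)) AS v(time)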
Some samples of what I want to see as the result of the timestamp merging:
open_time               close_time                  time
"2015-08-19 09:18:24"   "2015-08-19 09:20:40"       "2015-08-19 09:18:24"
"2015-08-19 09:11:54"   "2015-08-19 09:17:16"  -->  "2015-08-19 09:20:40"
"2015-08-19 09:17:46"   "2015-08-19 09:18:22"       "2015-08-19 09:11:54"
                                                    "2015-08-19 09:17:16"
                                                    "2015-08-19 09:17:16"
                                                    "2015-08-19 09:17:46"
                                                    "2015-08-19 09:18:22"
As for the op_type column, I want a counter of its changes per account_id:
account_id op_type time
63004; 1; "2015-08-19 09:18:24"
63004; 1; "2015-08-19 09:20:40"
63004; 1; "2015-08-19 09:11:54"
63004; 1; "2015-08-19 09:17:16" <-- op_type will be changed in next entry
63004; 0; "2015-08-19 09:17:46" <-- counter = 1
63004; 0; "2015-08-19 09:18:22" <-- op_type will be changed in next entry
63004; 1; "2015-08-19 09:09:31" <-- counter = 2
63004; 1; "2015-08-19 09:09:31"
63004; 1; "2015-08-19 09:31:09"
63004; 1; "2015-08-19 09:32:07" <-- op_type will be changed in next entry
63004; 0; "2015-08-19 09:32:09" <-- counter = 3
63004; 0; "2015-08-19 09:57:44" <-- op_type will be changed in next entry
63004; 1; "2015-08-19 09:20:43" <-- counter = 4
63004; 1; "2015-08-19 09:31:51"
63004; 1; "2015-08-19 09:20:59"
63004; 1; "2015-08-19 09:31:51"
I don't know how to implement this op_type change counter at the moment.
How can I tune all of this?
Since you've changed your question a little after my answer, I've completely rewritten my post too.
The operation you need requires merging, and ordering the whole table (7M rows) is always a bottleneck. It's possible that you won't find a solution which satisfies you without changing your approach entirely. Nonetheless, let me try.
So your first problem was to "merge" two columns into one for the whole table of 7M rows. You tried UNION, which needs two sequential scans (and, being plain UNION rather than UNION ALL, a deduplication pass over the combined rows as well). As I proposed before, the solution may be building an array and unnesting it (which you did):
SELECT
    account_id,
    op_type,
    unnest(ARRAY[open_time, close_time]) AS time,
    instrument_id
FROM risklive.operations_mt4 op
Your second problem is counting op_type changes per account_id while ordered by the "merged" time column. For readability I use a CTE to hold the "merged table".
We must use a subquery. At one level we check for op_type changes in the proper order (using the lag() window function, which returns the value from the row before the current one). At the second level we sum up the number of op_type changes.
WITH merged_table AS (
    SELECT
        account_id,
        op_type,
        unnest(ARRAY[open_time, close_time]) AS time,
        instrument_id
    FROM risklive.operations_mt4 op
)
SELECT
    account_id,
    SUM(abs(zero_if_no_op_type_change)) AS counter
FROM (
    SELECT
        m.account_id,
        -- difference to the previous row's op_type within the account:
        -- 0 when op_type did not change, +1/-1 when it did
        (m.op_type - lag(m.op_type)
            OVER (PARTITION BY m.account_id ORDER BY time)
        ) AS zero_if_no_op_type_change
    FROM merged_table m
) sub
GROUP BY account_id
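To sanity-check the counter logic, here is a self-contained run on a few of the sample rows from the question (inline VALUES instead of the real table; my own test data). For the op_type sequence 1, 0, 0, 1 it should report 2 changes:
WITH merged_table(account_id, op_type, time) AS (
    VALUES
        (63004, 1, timestamp '2015-08-19 09:17:16'),
        (63004, 0, timestamp '2015-08-19 09:17:46'),
        (63004, 0, timestamp '2015-08-19 09:18:22'),
        (63004, 1, timestamp '2015-08-19 09:20:43')
)
SELECT account_id, SUM(abs(zero_if_no_op_type_change)) AS counter
FROM (
    SELECT
        m.account_id,
        (m.op_type - lag(m.op_type)
            OVER (PARTITION BY m.account_id ORDER BY time)
        ) AS zero_if_no_op_type_change
    FROM merged_table m
) sub
GROUP BY account_id
-- expected output: 63004, 2 (one 1->0 change and one 0->1 change)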
Unfortunately, it may still take too long for your needs. If so, in my opinion it's hard to do much more improvement.
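If it is still too slow, one more thing that might be worth trying (my suggestion, untested on this data): materialize the merged rows once and index them, so repeated runs of the counter query don't have to rebuild and re-sort the merged set:
CREATE TABLE merged_operations AS
SELECT
    account_id,
    op_type,
    unnest(ARRAY[open_time, close_time]) AS time,
    instrument_id
FROM risklive.operations_mt4;

CREATE INDEX merged_operations_account_time_idx
    ON merged_operations (account_id, time);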