I want to filter, sort and dump data from a big table to a CSV file. This is Oracle 12c.
unload does filter but does not sort, and I want a stable sort order because I need to compare the data with files from other sources. It also does not chunk the output into multiple files. Of course, if someone knows how to do this with unload, please tell.
dp is the same as unload, right? expdp does not exist in the db; is it not installed?
I found the DBMS_PARALLEL_EXECUTE package, but I cannot work out how to use it for this. The reason is simple: I want to chunk the rows by page, not by any column.
The main query I want to execute is:
select FIELD1, FIELD2
from DATA
where COMPANY_ID like :regex
order by FIELD1 -- indexed and unique
offset :start_id rows fetch next :page_size rows only;
I only want 2 columns.
I think I cannot use CREATE_CHUNKS_BY_NUMBER_COL or CREATE_CHUNKS_BY_ROWID, as the documentation says:
This procedure chunks the table (associated with the specified task) by the specified column. The specified column must be a NUMBER column. This procedure takes the MIN and MAX value of the column, and then divides the range evenly according to chunk_size.
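For context, here is roughly the call I would otherwise be making against the PK (a sketch only; ID is the NUMBER primary key of DATA):
begin
  dbms_parallel_execute.create_task ( task_name => 'test_task' );
  dbms_parallel_execute.create_chunks_by_number_col (
    task_name    => 'test_task',
    table_owner  => user,
    table_name   => 'DATA',
    table_column => 'ID',
    chunk_size   => 50
  );
end;
/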
First question: can I use the PK? The PK is a NUMBER and continuous, but filtering by COMPANY_ID will make it non-continuous. So if, for example, between ID 0 and 500 I only need 2 rows, does that break the job? Will it fetch 2 rows for me, or 500?
Second question: I am following https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2#user_defined_framework, but I don't understand when the chunks actually get scheduled and executed.
My code:
create or replace procedure compare (
  dir       varchar2,
  regex     varchar2,
  start_id  integer,
  end_id    integer,
  page_size integer
) as
  CURSOR c_query is
    select FIELD1, FIELD2
    from DATA
    where COMPANY_ID like regex
    order by FIELD1
    offset start_id rows fetch next page_size rows only;
  output   utl_file.file_type;
  filename varchar2(100);
begin
  filename := 'split_' || start_id || '_' || end_id || '.csv';
  output := utl_file.fopen ( dir, filename, 'w' );
  utl_file.put_line ( output, 'FIELD1,FIELD2' );
  for row in c_query loop
    utl_file.put_line ( output, row.FIELD1 || ',' || row.FIELD2 );
  end loop;
  utl_file.fclose ( output );
end compare;
/
create or replace procedure loop_comparison (
  l_task    VARCHAR2 := 'test_task',
  regex     VARCHAR2 := '90%',
  page_size NUMBER   := 50,
  dir       varchar2 := 'DATA_PUMP_DIR'
) as
  l_sql_stmt      VARCHAR2(32767);
  l_sql_step_stmt VARCHAR2(32767);
  row_count       NUMBER;
  p_start         NUMBER := 0;
  p_end           NUMBER := 0;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
  select count(ID) into row_count from DATA where COMPANY_ID like regex;
  p_end := page_size;
  l_sql_stmt := 'BEGIN compare(''' || dir || ''',''' || regex || ''', :start_id, :end_id, ' || page_size || '); END;';
  WHILE p_start < row_count
  LOOP
    l_sql_step_stmt := 'select ' || p_start || ' as start_id, ' || p_end || ' as end_id from dual';
    DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => l_task,
                                               sql_stmt  => l_sql_step_stmt,
                                               by_rowid  => FALSE);
    DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                   sql_stmt       => l_sql_stmt,
                                   language_flag  => DBMS_SQL.NATIVE,
                                   parallel_level => 20,
                                   job_class      => 'test_task_class');
    DBMS_OUTPUT.put_line(
      TO_CHAR(SYSTIMESTAMP, 'yyyy-mm-dd hh24:mi:ss.ff')
      || ' '
      || DBMS_PARALLEL_EXECUTE.task_status(l_task)
    );
    p_start := p_start + page_size;
    p_end := p_end + page_size;
  END LOOP;
end loop_comparison;
/
After exec loop_comparison, I find this:
SQL> select task_name, chunk_type, status from USER_PARALLEL_EXECUTE_TASKS;
TASK_NAME CHUNK_TYPE STATUS
____________ _____________ __________
test_task UNDECLARED CREATED
SQL> select * from USER_PARALLEL_EXECUTE_CHUNKS;
no rows selected
I suppose it never actually runs.
If you are doing an ORDER BY, then you will be passing over the entire set of rows that match your criteria anyway, so it's unlikely you'll see any massive benefit from splitting this up, unless the dominant cost is the sorting element.
If that is the case, I'd tackle that directly, e.g. by bumping up the PGA or getting faster TEMP tablespace IO.
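A quick way to see whether your sorts are spilling to disk, and to give them more memory, would be something like this (the 8G is just a placeholder; size it for your server, and the ALTER needs the appropriate privileges):
-- how often work areas (sorts, hash joins) ran in memory vs spilled to TEMP
select name, value
from   v$sysstat
where  name like 'workarea executions%';

-- more memory available for sorting
alter system set pga_aggregate_target = 8G scope = both;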
'dp' is Data Pump, so it is not appropriate for CSV output.
Because you are sorting by FIELD1, if you know some upper/lower limits for FIELD1 you could use those for chunking/sorting, because by definition each chunk will contain mutually exclusive data. You'd probably end up with differing file sizes.
If you want to use DBMS_PARALLEL_EXECUTE with string-based chunks, you can have a lookup table, e.g.
ID LO HI
-- --- ---
1 AAA FZZ
2 GAA QZZ
3 RAA ZZZ
and then create three chunks with CREATE_CHUNKS_BY_SQL, loading the values 1-1, 2-2, 3-3 as the chunk ranges.
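A rough sketch of that setup (the task name, lookup table name and ranges here are just placeholders; adjust them to your data distribution):
create table lookup_table ( id number primary key, lo varchar2(10), hi varchar2(10) );

insert into lookup_table values ( 1, 'AAA', 'FZZ' );
insert into lookup_table values ( 2, 'GAA', 'QZZ' );
insert into lookup_table values ( 3, 'RAA', 'ZZZ' );
commit;

begin
  dbms_parallel_execute.create_task ( task_name => 'csv_task' );
  -- each row of the lookup table becomes one chunk: (1,1), (2,2), (3,3)
  dbms_parallel_execute.create_chunks_by_sql (
    task_name => 'csv_task',
    sql_stmt  => 'select id as start_id, id as end_id from lookup_table',
    by_rowid  => false
  );
end;
/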
Then the SQL or PL/SQL that does your work (i.e. the routine that DBMS_PARALLEL_EXECUTE will schedule) will do:
select ...
from   my_table, lookup_table
where  field1 >= lo
and    field1 < hi
and    lookup_table.id = :start_id
order by field1
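and you would hand that over via RUN_TASK so each chunk gets its lookup row through the :start_id/:end_id binds, e.g. (compare_chunk here is a hypothetical rework of your compare procedure that takes the lookup id instead of offsets and writes its own file):
begin
  dbms_parallel_execute.run_task (
    task_name      => 'csv_task',
    sql_stmt       => 'begin compare_chunk(:start_id, :end_id); end;',
    language_flag  => dbms_sql.native,
    parallel_level => 3
  );
end;
/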