Search code examples
sqloracleplsqloracle-sqldeveloper

Looping spool scripts


I have a big table in an Oracle database, which I would like to extract to CSV so I can move it onto another system (ideally there would be a direct connection between these two systems, but in the absence of that this is the next best solution).

Because the table is large, I'm extracting a month of data at the time. So the query looks like this:

set termout off;
set echo off;
set verify off;
spool "C:\output_path\&&1. events.csv";
select /*csv*/ * from my_database.events
where my_date between to_date(&&1,'yyyy-mm-dd') and to_date(&&2,'yyyy-mm-dd');
spool off
set termout on;
set echo on;

I then call this from another script, like so:

@my_script.sql "'2020-01-01'" "'2020-01-31'"

This would put "2020-01-01" and "2020-01-31" in place of &&1 and &&2 respectively, and output January's data to C:\output_path\'2020-01-01' events.csv.

Ideally I'd like to turn this into a loop, so I could have a function like this (pseudo-python):

def scrape_range_of_months(start_date, end_date)
    date_range = make_date_range(start_date, end_date)
    for date in date_range:
        scrape_month(date, end_of_month(date))

Where make_date_range() and end_of_month() are separate functions.

PL/SQL is a far more arcane language than Python though, so while I can find individual parts of solutions, it's not clear if it's even possible to run scripts on a loop like I'd like to.


Solution

  • @justin Cave is right, a full solution would use UTL_FILE to write the files directly from PL/SQL.

    In this example I have written some PL/SQL to generate the code to do a month by month extract. I change the "between" to ">=" and "<" because if there were a time component you would miss records for the last day of the month.

    DECLARE
        l_date    DATE := DATE '2020-01-01';
        c_spool   VARCHAR2( 512 ) := Q'[spool "C:\output_path\YMD.events.csv";
    select /*csv*/ * from my_database.events
    where my_date >= to_date('YMD','yyyy-mm-dd') and my_date < add_months(to_date('YMD','yyyy-mm-dd'),1);
    ]';
    BEGIN
        WHILE l_date < SYSDATE
        LOOP
            DBMS_OUTPUT.put_line( REPLACE( c_spool
                                         , 'YMD'
                                         , TO_CHAR( l_date, 'yyyy-mm-dd' ) ) );
            l_date   := ADD_MONTHS( l_date, 1 );
        END LOOP;
    END;
    

    example output:

    spool "C:\output_path\2020-02-01.events.csv";
    select /*csv*/ * from my_database.events
    where my_date >= to_date('2020-02-01','yyyy-mm-dd') and my_date < 
    add_months(to_date('2020-02-01','yyyy-mm-dd'),1);