Search code examples
oracle-databasecsvplsqllarge-files

Reading large csv files in PL/SQL


First question here on StackOverFlow.

So, I'm working on a new project and I need to read large CSV-like files from PL/SQL and insert the data into a table. These files are about 400k lines each, and I'm not sure about the best way to do this. My expertise with PL/SQL comes a little short for the task :D

I've seen people using UTL_FILE.FOPEN. Is there a way of reading a large file by chunks or a more appropiate tool to use? I could not find any useful post about this issue.

Thank you in advance!


Solution

  • You can load data files using an inline external table or APEX_DATA_PARSER. Those techniques are often better than creating external tables because you do not have to manage as many schema objects. And those techniques are often better than using a tool like sqlldr because then your program is not dependent on a specific external program, shell script, or operating system.

    Inline External Table

    Find the directory to store the file:

    select directory_path from all_directories where directory_name = 'DATA_PUMP_DIR';
    

    Create the file test.csv in that directory:

    Name,Salary
    Alice,100
    Bob,200
    

    Read from that file without creating an external table:

    select *
    from external
    (   
        (
            name varchar2(100),
            salary number
        )
        default directory data_pump_dir
        access parameters
        (
            records delimited by newline
            skip 1
            fields terminated by ','
        )
        location ('test.csv')
    );
    

    Results:

    NAME    SALARY
    -----   ------
    Alice   100
    Bob     200
    

    APEX_DATA_PARSER

    If you have APEX installed on your database, the package APEX_DATA_PARSER is a convenient way to query your files. Although this technique does require installing a PL/SQL object that will load the file into a blob. See this Oracle-Base article for the code behind the function FILE_TO_BLOB.

    select col001 name, col002 salary
    from table
    (
        apex_data_parser.parse
        (
            p_content   => file_to_blob('DATA_PUMP_DIR', 'test.csv'),
            p_file_name => 'test.csv'
        )
    )
    

    SQL or PL/SQL?

    Although you asked for a PL/SQL solution, the above solutions are mostly SQL only. But that's a good thing if you can even avoid creating PL/SQL objects.

    If you're going to use this code inside PL/SQL, it's trivial to put those select statements inside a loop like this:

    begin
        for lines in
        (
            select *
            from external
            (   
                (
                    name varchar2(100),
                    salary number
                )
                default directory data_pump_dir
                access parameters
                (
                    records delimited by newline
                    skip 1
                    fields terminated by ','
                )
                location ('test.csv')
            )
        ) loop
            --Do something with the results here.
            dbms_output.put_line('Name: '||lines.name||',Salary: '||lines.salary);
        end loop;
    end;
    /