First question here on StackOverFlow.
So, I'm working on a new project and I need to read large CSV-like files from PL/SQL and insert the data into a table. These files are about 400k lines each, and I'm not sure about the best way to do this. My expertise with PL/SQL comes a little short for the task :D
I've seen people using UTL_FILE.FOPEN. Is there a way of reading a large file by chunks or a more appropiate tool to use? I could not find any useful post about this issue.
Thank you in advance!
You can load data files using an inline external table or APEX_DATA_PARSER. Those techniques are often better than creating external tables because you do not have to manage as many schema objects. And those techniques are often better than using a tool like sqlldr because then your program is not dependent on a specific external program, shell script, or operating system.
Find the directory to store the file:
select directory_path from all_directories where directory_name = 'DATA_PUMP_DIR';
Create the file test.csv in that directory:
Name,Salary
Alice,100
Bob,200
Read from that file without creating an external table:
select *
from external
(
(
name varchar2(100),
salary number
)
default directory data_pump_dir
access parameters
(
records delimited by newline
skip 1
fields terminated by ','
)
location ('test.csv')
);
Results:
NAME SALARY
----- ------
Alice 100
Bob 200
If you have APEX installed on your database, the package APEX_DATA_PARSER
is a convenient way to query your files. Although this technique does require installing a PL/SQL object that will load the file into a blob. See this Oracle-Base article for the code behind the function FILE_TO_BLOB
.
select col001 name, col002 salary
from table
(
apex_data_parser.parse
(
p_content => file_to_blob('DATA_PUMP_DIR', 'test.csv'),
p_file_name => 'test.csv'
)
)
Although you asked for a PL/SQL solution, the above solutions are mostly SQL only. But that's a good thing if you can even avoid creating PL/SQL objects.
If you're going to use this code inside PL/SQL, it's trivial to put those select statements inside a loop like this:
begin
for lines in
(
select *
from external
(
(
name varchar2(100),
salary number
)
default directory data_pump_dir
access parameters
(
records delimited by newline
skip 1
fields terminated by ','
)
location ('test.csv')
)
) loop
--Do something with the results here.
dbms_output.put_line('Name: '||lines.name||',Salary: '||lines.salary);
end loop;
end;
/