My current task requires me to export an estimated 100,000 rows of data from a very large database.
I'm fairly new to handing big data and I would love to hear a few best practices and guidelines from people who have had prior experiences with these issues that worked for them in the past in an effort to make this post non-subjective.
A few more details:
The database is not at all normalized (very ugly)
I have to deal with at least 100,000 rows in total
The task is run at midnight with less users
Currently using ColdFusion 9, PostgreSQL 8.4
Thank you!
This is what my code looks like after applying Craig's solution:
<cfset base_path = GetDirectoryFromPath(ExpandPath("*.*")) & "some_parent\some_child\">
<cfif not DirectoryExists(base_path)>
<cfdirectory directory="#base_path#" action="create" mode="777">
</cfif>
<cfset this_batch_path = DateFormat(Now(), 'mmddyyyy') & TimeFormat(Now(), 'hhmmss') & "\">
<cfdirectory directory="#base_path##this_batch_path#" action="create" mode="777">
<cfset this_filename = "someprefix_" & DateFormat(Now(), 'yyyymmdd') & ".csv">
<cffile action="write" file="#base_path##this_batch_path##this_filename#" output="">
<cfset escaped_copy_path = ListChangeDelims(base_path & this_batch_path & this_filename, "\\", "\")>
<cfquery name="qMyQuery" datasource="some_db" username="some_uname" password="some_pword" result="something">
COPY some_table TO '#escaped_copy_path#' WITH CSV HEADER;
</cfquery>
Now I need to get the count of copied rows. In the PGSQL 8.4 docs:
Outputs
On successful completion, a COPY command returns a command tag of the form
COPY count
The count is the number of rows copied.
But I can't seem to get it to work, even with the result tag and the query itself.
100,000 rows isn't big, unless these rows are very very wide with lots of big values.
Just use psql
and \copy (SELECT ...) TO '/some/local/file' WITH (FORMAT CSV, HEADER)
If you want you can estimate the data size:
select pg_size_pretty(sum( octet_length(t::text) )) FROM mytable t WHERE ...;
For actually big data extract runs, sometimes you may want to use an ETL tool like Talend Studio, Pentaho Kettle or CloverETL.
BTW, it's time to start thinking about upgrading from 8.4 as it's now end-of-life.