The maximum size of limited character types (e.g., varchar(n)) in PostgreSQL is 10485760 (exactly 10 MiB). See description of the maximum length of PostgreSQL's varchar.
Please download the file for testing and extract it in /tmp/2019q4
. We only use pre.txt
to import data with.
Enter your PostgreSQL client (psql
) and create a database:
create database edgar;
\c edgar;
Create table according to the webpage:
Fields in pre table definitions
create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel varchar(512),
negating boolean
);
Output:
CREATE TABLE
Try to import data:
\copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
We analyse the error information:
ERROR: value too long for type character varying(512)
CONTEXT: COPY pre, line 1005798, column plabel: "LIABILITIES AND STOCKHOLDERS EQUITY 0
0001493152-19-017173 2 11 BS 0 H LiabilitiesAndStockholdersEqu..."
Time: 1481.566 ms (00:01.482)
The size I set in the field is just 512, much less than 10485760.
The content in line 1005798 is not the same as in the error information:
0001654954-19-012748 6 20 EQ 0 H ReclassificationAdjustmentRelatingToAvailableforsaleSecuritiesNetOfTaxEffect 0001654954-19-012748 Reclassification adjustment relating to available-for-sale securities, net of tax effect" 0
Now I drop the previous table, convert the plabel field as text
, and recreate it:
drop table pre;
Output:
DROP TABLE
Time: 22.763 ms
And:
create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel text,
negating boolean
);
Output:
CREATE TABLE
Time: 81.895 ms
Import the same data with the same copy command:
\copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
Output:
COPY 275079
Time: 2964.898 ms (00:02.965)
There isn't any error information in the psql
console. Let me check the raw data '/tmp/2019q4/pre.txt', which it contain 1043000
lines.
wc -l /tmp/2019q4/pre.txt
1043000 /tmp/2019q4/pre.txt
There are 1043000 lines. How many lines imported then?
select count(*) from pre;
Output:
count
--------
275079
(1 row)
Why is there so less data imported without error information?
The sample data you provided is obviously not the data you are really loading. It does still show the same error, but of course the line numbers and markers are different.
That file occasionally has double quote marks where there should be single quote marks (apostrophes). Because you are using CSV mode, these stray double quotes will start multi-line strings, which span all the way until the next stray double quote mark. That is why you have fewer rows of data than lines of input, because some of the data values are giant multiline strings.
Since your data clearly isn't CSV, you probably shouldn't be using \copy in CSV format. It loads fine in text format as long as you specify "header", although that option didn't become available in text format until v15. For versions before that, you could manually remove the header line, or use PROGRAM to skip the header like FROM PROGRAM 'tail +2 /tmp/pre.txt'
Alternatively, you could keep using CSV format, but choose a different quote character, one that never shows up in your data such as with (delimiter E'\t', format csv, header, quote E'\b')