How to create an external table from a CSV file with commas in quoted fields in Greenplum?


I'm trying to create an external table from a CSV file like this:

CREATE EXTERNAL TABLE hctest.ex_nkp
(
a text,
b text,
c text,
d text,
e text,
f text,
g text,
h text
)
LOCATION ('gpfdist://192.168.56.111:10000/performnkp.csv')
FORMAT 'CSV' (DELIMITER ',' HEADER);

The CSV is comma-delimited and looks like this:

"Subject Username","Form Title","Form Start Date","Form End Date","Competency Name","Competency Description","Core Competency","Competency Official Rating"
"90008765","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","1. Uncompromising Integrity","<p>High ethical standards, low tolerance of unethical conduct.</p>","Yes","3"
"90008766","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","2. Team Synergy","<p>Passionately work together, ensuring completeness, to achieve common goals.</p>","Yes","3"
"90008767","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","3. Simplicity","<p>We do our utmost to deliver the easy to use solutions, exceeding customers&#39","","
"90008768","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","4. Exceptional Performance","<p>Highest level of performance, with a heart for people.</p>","Yes","3"

But I get this error:

ERROR:  extra data after last expected column  (seg0 slice1 192.168.56.111:6000 pid=14121)
DETAIL:  External table ex_nkp, line 5 of file gpfdist://192.168.56.111:10000/performnkp.csv

How can I resolve this?


Solution

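  • It looks like your CSV is malformed on line 4. The commas inside the quoted fields are not the problem: the last field on line 4 opens with a double quote that is never closed, so Greenplum treats the line break as part of that quoted field and keeps reading into line 5. That is why the error is reported on line 5 as extra data after the last expected column. After adding the missing closing quote at the end of line 4, I am able to read the file in Greenplum.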

    "Subject Username","Form Title","Form Start Date","Form End Date","Competency Name","Competency Description","Core Competency","Competency Official Rating"
    "90008765","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","1. Uncompromising Integrity","<p>High ethical standards, low tolerance of unethical conduct.</p>","Yes","3"
    "90008766","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","2. Team Synergy","<p>Passionately work together, ensuring completeness, to achieve common goals.</p>","Yes","3"
    "90008767","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","3. Simplicity","<p>We do our utmost to deliver the easy to use solutions, exceeding customers&#39","",""
    "90008768","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","4. Exceptional Performance","<p>Highest level of performance, with a heart for people.</p>","Yes","3"
    

    Resulting query:

    fguerrero=# select * from ex_nkp ;
    NOTICE:  HEADER means that each one of the data files has a header row
        a     |                           b                           |     c      |     d      |              e              |                                         f                                          |  g  | h
    ----------+-------------------------------------------------------+------------+------------+-----------------------------+------------------------------------------------------------------------------------+-----+---
     90008765 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 1. Uncompromising Integrity | <p>High ethical standards, low tolerance of unethical conduct.</p>                 | Yes | 3
     90008766 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 2. Team Synergy             | <p>Passionately work together, ensuring completeness, to achieve common goals.</p> | Yes | 3
     90008767 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 3. Simplicity               | <p>We do our utmost to deliver the easy to use solutions, exceeding customers&#39  |     |
     90008768 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 4. Exceptional Performance  | <p>Highest level of performance, with a heart for people.</p>                      | Yes | 3
    (4 rows)
    

    Let me know if this helps
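
    If the real file is large, a quick check before loading can point at lines like this one. The following is only a rough sketch, not part of Greenplum or gpfdist: the function name and the sample data are made up for illustration. It assumes the double quote is the quote character and that no field legitimately spans several lines; a line with an odd number of quote characters is then flagged as suspect.

```python
def find_unbalanced_quote_lines(text, quotechar='"'):
    """Return 1-based numbers of lines holding an odd number of quote characters.

    Heuristic only: escaped quotes written as "" keep the count even, but a
    field that legitimately spans several lines would also be flagged.
    """
    suspects = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.count(quotechar) % 2 == 1:
            suspects.append(number)
    return suspects


# Hypothetical sample data; the third line is missing its closing quote.
sample = (
    '"id","description","flag"\n'
    '"1","<p>fine, with a comma</p>","Yes"\n'
    '"2","<p>broken","","\n'
    '"3","<p>also fine</p>","Yes"\n'
)

print(find_unbalanced_quote_lines(sample))  # prints [3]
```

    To run it against the actual file, read the contents of performnkp.csv into a string and pass that to the function; any line number it reports is worth inspecting by hand before pointing the external table at the file.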