Search code examples
postgresqlsybasepostgresql-copy

How to convert hex characters when using Postgres COPY FROM?


I am importing data from a file to PostgreSQL database table using COPY FROM. Some of the strings in my file contain hex characters (mostly \x0d and \x0a) and I'd like them to be converted into regular text using COPY.
My problem is that they are treated as regular text and remain in the string unchanged.
How can I get the hex values converted?

Here is a simplified example of my situation:

-- The table I am importing to
CREATE TABLE my_pg_table (
    id serial NOT NULL,
    value text
);

COPY my_pg_table(id, data)
FROM 'location/data.file'
WITH CSV
DELIMITER '    ' -- this is actually a tab
QUOTE ''''
ENCODING 'UTF-8'

Example file:

1    'some data'  
2    'some more data \x0d'  
3    'even more data \x0d\x0a'

Note: the file is tab delimited.

Now, doing:

SELECT * FROM my_pg_table

would get me results containing hex.

Additional info for context:

My task is to export data from sybase tables (many hundreds) and import to Postgres. I am using UNLOAD to export data to files like so:

UNLOAD
TABLE my_sybase_table
TO 'location/data.file'
DELIMITED BY '    ' -- this is actually a tab
BYTE ORDER MARK OFF
ENCODING 'UTF-8'

Solution

  • It seems to me that (for a reason I don't understand) hex is only converted when using FORMAT TEXT and FORMAT CSV will treat it as regular string.

    Solving the problem in my situation:
    Because I had to use TEXT I didn't have the QUOTE option anymore and because of that I couldn't have quoted strings in my files anymore. So I needed my files in a little different format and eventually used this to export my table from sybase:

    UNLOAD
        SELECT
        COALESCE(cast(id as long varchar), '(NULL)'),
        COALESCE(cast(data as long varchar), '(NULL)')
        FROM my_sybase_table
    TO 'location/data.file'
    DELIMITED BY '    ' -- still tab delimited
    BYTE ORDER MARK OFF
    QUOTES OFF
    ENCODING 'UTF-8'
    

    and to import it to postgres:

    COPY my_pg_table(id, data)
    FROM 'location/data.file'
    DELIMITER '    ' -- tab delimited
    NULL '(NULL)'
    ENCODING 'UTF-8'
    

    I used (NULL), because I needed a way to differentiate between an empty string and null. I casted every column to long varchar, to make my mass export/import more convenient.

    I'd be still very interested to know why hex wouldn't convert when using FORMAT CSV.