Search code examples
bashpostgresqlawkmonetdb

Replace an entire field value in a file using awk or other


I have an export from postgresql table with multiple fields, including boolean (exported by postgresql as t and f character), and I need to import it in another database (monetdb) that won't understand t/f as bool values.

(EDIT removed spaces to reflect true file aspect and avoid angry comments - previously there was spaces displayed)

id|val_str|bool_1|bool2|bool_3|bool4|
1|help|t|t|f|t|
2|test|f|t|f|f|
...

As I cannot replace all occurence of t/f I need to integrate the field separator in my pattern. I tried to use awk to replace fields t with TRUE and f with FALSE:

awk -F'|' '{gsub(/\|t\|/, "|TRUE|"); gsub(/\|f\|/, "|FALSE|"); print;}' 

This is working partially, as consecutive fields with a same value (|t|t|) will have only the first occurrence replaced (|TRUE|t| - as 2nd occurence is in fact t| and not |t|).

id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|t|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|f|
...

Table has ~450 columns so I can't really specify the list of columns to be replaced, nor work in postgres to 'transform' boolean columns (I could but ...).

I could run the gsub() twice, but I was looking for more elegant way to match the entire field content for all fields.

gsub(/^t$/, ...) is not helping either as we are in the middle of a line most of the time.


Solution

  • Table has ~450 columns so I can't really specify the list of columns to be replaced, nor work in postgres to 'transform' boolean columns (I could but ...).

    You can let Postgres do the work for you. Basic query to produce the SELECT list:

    SELECT string_agg(CASE WHEN atttypid = 'bool'::regtype
                           THEN quote_ident(attname) || '::text'
                           ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
    FROM   pg_attribute
    WHERE  attrelid = 'mytable'::regclass  -- provide table name here
    AND    attnum > 0
    AND    NOT attisdropped;
    

    Produces a string of the form:

    col1, "CoL 2", bool1::text, "Bool 2"::text
    

    All identifiers are escaped properly. Columns are in default order. Copy and execute it. Use COPY to export to file. (Or \copy in psql.) Performance is about the same as exporting a plain table. If you don't need upper case omit upper().

    Why is a simple cast to text enough?

    About regclass and escaping identifiers properly:

    If you need a complete statement with TRUE / FALSE / NULL in upper case, standard SQL cast notation (without colons ::), still original column names and maybe a schema-qualified tablename:

    SELECT 'SELECT '
         || string_agg(CASE WHEN atttypid = 'bool'::regtype
                            THEN format('upper(cast(%1$I AS text)) AS %1$I', attname)
                            ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
         || ' FROM myschema.mytable;'           -- provide table name twice now
    FROM   pg_attribute
    WHERE  attrelid = 'myschema.mytable'::regclass
    AND    attnum > 0
    AND    NOT attisdropped;
    

    Produces a complete statement of the form:

    SELECT col1, "CoL 2", upper(cast(bool1 AS text) AS bool1, upper(cast("Bool 2" AS text)) AS "Bool 2" FROM myschema.mytable;