Search code examples
sqlpostgresqlsqldatatypes

How to determine properly data types when I have number in quotes or mixed data?


I know that this is very simple questions but I can't go any further. I want to import data from csv file to PostgreSQL. I have made a table, name column as they are named on the file and first problem that I have got is that I don't know the data type. I mean in first column when i open CSV file i have something like that:

"COLUMN1";"COLUMN2";"COLUMN3";"COLUMN4"
"009910";NA;NA;"FALSE"
"953308";0;41;"TRUE"
"936540";NA;NA;"FALSE"
"902346";1;5;"TRUE"
"747665";NA;NA;"FALSE"
"074554";NA;NA;"FALSE"
"154572";NA;NA;"FALSE"

And when I am import this base via pgAdmin 4 its return error with datatype. I set column2 as Integer but it's kinda 'mixed'. The column 1 I also set as integer but numbers are in quote so I wonder if PostgreSQL see it as string. The same is up to column4. How should I properly determine data types of each column?


Solution

  • During import it will cast the value to the column's type, if possible. For example, if you do SELECT 'FALSE'::boolean it will cast and return false. SELECT '074554'::int works as well and returns 74554.

    But the bare characters NA will give you problems. If those are intended to be null, try to do a find/replace on the file and just take them out, so that the first row of data has "009910";;;"FALSE" and see if that works.

    You could also have all columns as text, quote the NA values, and import.

    Then create a new table, and use INSERT INTO ... SELECT from the all-text table and manually cast or use CASE as needed to convert types.

    For example, if you imported into a table called raw_data, and have a nicer table imports:

    INSERT INTO imports
    SELECT
      column1::int,
      CASE WHEN column2 = 'NA' THEN null ELSE column2::int END,
      CASE WHEN column3 = 'NA' THEN null ELSE column3::int END,
      column4::boolean
    FROM
      raw_data