Search code examples
sqlpostgresqltypessql-updatedata-analysis

Data preparation - replace numbers and symbols in most columns of a table


I am currently interested in data preparation but I have some troubles with the SQL script I am working on.

I found a .csv file with statistics about crime rates as you can see here:

Screenshot of the .csv file

Now I want to do some changes in the columns from 2016 to 2008. All numbers should be changed to 1 and all colons to 0 for further preparation and the data analyses process. I decided to use VARCHAR as datatype for the columns. Because of this I am able to change the colons to 0, but now I am unable to change the numbers to 1. This is the script at the moment:

DROP TABLE IF EXISTS crime;
CREATE TABLE crime (
id SERIAL,
age VARCHAR,
sex CHAR(1), 
unit VARCHAR,
geo VARCHAR,
"2016" VARCHAR, 
"2015" VARCHAR,
"2014" VARCHAR, 
"2013" VARCHAR,
"2012" VARCHAR, 
"2011" VARCHAR,
"2010" VARCHAR, 
"2009" VARCHAR,
"2008" VARCHAR,
PRIMARY KEY (id)
);

\unset ON_ERROR_STOP

COPY crime(age, sex, unit, geo, "2016", "2015", "2014", "2013", "2012", "2011", "2010", "2009", "2008") 
FROM 'path' 
delimiter ';' CSV HEADER;

ALTER TABLE crime DROP COLUMN IF EXISTS unit, DROP COLUMN IF EXISTS age;

UPDATE crime SET "2016" = REPLACE("2016", ':', '0'),
"2015" = REPLACE("2015", ':', '0'), 
"2014" = REPLACE("2014", ':', '0'), 
"2013" = REPLACE("2013", ':', '0'), 
"2012" = REPLACE("2012", ':', '0'), 
"2011" = REPLACE("2011", ':', '0'), 
"2010" = REPLACE("2010", ':', '0'), 
"2009" = REPLACE("2009", ':', '0'), 
"2008" = REPLACE("2008", ':', '0');

I hope someone can help me.


Solution

  • If it's all just numbers or : as indicated by your screenshot:

    UPDATE crime
    SET    "2016" = CASE WHEN "2016" = ':' THEN 0 ELSE 1 END
         , "2015" = CASE WHEN "2015" = ':' THEN 0 ELSE 1 END
      -- , ...
         , "2008" = CASE WHEN "2008" = ':' THEN 0 ELSE 1 END;
    

    If all you care about is 0 and 1, consider converting the result to boolean afterwards:

    ALTER TABLE crime 
      ALTER "2016" TYPE bool USING "2016"::bool
    , ALTER "2015" TYPE bool USING "2015"::bool
    -- , ...
    , ALTER "2008" TYPE bool USING "2008"::bool;
    

    0 -> FALSE
    1 -> TRUE

    And rather use legal identifiers that don't require double-quoting: c2016, c2015, ...