Search code examples
sqlpostgresqlvalidationdecimaldata-cleaning

Typo in decimal place + postgresql


I have a data set of students that calculates average scores (to the nearest decimal) as of a certain date.The data set is riddled with typos. A sample data is:

student_id  date   avg_test_score
ab_1        1/2/20 95..6
ab_2        1/2/20 60.7
ab_3        2/4/20 88..7
ab_4        2/4/20 98.7.

This may seem straightforward but I'm having a difficult time with the query in postgresql.

Thanks!


Solution

  • You could try the following regex update on the avg_test_score column:

    UPDATE yourTable
    SET avg_test_score = SUBSTRING(
        REGEXP_REPLACE(avg_test_score, '\.{2,}', '.') FROM '\d+(?:\.\d+)?');
    

    The above logic first replaces sequences of two or more dots with just a single dot. Then we extract integer or floating point numbers from what remains.