Search code examples
postgresqlpivotpostgiscrosstabraster

Pixel values of raster records to be inserted in the table as columns


I have a table with following columns:

(ID, row_num, col_num, pix_centroid, pix_val1). 

I have more than 1000 records. I am inserting my data using:

insert into pixelbased (row_num, col_num, pix_centroid, pix_val)
select
    (ST_PixelAsPolygons(rast, 1)).x as X,
    (ST_PixelAsPolygons(rast, 1)).y as Y,
    (ST_Centroid((ST_PixelAsPolygons(rast, 1)).geom)) as geom,
    (ST_PixelAsPolygons(rast, 1)).val as pix_val1
from mytable 
where rid=1`

Now I am trying to insert all the other records as a column and _pix_val1_ column is important for me. All the other columns will remain the same. In the other word, I want the final table to have these columns:

(ID, row_num, col_num, pix_centroid, pix_val1, pix_val2, pix_val3, ....)

Is there a way to do it?


Solution

  • I would want to store this data as a bitmap in a bytea if possible. Here's how to take a series of byte values and turn it into a bytea:

    WITH bytes(b) AS (SELECT x % 256 FROM generate_series(1,53000) x)
    SELECT ('\x'||string_agg(lpad(to_hex(b),2,'0'),''))::bytea FROM bytes;
    

    You can access fields or ranges of the byte array using the substr function. This bytea is organized as a linear pixel array, but you may find it more useful to organize it into a more traditional bitmap format. Also, if your pixels are more than one byte you may need to cope with big-endian vs little-endian. You could do that in SQL, but it's likely to be much easier in a procedural language like PL/Perl.


    Failing that, a multidimensional array would be a somewhat reasonable choice.

    Using a generate_series statement as a substitute for your pix_val field for convenient testing, this query produces a two-dimensional array of integers using two aggregation passes:

    SELECT ('{'||string_agg(subarray, ',')||'}')::integer[] AS arr
    FROM (
       SELECT array_agg(x order by x)::text 
       FROM generate_series(1,53000) x
       GROUP BY width_bucket(x, 1, 53001, 100)
    ) a(subarray);
    

    The unfortunate use of the string literal form of the two dimensional array is made necessary by the fact that array_agg cannot aggregate arrays. In my view this is a real wart in PostgreSQL; in general its multidimensional arrays are odd to work with and inconsistent with how most applications and languages implement arrays.

    You can get fields out of the array by indexing it. Example:

    regress=> SELECT ('{'||string_agg(subarray, ',')||'}')::integer[] AS arr INTO test FROM (SELECT array_agg(x order by x)::text from generate_series(1,53000) x GROUP BY width_bucket(x, 1, 53001, 100)) a(subarray);
    
    regress=> \d test
    
          Table "public.test"
     Column |   Type    | Modifiers 
    --------+-----------+-----------
     arr   | integer[] | 
    

    test contains a single array with two dimensions:

    regress=> \x
    regress=> select array_dims(test.arr), array_ndims(test.arr), array_length(test.arr,1), array_length(test.arr,2) FROM test;
    -[ RECORD 1 ]+---------------
    array_dims   | [1:100][1:530]
    array_ndims  | 2
    array_length | 100
    array_length | 530
    

    I can get elements with two-level indexing:

    regress=> SELECT test.arr[4][4] FROM test;
     arr  
    ------
     1594
    (1 row)
    

    or a "column" with slicing:

    regress=> SELECT test.arr[4:4][1:530] FROM test;
    

    Oddly, this is still a two-dimensional array, the top dimension is just one element deep. You can flatten it (inefficiently) with unnest and array_agg if you need to.

    Two-dimensional arrays in PostgreSQL are somewhat weird, as you can see, but so is what you're trying to do.