Search code examples
phpmysqlunpackmysql-spatial

How to unpack MySQL Multipoint geometry data in PHP?


Okay, so I think I'm close here, but I'm hitting a limitation in my understanding of binary data.

I'm parsing some MySQL data inserted as geometry types, using PHP's unpack() as my parser, and everything was going swimmingly until I started trying to unpack complex geometry types (e.g. MULTIPOINT).

For a POINT data type, I've had good success using an unpack pattern that simply disregards the first chunk and then gives me an associative array of type, order, lat and lon:

$coords = unpack('x/x/x/x/corder/Ltype/dlat/dlon', $point);
// >>> [
//       'order' => 1,
//       'type' => 1,
//       'lat' => (expected value),
//       'lon' => (expected value)
//     ];

Naturally, applying the exact same pattern to MULTIPOINT geometries doesn't work the same. It gets the order and the type becomes 4, but the values of lat and lon are not at all what I'd expect. So, curious to see what the whole thing looked like, I changed the pattern to just spew it all out as "double (machine dependent size and representation)" types:

$coords = unpack('x/x/x/x/corder/Ltype/d*', $multipoint);
// >>> mayhem

What that unpacks actually includes five additional array items, instead of the 4 I'd expect for a multipoint with two points (2x2), and the values are totally wack. For instance, a value I'd expect to be somewhere in the 40-something-point-whatever range seems to read something like -1.0977282851114052e-218.

What is the proper way to unpack a MULTIPOINT? My instinct says I'm slicing the bytes where I shouldn't or casting them to an inappropriate type, but I'm not sure what those should be.


Solution

  • Took a bit of looking, but I found a reference for the WKB formats of various geometries. As I guessed by looking at the output of

    SELECT HEX(ST_GeomFromText('MULTIPOINT(1 1, 2 2, 3 3)'))
    

    there is a count of points after the order and type. The challenge is that the order and type are re-declared for every point and unpack doesn't have a concept of repeating a group of bytes. So you'll need to pull the bytes for each point and run unpack on them again. Obviously this will start to get more involved once you try accounting for more spatial types.

    <?php
    $multipoint_wkb = hex2bin("000000000104000000030000000101000000000000000000F03F000000000000F03F010100000000000000000000400000000000000040010100000000000000000008400000000000000840");
    
    function unpack_multipoint($multipoint)
    {
        $data = unpack("x4/corder/Ltype/Lcount", $multipoint);
        for ($i = 0; $i < $data["count"]; $i++) {
            // the header is 1+4+8 bytes and each point record is 1+4+8+8 bytes
            $offset = ($i * 21) + 13;
            $return[] = unpack("corder/Ltype/dlat/dlon", $multipoint, $offset);
        }
        return $return;
    }
    
    print_r(unpack_multipoint($multipoint_wkb));
    

    It's worth noting that those 4 NUL bytes are inserted by MySQL but are not part of the actual geometry object. I left them in place per your question, but if you retrieve the data using the ST_AsWKB function:

    SELECT HEX(ST_AsWKB(ST_GeomFromText('MULTIPOINT(1 1, 2 2, 3 3)')))
    

    the extra bytes are not prepended.