Search code examples
pythonpostgresqlpsycopg2hstore

Return PostgreSQL hstore as OrderedDict using psycopg2 in Python


A PostgreSQL hstore maintains order between storage and retrieval. This allows one to define the order in which the keys/values within the hstore are stored and retrieved.

Unfortunately, psycopg2's implementation uses a hard-coded dict under extras.py -> HstoreAdapter.parse().

While in most scenarios parsing an hstore to a dict suites just fine, in our scenario this causes problems; we specifically need to maintain ordering.

One work-around I've derived is querying for keys and values as separate lists:

SELECT AKEYS(myhstore) AS keys, AVALS(mystoore) AS vals FROM mytbl

... However that solution treats the symptom of the problem rather than the cause.

Is anyone aware of a monkey-patch solution to this issue, or a branch of psycopg2 which deals with this issue, or of an implementation of the extras.py file which resolves this?

Alternately, does anyone have any other suggestions on how to handle this?


Solution

  • NOTICE: HSTORE does not preserve order. It is unordered like Python's dict. My previous answer only worked by chance.

    The order of the pairs is not significant (and may not be reproduced on output).

    Example:

    SELECT 'foo=>1,bar=>2,spam=>3,eggs=>4'::hstore
    UNION ALL
    SELECT hstore('{foo,1,bar,2,spam,3,eggs,4}'::TEXT[])
    UNION ALL
    SELECT hstore('{{foo,1},{bar,2},{spam,3},{eggs,4}}'::TEXT[])
    UNION ALL
    SELECT hstore('{foo,bar,spam,eggs}'::TEXT[], '{1,2,3,4}'::TEXT[])
    

    All result in:

    ""bar"=>"2", "foo"=>"1", "eggs"=>"4", "spam"=>"3""
    ""bar"=>"2", "foo"=>"1", "eggs"=>"4", "spam"=>"3""
    ""bar"=>"2", "foo"=>"1", "eggs"=>"4", "spam"=>"3""
    ""bar"=>"2", "foo"=>"1", "eggs"=>"4", "spam"=>"3""
    

    Which appears to be unordered because its order is different from the original order, and it is not alphabetically ordered.


    THE BELOW DOES NOT ACTUALLY WORK!

    You can maintain order in an HSTORE by using the hstore_to_matrix() function which converts the HSTORE to an array of key-value pairs. Then you have to manually pass it to OrderedDict in Python:

    import collections
    import psycopg2
    import psycopg2.extras
    
    pg = psycopg2.connect(...)
    psycopg2.extras.register_hstore(pg)
    cursor = pg.cursor()
    
    cursor.execute("""
        SELECT hstore_to_matrix('a=>1,b=>2,c=>3'::hstore);
    """)
    pairs = cursor.fetchone()[0]
    ordered = collections.OrderedDict(pairs)
    
    print(pairs)
    # [['a', '1'], ['b', '2'], ['c', '3']]
    print(ordered)
    # OrderedDict([('a', '1'), ('b', '2'), ('c', '3')])
    

    psycopg2 ultimately calls .keys() and .values() on the dictionary when its converted back to an HSTORE which means so long as the dictionary is ordered, the HSTORE sent back to PostgreSQL will also be ordered. You just have to pass back an OrderedDict instead of a regular dict to maintain order:

    # This will save the data as ordered.
    data = OrderedDict([('a', '1'), ('b', '2'), ('c', '3')])
    cursor.update("""
        UPDATE mytable
        SET ordered = %(data)s;
    """, {
        'data': data
    })
    
    # This will save the data as unordered. Whatever the internal order
    # happens to be for the dict will be sent. When I run it in the python
    # interpreter, it results in:
    #   [('a', '1'), ('c', '3'), ('b', '2')]
    cursor.update("""
        UPDATE mytable
        SET ordered = %(data)s;
    """, {
        'data': data
    })