Search code examples
pythonsqlpostgresqlpsycopg2

How to omit certain parts of a SQL query


Disclaimer that I'm highly aware there's a better way to word the question, if someone wants to suggest a better way I'd be happy to change it

I've been working with DynamoDB for the past 4 years and now I'm switching one of my tables over to postgreSQL. I've been working with the psycopg2 python library to test some queries. Users will need to filter against the database and so it'd be nice to have an all in one filter query.

I'd like users to be able to select multiple values for a given filter or none, in the case of none, that field shouldn't be filtered against. Here's what a basic query might look like (this is just an example that will accomplish asking my question).

conn = psycopg2.connect(host=ENDPOINT, port=PORT, database=DBNAME, user=USER, password=PASSWORD, sslrootcert="SSLCERTIFICATE")
cur = conn.cursor()
sql = """
    SELECT * FROM table_name 
    WHERE column_1 in %s AND column_2 in %s
    ORDER BY datetime DESC
"""
sql_values = (("XXXXYXY", "XXXYYXXY"), ("ZGZGZGZGGG","GZGGGGZGG"))
cur.execute(sql, sql_values)

And here's the sort of query in which no value is present for column_2:

conn = psycopg2.connect(host=ENDPOINT, port=PORT, database=DBNAME, user=USER, password=PASSWORD, sslrootcert="SSLCERTIFICATE")
cur = conn.cursor()
sql = """
    SELECT * FROM table_name 
    WHERE column_1 in %s AND column_2 in %s
    ORDER BY datetime DESC
"""
sql_values = (("XXXXYXY", "XXXYYXXY"), ())
cur.execute(sql, sql_values)

Obviously, this wouldn't work. In short, I'd like it to only query against columns that have data present. What would be the most efficient way to accomplish this?


Solution

  • This demonstrates building a query dynamically:

    conn = psycopg2.connect(host=ENDPOINT, port=PORT, database=DBNAME, user=USER, password=PASSWORD, sslrootcert="SSLCERTIFICATE")
    cur = conn.cursor()
    
    where = []
    data = []
    
    for q,r in (
        ("column_1", ("XXXXYXY","XXXYYXXY")),
        ("column_2", ("XXXZZXY","XXZZYXXY")),
        ("column_3", ())
        ):
        if r:
            where.append( "%s in %%s" % q )
            data.append(r)
    
    sql = "SELECT * FROM table_name WHERE "
    sql += " AND ".join(where)
    sql += " ORDER BY datetime DESC;";
    print(sql)
    cur.execute(sql, data)
    

    Output:

    SELECT * FROM table_name WHERE column_1 in %s AND column_2 in %s ORDER BY datetime DESC;