Search code examples
pythonsqlsqlalchemysubqueryin-subquery

How do I force SQLAlchemy ORM to include table in subquery?


I have two tables user and visitors with the following contents:

users:

+----+-------------+
| id |    email    |
+----+-------------+
|  1 | [email protected] |
|  2 | [email protected] |
+----+-------------+

visitors:

+----+---------+------+
| id | user_id | addr |
+----+---------+------+
|  1 |  NULL   |  1   |
|  2 |  NULL   |  2   |
|  3 |  NULL   |  1   |
|  4 |  NULL   |  2   |
|  5 |  NULL   |  3   |
|  6 |    1    |  4   |
|  7 |  NULL   |  5   |
|  8 |  NULL   |  6   |
|  9 |    2    |  2   |
+----+---------+------+

I want to get all ids of table visitors that are not contained in a list of addresses that are linked to a given pattern for an email stored in table user.

e.g.:

SELECT visitors.id
FROM visitors
WHERE visitors.addr NOT IN (
    SELECT DISTINCT visitors.addr
    FROM user, visitors
    WHERE user.email LIKE '%bar%' AND visitors.user_id = user.id
)

The inner select statement will return addr 2 so the outer statement will return the visitor ids for the 6 rows that don't have add == 2 (i.e. all except 2, 4 and 9). I tried doing that with this SQLalchemy statement:

subquery = (
  session.query(Visitors.addr).distinct()
  .filter(User.email.like('%bar%'), Visitors.user_id == User.id)
  .subquery()
)

mainquery = session.query(Visitors.id).filter(Visitors.addr.notin_(subquery))

This is the SQL statement that's created by SQLalchemys ORM:

SELECT visitors.id AS visitors_id
FROM visitors
WHERE visitors.addr NOT IN (SELECT DISTINCT visitors.addr
FROM user
WHERE user.email LIKE ? AND visitors.user_id = user.id)

The key distinction there is that the subquery doesn't contain visitors in the FROM clause anymore and this missing table means visitor.id 2 and 4 are returned even though they have addr 2 assigned. Is there a way to force SQLAlchemy to include a given table in the FROM clause or should I try and achieve the same results with a JOIN instead?

Edit: The SQLAlchemy version that was used was 1.3.20. I updated it to 1.4.23 since then to be able to use the scalar_subquery as suggested by @ian-wilson. The Databases used are SQLite 3.35.5 and as MySQL 5.7.34 respectively. Using the scalar_subquery function didn't fix my issue, using a join for the subquery did.


Solution

  • I would use a join here like you mentioned but also you should consider trying scalar_subquery (1.4) or as_scalar (<1.4) to produce the in_ clause.

        subquery = session.query(
            Visitors.addr.distinct()
        ).join(
            User,
            Visitors.user_id == User.id
        ).filter(
            User.email.like('%bar%')
        ).scalar_subquery()
    
        mainquery = session.query(Visitors.id).filter(Visitors.addr.notin_(subquery))
    

    https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query.scalar_subquery