I have two tables user and visitors with the following contents:
users:
+----+-------------+
| id | email |
+----+-------------+
| 1 | [email protected] |
| 2 | [email protected] |
+----+-------------+
visitors:
+----+---------+------+
| id | user_id | addr |
+----+---------+------+
| 1 | NULL | 1 |
| 2 | NULL | 2 |
| 3 | NULL | 1 |
| 4 | NULL | 2 |
| 5 | NULL | 3 |
| 6 | 1 | 4 |
| 7 | NULL | 5 |
| 8 | NULL | 6 |
| 9 | 2 | 2 |
+----+---------+------+
I want to get all ids of table visitors that are not contained in a list of addresses that are linked to a given pattern for an email stored in table user.
e.g.:
SELECT visitors.id
FROM visitors
WHERE visitors.addr NOT IN (
SELECT DISTINCT visitors.addr
FROM user, visitors
WHERE user.email LIKE '%bar%' AND visitors.user_id = user.id
)
The inner select statement will return addr 2 so the outer statement will return the visitor ids for the 6 rows that don't have add == 2 (i.e. all except 2, 4 and 9). I tried doing that with this SQLalchemy statement:
subquery = (
session.query(Visitors.addr).distinct()
.filter(User.email.like('%bar%'), Visitors.user_id == User.id)
.subquery()
)
mainquery = session.query(Visitors.id).filter(Visitors.addr.notin_(subquery))
This is the SQL statement that's created by SQLalchemys ORM:
SELECT visitors.id AS visitors_id
FROM visitors
WHERE visitors.addr NOT IN (SELECT DISTINCT visitors.addr
FROM user
WHERE user.email LIKE ? AND visitors.user_id = user.id)
The key distinction there is that the subquery doesn't contain visitors in the FROM clause anymore and this missing table means visitor.id 2 and 4 are returned even though they have addr 2 assigned. Is there a way to force SQLAlchemy to include a given table in the FROM clause or should I try and achieve the same results with a JOIN instead?
Edit: The SQLAlchemy version that was used was 1.3.20. I updated it to 1.4.23 since then to be able to use the scalar_subquery as suggested by @ian-wilson. The Databases used are SQLite 3.35.5 and as MySQL 5.7.34 respectively. Using the scalar_subquery function didn't fix my issue, using a join for the subquery did.
I would use a join here like you mentioned but also you should consider trying scalar_subquery
(1.4) or as_scalar
(<1.4) to produce the in_
clause.
subquery = session.query(
Visitors.addr.distinct()
).join(
User,
Visitors.user_id == User.id
).filter(
User.email.like('%bar%')
).scalar_subquery()
mainquery = session.query(Visitors.id).filter(Visitors.addr.notin_(subquery))
https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query.scalar_subquery