I am working on a Flask+SQLAlchemy project that tracks API Testing results.
I've set up a 3 table, 1:M:M relationship in sql alchemy to track the test results for API tests. I am tracking the overall API Endpoint
, the functionality if that API (TestCase
) and the individual TestResult
's.
The models are set up like so:
class Endpoint(Base):
__tablename__ = "endpoint"
# Primary Key
id: Mapped[int] = mapped_column(primary_key=True)
# Data
api_route: Mapped[str] = mapped_column(String(300), unique=True, nullable=False)
# Relationships
## Links TestCase.endpoint
test_cases: Mapped[List['TestCase']] = relationship(back_populates="endpoint")
@property
def serialize(self):
return {
'id': self.id,
'api_route': self.api_route,
'test_cases': [i.serialize for i in self.test_cases],
'health': self.health
}
class TestCase(Base):
__tablename__ = "test_case"
# Primary Key
id: Mapped[int] = mapped_column(primary_key=True)
# Foreign Keys
endpoint_fk: Mapped[int] = mapped_column(ForeignKey("endpoint.id"))
# Relationships
## Links Endpoint.test_cases
endpoint: Mapped['Endpoint'] = relationship(back_populates="test_cases")
## Links TestResult.test_case
results: Mapped[List['TestResult']] = relationship(back_populates="test_case")
# Data
test_case: Mapped[str] = mapped_column(String(50))
@property
def serialize(self):
return {
'id': self.id,
'endpoint_fk': self.endpoint_fk,
'test_case': self.test_case,
'results': [i.serialize for i in self.results],
'health': self.health,
}
class TestResult(Base):
__tablename__ = "testresult"
# Primary Key
id: Mapped[int] = mapped_column(primary_key=True)
# Foreign Keys
test_case_fk: Mapped[int] = mapped_column(ForeignKey("test_case.id"))
# Data
result: Mapped[int] = mapped_column(Integer())
timestamp: Mapped[DateTime] = mapped_column(DateTime(timezone=True), server_default=func.now())
error: Mapped[str] = mapped_column(String(3000))
# Relationships
# Links TestCase.results
test_case: Mapped['TestCase'] = relationship(back_populates="results")
@property
def serialize(self):
return {
'id': self.id,
'test_case_fk': self.test_case_fk,
'result': self.result,
'timestamp': self.timestamp,
'error': self.error
}
I am now working on a dashboard that shows the last 24 hours of results, and it seems rather straightforward to just filter my query based on a start and end date. Im querying the tables like so to return the Endpoint
objects that i then serialize
to get all the child data via the defined relationships. I'd prefer to have my Endpoint objects returned since there's some additional properties on there that are calculated based on the child data without having to deal with aggregates in the query.
query = db.session.query(Endpoint) \
.join(TestCase, Endpoint.id == TestCase.endpoint_fk) \
.join(TestResult, TestCase.id == TestResult.test_case_fk) \
.filter(TestResult.timestamp.between(one_day_ago, dt_now)) \
.order_by(Endpoint.id)
logger.info(query)
res = [r.serialize for r in query.all()]
When executed, it's returning all of the data for all test results, instead of those just limited to being between one_day_ago
and dt_now
.
Honestly im a bit lost as to why im getting all this extra data, with TestResult
records belonging outside the specified date range.
I tried just doing some raw SQL and something like this work and properly gives back the proper amount of rows.
SELECT *
FROM endpoint JOIN test_case ON endpoint.id = test_case.endpoint_fk JOIN testresult ON test_case.id = testresult.test_case_fk
WHERE testresult.timestamp BETWEEN '2023-08-28T00:00:00.112993-04:00' AND '2023-08-29T23:59:59.112993-04:00'
ORDER BY endpoint.id
I turned on the database echoing and it looks like the lazy loading is doing what it's supposed to yet the original WHERE clause isn't being preserved when it's getting the child records.
2023-08-29 13:27:26,992 INFO sqlalchemy.engine.Engine
SELECT endpoint.id AS endpoint_id, endpoint.api_route AS endpoint_api_route
FROM endpoint JOIN test_case ON endpoint.id = test_case.endpoint_fk JOIN testresult ON test_case.id = testresult.test_case_fk
WHERE testresult.timestamp BETWEEN %(timestamp_1)s AND %(timestamp_2)s ORDER BY endpoint.id
2023-08-29 13:27:26,992 INFO sqlalchemy.engine.Engine [generated in 0.00015s] {'timestamp_1': datetime.datetime(2023, 8, 28, 13, 27, 26, 978617), 'timestamp_2': datetime.datetime(2023, 8, 29, 13, 27, 26, 978617)}
2023-08-29 13:27:26,998 INFO sqlalchemy.engine.Engine
SELECT test_case.endpoint_fk AS test_case_endpoint_fk, test_case.id AS test_case_id, test_case.test_case AS test_case_test_case
FROM test_case
WHERE test_case.endpoint_fk IN (%(primary_keys_1)s, %(primary_keys_2)s)
2023-08-29 13:27:26,998 INFO sqlalchemy.engine.Engine [generated in 0.00028s] {'primary_keys_1': 1, 'primary_keys_2': 2}
2023-08-29 13:27:27,000 INFO sqlalchemy.engine.Engine
SELECT testresult.test_case_fk AS testresult_test_case_fk, testresult.id AS testresult_id, testresult.result AS testresult_result, testresult.timestamp AS testresult_timestamp, testresult.error AS testresult_error
FROM testresult
WHERE testresult.test_case_fk IN (%(primary_keys_1)s, %(primary_keys_2)s, %(primary_keys_3)s, %(primary_keys_4)s, %(primary_keys_5)s, %(primary_keys_6)s, %(primary_keys_7)s, %(primary_keys_8)s, %(primary_keys_9)s, %(primary_keys_10)s, %(primary_keys_11)s, %(primary_keys_12)s, %(primary_keys_13)s, %(primary_keys_14)s, %(primary_keys_15)s, %(primary_keys_16)s, %(primary_keys_17)s, %(primary_keys_18)s, %(primary_keys_19)s, %(primary_keys_20)s, %(primary_keys_21)s, %(primary_keys_22)s)
2023-08-29 13:27:27,000 INFO sqlalchemy.engine.Engine [generated in 0.00017s] {'primary_keys_1': 1, 'primary_keys_2': 2, 'primary_keys_3': 3, 'primary_keys_4': 4, 'primary_keys_5': 5, 'primary_keys_6': 6, 'primary_keys_7': 7, 'primary_keys_8': 8, 'primary_keys_9': 9, 'primary_keys_10': 10, 'primary_keys_11': 11, 'primary_keys_12': 12, 'primary_keys_13': 13, 'primary_keys_14': 14, 'primary_keys_15': 15, 'primary_keys_16': 16, 'primary_keys_17': 17, 'primary_keys_18': 18, 'primary_keys_19': 19, 'primary_keys_20': 20, 'primary_keys_21': 21, 'primary_keys_22': 22}
Any ideas/suggestions/help for how i can limit the returned data and not get all these extra records? I had thought of maybe doing the query with a subquery to get TestResult
s within the date range then joining to that result set, but haven't figured out the syntax for that yet.
As i somewhat suspected the lazy loading was the culprit here for me and once the Endpoint
records were selected, it was just getting all child records that matched that key, even if i had specified the filter/where clause.
After a lot more reading of the docs, i found the necessary things to get the lazy loading to adhere to the conditions i wanted.
Docs: https://docs.sqlalchemy.org/en/20/orm/queryguide/api.html#sqlalchemy.orm.with_loader_criteria
The end result to query and get all the data via the relationships
query = db.select(Endpoint) \
.options(
with_loader_criteria(TestResult, TestResult.timestamp.between(the_day_before, start_date))
).order_by(Endpoint.id)