Search code examples
javatime-complexityjooq

Accessing columns from a composite record in constant time


I am retrieving a record from a query with multiple join clauses, making it composite. The record has about 20 columns, so I am looking for an efficient way to access the data. I will only look at the data and not modify anything or commit back to the database.

I looked at two short, readable options that jOOQ provides:

  • Using get(Field<?>) from the Record instance of fetchOne(),
  • Creating instances of jOOQ generated records via fromMap(Map<String, ?>) with a map from fetchOneMap(), and using the generated getters.

To try and find out how efficient these options are, I looked at how these methods are implemented. If I looked correctly, get is implemented linearly (via field0(Field<?>, FieldOrIndex<U>)), which would make accessing the n fields get a time complexity of O(n^2).

I figured that perhaps, fromMap(Map<String, ?>) would be efficient to initialize schema-generated records instead, so the generated getters can be used. However, fromMap eventually calls from(Record) which loops over all fields while (inside the loop) indirectly calling field0 (that was linear), making fromMap have a time complexity of O(n^2) as well. Of course, retrieving the n columns would now have a complexity of O(n), but the earlier initialization has caused a bottleneck.

In principle, my question reads:

Is there a way to access column data of a composite record by simple methods (single parameter like get(Field<?>) or the generated getters), in a somewhat constant manner (thus in total linearly)? Or am I left with using the Map<String, Object> from fetchOneMap, sacrificing some readability due to manually casting?

Thanks in advance!


Solution

  • The O(N) read time complexity of the common Record.get(Field<T>) method is unfortunate, of course. Much faster access can be achieved when accessing data by index, e.g. via:

    • Record.get(int)
      • Con: No type safety
      • Con: access by index is error prone
    • Record20.value3(), etc.
      • Pro: type safe up to 22 columns
      • Con: access by index is error prone
    • Records.mapping(Function20), etc.
    • Use the DefaultRecordMapper to map data into a DTO / POJO using reflection.
      • Pro: the reflection lookups are cached in your Configuration instance, if you're sharing that among sessions, so the bulk overhead happens only once per query type
      • Con: no type safety in the mapping between query and DTO / POJO
    • Use the generated TableRecord types
      • Pro: they internally access columns by index, but map the index to a type safe name
      • Con: you'll be inclined to always fetch all columns of a table, which is much worse for performance. Besides, this approach doesn't work with joins
    • There are more alternatives.

    A hypothetical ideal solution would be:

    • A compiler plugin that introspects every one of your queries and produces a DTO / POJO just for that single query, e.g. as a Java 16 record type, and then uses index-based access to map between jOOQ records and Java records. This would work like an F# type provider. If that's ever possible in Java / jOOQ, this will be the default option.

    So, as you can see, it's a tradeoff. You don't have to forego type safety when accessing values by index (up to degree 22), but just as with JDBC, index based access is error prone when you add columns to the beginning of a row, when all other columns shift by 1.

    While a jOOQ Result is not an optimal data structure for processing data in-memory, all benchmarks in the past have shown that the overhead produced by the actual SQL query is orders of magnitudes more significant than the overhead caused by this record access, even when running trivial queries against an in-memory H2 database. Alternative internal representations using a Map<Field<?>, ?> didn't perform much better because of the internal memory overhead of a Map compared to an Object[], as well as the increased CPU overhead - at least for small N. It would probably fare better for e.g. 200 columns, but still, a 200 column query will produce orders of magnitudes more overhead on the server side than your mapping.

    TL;DR:

    If you can measure a significant overhead, e.g. using a profiler, then the fastest access is always by index, e.g. using Record.get(int) or Record.get(int, Class<T>). Chances are, you won't measure anything significant, though.