Search code examples
javasqlpostgresqljooq

How to generate arbitrary subqueries/joins in a Jooq query


Situation: I am porting our application to Jooq to eliminate several n+1 problems and ensure custom queries are type-safe (DB server is Postgresql 13). In my example we have documents (ID, file name, file size). Each document can have several unqique document attributes (Document ID as FK, archive attribute ID - the type of the attribute and the value). Example data:

Document:

acme=> select id, file_name, file_size from document;
                  id                  |        file_name        | file_size 
--------------------------------------+-------------------------+-----------
 1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | My Really cool book.pdf |     13264
(1 row)

Document Attributes:

acme=> select * from document_attribute ;
             document_id              |         archive_attribute_id         |   value    
--------------------------------------+--------------------------------------+------------
 1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | b334e287-887f-4173-956d-c068edc881f8 | JustReleased
 1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | 2f86a675-4cb2-4609-8e77-c2063ab155f1 | Tax
 1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | 30bb9696-fc18-4c87-b6bd-5e01497ca431 | ShippingRequired
 1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | 2eb04674-1dcb-4fbc-93c3-73491deb7de2 | Bestseller
 1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | a8e2f902-bf04-42e8-8ac9-94cdbf4b6778 | Paperback
(5 rows)

One can search via custom created JDBC prepared statement for these documents and their attribute. A user was able to create this query for a document ID and two document attributes with matching value, which returned the book 'My Really cool book.pdf':

SELECT d.id FROM document d WHERE d.id = '1ae56478-d27c-4b68-b6c0-a8bdf36dd341'
AND d.id IN(SELECT da.document_id AS id0 FROM document_attribute da WHERE da.archive_attribute_id = '2eb04674-1dcb-4fbc-93c3-73491deb7de2' AND da.value = 'Bestseller')
AND d.id IN(SELECT da.document_id AS id1 FROM document_attribute da WHERE da.archive_attribute_id = 'a8e2f902-bf04-42e8-8ac9-94cdbf4b6778' AND da.value = 'Paperback');

(After that the application fetches all document attributes for the returned document IDs - thus the n + 1 problem we want to solve)

Please note that all document values and document attributes are optional. One can only search for the file name or file size of a document but also several document attributes.

Question/Problems:

I wanted to port this code to Jooq and use a multiset, but I am struggeling how to apply the arbitrary subquery or join condition to the document attributes:

1.) How can I achieve this arbitrary adding of subqueries?

2.) Is a INNER JOIN more performant than a subquery?

Code:

import org.jooq.Condition;
import org.jooq.impl.DSL;
import org.junit.jupiter.api.Test;

import java.util.List;
import java.util.Map;
import java.util.UUID;

import static org.jooq.impl.DSL.multiset;
import static org.jooq.impl.DSL.selectDistinct;

public class InSelectExample extends BaseTest {

    private record CustomDocumentAttribute(
        UUID documentId, // ID of the document the attribute belongs to
        UUID archiveAttributeId, // There are predefined attribute types in our application. This ID  references them
        String value // Real value of this attribute for the document
    ) {
    }

    private record CustomDocument(
        UUID documentId, // ID of the document
        String fileName, // File name of the document
        Integer fileSize, // File size in bytes of the document
        List<CustomDocumentAttribute> attributes // Attributes the document has
    ) {
    }

    @Test
    public void findPdfDocumentsWithParameters() {
        // Should print the single book
        List<CustomDocument> documents = searchDocuments(UUID.fromString("1ae56478-d27c-4b68-b6c0-a8bdf36dd341"), "My Really cool book.pdf", 13264, Map.of(
            UUID.fromString("2eb04674-1dcb-4fbc-93c3-73491deb7de2"), "Bestseller",
            UUID.fromString("a8e2f902-bf04-42e8-8ac9-94cdbf4b6778"), "Paperback"
        ));
        System.out.println("Size: " + documents.size()); // Should return 1 document

        // Should print no books because one of the document attribute value doesn't match (Booklet instead of Paperback)
        documents = searchDocuments(UUID.fromString("1ae56478-d27c-4b68-b6c0-a8bdf36dd341"), "My Really cool book.pdf", 13264, Map.of(
            UUID.fromString("2eb04674-1dcb-4fbc-93c3-73491deb7de2"), "Bestseller",
            UUID.fromString("a8e2f902-bf04-42e8-8ac9-94cdbf4b6778"), "Booklet"
        ));
        System.out.println("Size: " + documents.size()); // Should return 0 documents
    }

    private List<CustomDocument> searchDocuments(UUID documentId, String fileName, Integer fileSize, Map<UUID, String> attributes) {
        // Get the transaction manager
        TransactionManager transactionManager = getBean(TransactionManager.class);

        // Get the initial condition
        Condition condition = DSL.noCondition();

        // Check for an optional document ID
        if (documentId != null) {
            condition = condition.and(DOCUMENT.ID.eq(documentId));
        }

        // Check for an optional file name
        if (fileName != null) {
            condition = condition.and(DOCUMENT.FILE_NAME.eq(fileName));
        }

        // Check for an optional file size
        if (fileSize != null) {
            condition = condition.and(DOCUMENT.FILE_SIZE.eq(fileSize));
        }

        // Create the query
        var step1 = transactionManager.getDslContext().select(
            DOCUMENT.ID,
            DOCUMENT.FILE_NAME,
            DOCUMENT.FILE_SIZE,
            multiset(
                selectDistinct(
                    DOCUMENT_ATTRIBUTE.DOCUMENT_ID,
                    DOCUMENT_ATTRIBUTE.ARCHIVE_ATTRIBUTE_ID,
                    DOCUMENT_ATTRIBUTE.VALUE
                ).from(DOCUMENT_ATTRIBUTE).where(DOCUMENT_ATTRIBUTE.DOCUMENT_ID.eq(DOCUMENT.ID))
            ).convertFrom(record -> record.map(record1 -> new CustomDocumentAttribute(record1.value1(), record1.value2(), record1.value3())))
        ).from(DOCUMENT
        ).where(condition);

        // TODO: What to do here?
        var step3 = ...? What type?
        for (Map.Entry<UUID, String> attributeEntry : attributes.entrySet()) {
            // ???
            // Reference: AND d.id IN(SELECT da.document_id AS id0 FROM document_attribute da WHERE da.archive_attribute_id = ? AND da.value = ?)
            var step2 = step1.and(...??????)
        }

        // Finally fetch and return
        return step1.fetch(record -> new CustomDocument(record.value1(), record.value2(), record.value3(), record.value4()));
    }
}

Solution

  • Regarding your questions

    1.) How can I achieve this arbitrary adding of subqueries?

    You already found a solution to that question in your own answer, though I'll suggest an alternative that I personally prefer. Your approach creates N subqueries hitting your table N times.

    2.) Is a INNER JOIN more performant than a subquery?

    There's no general rule to this. It's all just relational algebra. If the optimiser can prove two expressions are the same thing, they can be transformed to each other. However, an INNER JOIN is not the exact same thing as a semi join, i.e. an IN predicate (although sometimes it is, in the presence of constraints). So the two operators aren't exactly equivalent, logically

    An alternative approach

    Your own approach maps the Map<UUID, String> to subqueries, hitting the DOCUMENT_ATTRIBUTE N times. I'm guessing that the PG optimiser might not be able to see through this and factor out the common parts into a single subquery (though technically, it could). So, I'd rather create a single subquery of the form:

    WHERE document.id IN (
      SELECT a.document_id
      FROM document_attribute AS a
      WHERE (a.archive_attribute_id, a.value) IN (
        (?, ?),
        (?, ?), ...
      )
    )
    

    Or, dynamically, with jOOQ:

    DOCUMENT.ID.in(
      select(DOCUMENT_ATTRIBUTE_DOCUMENT_ID)
      .from(DOCUMENT_ATTRIBUTE)
      .where(row(DOCUMENT_ATTRIBUTE.ARCHIVE_ATTRIBUTE_ID, DOCUMENT_ATTRIBUTE.VALUE).in(
        attributes.entrySet().stream().collect(Rows.toRowList(
          Entry::getKey,
          Entry::getValue
        ))
      ))
    )
    

    Using org.jooq.Rows::toRowList collectors.

    Note: I don't think you have to further correlate the IN predicate's subquery by specifying a DOCUMENT_ATTRIBUTE.DOCUMENT_ID.eq(DOCUMENT.ID) predicate. That correlation is already implied by using IN itself.