Situation: I am porting our application to Jooq to eliminate several n+1 problems and ensure custom queries are type-safe (DB server is Postgresql 13). In my example we have documents (ID, file name, file size). Each document can have several unqique document attributes (Document ID as FK, archive attribute ID - the type of the attribute and the value). Example data:
Document:
acme=> select id, file_name, file_size from document;
id | file_name | file_size
--------------------------------------+-------------------------+-----------
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | My Really cool book.pdf | 13264
(1 row)
Document Attributes:
acme=> select * from document_attribute ;
document_id | archive_attribute_id | value
--------------------------------------+--------------------------------------+------------
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | b334e287-887f-4173-956d-c068edc881f8 | JustReleased
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | 2f86a675-4cb2-4609-8e77-c2063ab155f1 | Tax
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | 30bb9696-fc18-4c87-b6bd-5e01497ca431 | ShippingRequired
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | 2eb04674-1dcb-4fbc-93c3-73491deb7de2 | Bestseller
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | a8e2f902-bf04-42e8-8ac9-94cdbf4b6778 | Paperback
(5 rows)
One can search via custom created JDBC prepared statement for these documents and their attribute. A user was able to create this query for a document ID and two document attributes with matching value, which returned the book 'My Really cool book.pdf':
SELECT d.id FROM document d WHERE d.id = '1ae56478-d27c-4b68-b6c0-a8bdf36dd341'
AND d.id IN(SELECT da.document_id AS id0 FROM document_attribute da WHERE da.archive_attribute_id = '2eb04674-1dcb-4fbc-93c3-73491deb7de2' AND da.value = 'Bestseller')
AND d.id IN(SELECT da.document_id AS id1 FROM document_attribute da WHERE da.archive_attribute_id = 'a8e2f902-bf04-42e8-8ac9-94cdbf4b6778' AND da.value = 'Paperback');
(After that the application fetches all document attributes for the returned document IDs - thus the n + 1 problem we want to solve)
Please note that all document values and document attributes are optional. One can only search for the file name or file size of a document but also several document attributes.
Question/Problems:
I wanted to port this code to Jooq and use a multiset, but I am struggeling how to apply the arbitrary subquery or join condition to the document attributes:
1.) How can I achieve this arbitrary adding of subqueries?
2.) Is a INNER JOIN more performant than a subquery?
Code:
import org.jooq.Condition;
import org.jooq.impl.DSL;
import org.junit.jupiter.api.Test;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.jooq.impl.DSL.multiset;
import static org.jooq.impl.DSL.selectDistinct;
public class InSelectExample extends BaseTest {
private record CustomDocumentAttribute(
UUID documentId, // ID of the document the attribute belongs to
UUID archiveAttributeId, // There are predefined attribute types in our application. This ID references them
String value // Real value of this attribute for the document
) {
}
private record CustomDocument(
UUID documentId, // ID of the document
String fileName, // File name of the document
Integer fileSize, // File size in bytes of the document
List<CustomDocumentAttribute> attributes // Attributes the document has
) {
}
@Test
public void findPdfDocumentsWithParameters() {
// Should print the single book
List<CustomDocument> documents = searchDocuments(UUID.fromString("1ae56478-d27c-4b68-b6c0-a8bdf36dd341"), "My Really cool book.pdf", 13264, Map.of(
UUID.fromString("2eb04674-1dcb-4fbc-93c3-73491deb7de2"), "Bestseller",
UUID.fromString("a8e2f902-bf04-42e8-8ac9-94cdbf4b6778"), "Paperback"
));
System.out.println("Size: " + documents.size()); // Should return 1 document
// Should print no books because one of the document attribute value doesn't match (Booklet instead of Paperback)
documents = searchDocuments(UUID.fromString("1ae56478-d27c-4b68-b6c0-a8bdf36dd341"), "My Really cool book.pdf", 13264, Map.of(
UUID.fromString("2eb04674-1dcb-4fbc-93c3-73491deb7de2"), "Bestseller",
UUID.fromString("a8e2f902-bf04-42e8-8ac9-94cdbf4b6778"), "Booklet"
));
System.out.println("Size: " + documents.size()); // Should return 0 documents
}
private List<CustomDocument> searchDocuments(UUID documentId, String fileName, Integer fileSize, Map<UUID, String> attributes) {
// Get the transaction manager
TransactionManager transactionManager = getBean(TransactionManager.class);
// Get the initial condition
Condition condition = DSL.noCondition();
// Check for an optional document ID
if (documentId != null) {
condition = condition.and(DOCUMENT.ID.eq(documentId));
}
// Check for an optional file name
if (fileName != null) {
condition = condition.and(DOCUMENT.FILE_NAME.eq(fileName));
}
// Check for an optional file size
if (fileSize != null) {
condition = condition.and(DOCUMENT.FILE_SIZE.eq(fileSize));
}
// Create the query
var step1 = transactionManager.getDslContext().select(
DOCUMENT.ID,
DOCUMENT.FILE_NAME,
DOCUMENT.FILE_SIZE,
multiset(
selectDistinct(
DOCUMENT_ATTRIBUTE.DOCUMENT_ID,
DOCUMENT_ATTRIBUTE.ARCHIVE_ATTRIBUTE_ID,
DOCUMENT_ATTRIBUTE.VALUE
).from(DOCUMENT_ATTRIBUTE).where(DOCUMENT_ATTRIBUTE.DOCUMENT_ID.eq(DOCUMENT.ID))
).convertFrom(record -> record.map(record1 -> new CustomDocumentAttribute(record1.value1(), record1.value2(), record1.value3())))
).from(DOCUMENT
).where(condition);
// TODO: What to do here?
var step3 = ...? What type?
for (Map.Entry<UUID, String> attributeEntry : attributes.entrySet()) {
// ???
// Reference: AND d.id IN(SELECT da.document_id AS id0 FROM document_attribute da WHERE da.archive_attribute_id = ? AND da.value = ?)
var step2 = step1.and(...??????)
}
// Finally fetch and return
return step1.fetch(record -> new CustomDocument(record.value1(), record.value2(), record.value3(), record.value4()));
}
}
1.) How can I achieve this arbitrary adding of subqueries?
You already found a solution to that question in your own answer, though I'll suggest an alternative that I personally prefer. Your approach creates N subqueries hitting your table N times.
2.) Is a INNER JOIN more performant than a subquery?
There's no general rule to this. It's all just relational algebra. If the optimiser can prove two expressions are the same thing, they can be transformed to each other. However, an INNER JOIN
is not the exact same thing as a semi join, i.e. an IN
predicate (although sometimes it is, in the presence of constraints). So the two operators aren't exactly equivalent, logically
Your own approach maps the Map<UUID, String>
to subqueries, hitting the DOCUMENT_ATTRIBUTE
N times. I'm guessing that the PG optimiser might not be able to see through this and factor out the common parts into a single subquery (though technically, it could). So, I'd rather create a single subquery of the form:
WHERE document.id IN (
SELECT a.document_id
FROM document_attribute AS a
WHERE (a.archive_attribute_id, a.value) IN (
(?, ?),
(?, ?), ...
)
)
Or, dynamically, with jOOQ:
DOCUMENT.ID.in(
select(DOCUMENT_ATTRIBUTE_DOCUMENT_ID)
.from(DOCUMENT_ATTRIBUTE)
.where(row(DOCUMENT_ATTRIBUTE.ARCHIVE_ATTRIBUTE_ID, DOCUMENT_ATTRIBUTE.VALUE).in(
attributes.entrySet().stream().collect(Rows.toRowList(
Entry::getKey,
Entry::getValue
))
))
)
Using org.jooq.Rows::toRowList
collectors.
Note: I don't think you have to further correlate the
IN
predicate's subquery by specifying aDOCUMENT_ATTRIBUTE.DOCUMENT_ID.eq(DOCUMENT.ID)
predicate. That correlation is already implied by usingIN
itself.