Search code examples
hl7-fhirhapi

Do (document) bundle entries always have to be referenced or referencing?


The specification for FHIR documents seems to mandate that all bundle entries in the document resource be part of the reference graph rooted at the Composition entry. That is, they should be the source or the target of a reference relation that traces all the way up to the root entry.

Unfortunately I have not been able to locate all the relevant passages in the FHIR specification; one place where it is spelled out is in 3.3.1 Document Content, but it is not really clear whether this pertains to all bundles of type 'document' (i.e. even those that happen to be bundles with type code 'document' but are merely collections of machine-processable data without any aspirations to represent a FHIRy document).

The problem with the referencedness requirement lies in the fact that the HAPI validator employs linear search for checking the references. So, if we have to ship N bundle entries full of data to a payor, we have to include a list with N references (one for each data-bearing bundle entry). That leads to N reference searches with O(N) effort during validation, which makes the reference checking complexity effectively quadratic in the number of entries.

This easily brings even the most powerful computers to their knees. Current size contraints effectively cap the number of entries per file at roughly 25000, and the HAPI validator needs several hours to chew through that, even on the most powerful CPUs currently available. Without the references, validation would take less than a minute for the same file.

In our use case, data-bearing entries have no identity outside of the containing bundle file. Practically speaking they would need neither entry.fullUrl nor entry.resource.id, because their business identifiers are contained in included base64 blobs. However, presence or absence of these identifiers has no practical influence on the time needed for validation (fractions of a second even for a 1 GB file), so who cares. It's the list of references that kills the HAPI validator.

Perhaps it would be possible to fulfil the letter of the referencedness requirement by making all entries include a reference to the Composition. The HAPI validator doesn't care either way, so I don't know whether that would be valid or not. But even if it were FHIRly valid, it would be a monstrously silly workaround.

Is there a way to ditch the referencedness requirement? Perhaps by changing the bundle type to something like 'collection', or by using contained resources?

P.S.: for the moment we are using a workaround that cuts the time for validation from hours to less than a minute, but it's a hack, and we currently don't have the resources to fix the HAPI validator. What I'm mostly concerned about is the question how the specifications (profiles) need to be changed in order to avoid the problem I described.


Solution

  • (i.e. even those that happen to be bundles with type code 'document' but are merely collections of machine-processable data without any aspirations to represent a FHIRy document)

    If it is not a document, and not intended to be one, do not use the 'document' Bundle type. If you do, you would me misrepresenting the data which is what FHIR tries to avoid. It seems like you want to send a collection of resources that are not necessarily related, so

    Is there a way to ditch the referencedness requirement? Perhaps by changing the bundle type to something like 'collection'

    Yes, I would use 'collection', or maybe a 'batch/transaction' depending on what I want to tell the receiver to do with the data.