marklogic

Multiple URIs for the same document in MarkLogic?


While loading a file repository into MarkLogic, I realized that it contains a significant number of duplicate documents. For example, multiple copies of the same product catalog document ("catalog.pdf") may be found in several different directories, such as /products/published-documents/, /sourcing/references/, and /marketing/materials/.

I am wondering whether I can remove those duplicates by having multiple URIs in MarkLogic point to the same document (like a symlink), or whether there are other approaches that achieve the same effect. I have considered using collections, but we have a requirement to preserve the directory structure so that users can continue accessing the files via WebDAV.


Solution

  • No. It is a fairly low-level constraint in MarkLogic that each document has exactly one URI. However, you could use modular document features such as XInclude or XPointer and replace each duplicated document with a reference to the canonical URI.

    https://docs.marklogic.com/guide/app-dev/mod-docs

    Collections may also be helpful in that scenario, for example, to assign the canonical document to one collection and the duplicates to another. Then it would be simpler to query only the canonical documents.

    But if WebDAV is the primary interface, then neither approach may be appropriate, since a WebDAV client would simply open the shallow document containing the XInclude URI reference rather than the canonical content.

    One possible workaround is permissions. You can assign different permissions to the canonical and duplicate documents, such that WebDAV users don't have access to the duplicates. The duplicates would then simply not be listed for anyone browsing via WebDAV. That behavior doesn't perfectly imitate a symlink, but it may be close enough.
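
As a rough illustration of the XInclude idea above, here is a minimal Python sketch of the bookkeeping that could run before loading: it groups files by content hash, picks one URI per group as canonical, and builds a small XInclude stub for each duplicate. Several things here are assumptions, not part of the original answer: duplicates are detected by SHA-256 of the file bytes, the canonical URI is simply the first one in sorted order, and the `reference` wrapper element and the function names are hypothetical. It does not talk to MarkLogic at all, and whether an XInclude reference to a binary such as a PDF is useful depends on the application.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace defined by the W3C XInclude specification.
XI_NS = "http://www.w3.org/2001/XInclude"


def group_by_content(docs):
    """Map each SHA-256 digest to the sorted list of URIs sharing that content.

    `docs` is a dict of {uri: bytes}; how it gets populated is up to the caller.
    """
    groups = defaultdict(list)
    for uri, content in docs.items():
        groups[hashlib.sha256(content).hexdigest()].append(uri)
    return {digest: sorted(uris) for digest, uris in groups.items()}


def xinclude_stub(canonical_uri):
    """Build a small XML document that points at the canonical URI."""
    ET.register_namespace("xi", XI_NS)
    root = ET.Element("reference")  # hypothetical wrapper element name
    ET.SubElement(root, f"{{{XI_NS}}}include", {"href": canonical_uri})
    return ET.tostring(root, encoding="unicode")


def plan_dedup(docs):
    """Return {duplicate_uri: stub_xml} for every non-canonical copy.

    Assumption: the canonical copy is the first URI in sorted order.
    """
    plan = {}
    for uris in group_by_content(docs).values():
        canonical, *duplicates = uris
        for dup in duplicates:
            plan[dup] = xinclude_stub(canonical)
    return plan


if __name__ == "__main__":
    sample = {
        "/products/published-documents/catalog.pdf": b"same bytes",
        "/sourcing/references/catalog.pdf": b"same bytes",
        "/marketing/materials/catalog.pdf": b"same bytes",
        "/marketing/materials/flyer.pdf": b"different bytes",
    }
    for uri, stub in sorted(plan_dedup(sample).items()):
        print(uri, "->", stub)
```

The resulting plan only says which URIs would become stubs; the canonical copies and any unique documents are left untouched. The same grouping could also drive the collection or permission approaches, for example by tagging the URIs in the plan as duplicates instead of replacing them.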