
Implementing MongoDB style document references in XML databases


I'm a newcomer to XML databases and, in particular, I am trying to learn how MarkLogic works. My apologies if these questions are too naive or obvious.

What I'd like to do is implement MongoDB-style document references in MarkLogic, since I think the pattern would apply well there, MarkLogic being itself a document-oriented database.

This is what the MongoDB documentation has to say about manual and DBRef style document references:

http://docs.mongodb.org/manual/reference/database-references/

MongoDB recommends the use of manual document references.

Now, the most direct approach I can see is to define this piece of information as part of a schema definition, starting with the definitions of an objectId, a publisher, and a book:

<xs:simpleType name="objectId">
  <xs:restriction base="xs:string">
    <xs:length value="24"/>
    <xs:whiteSpace value="collapse"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="Publisher">
  <xs:complexType>
    <xs:attribute name="id" type="fbc:objectId" use="required"/>
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="location" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

<xs:element name="Book">
  <xs:complexType>
    <xs:attribute name="Title" type="xs:string"/>
    <xs:attribute name="publisherId" type="fbc:objectId" use="required"/>
  </xs:complexType>
</xs:element>
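
To make the pattern concrete, here is a minimal sketch of how a manual reference built on these definitions might be followed by application code. The instance documents, their attribute values, and the helper names `index_by_id` and `resolve_publisher` are hypothetical, and namespaces are omitted for brevity. Note that XSD identity constraints (`xs:key`/`xs:keyref`) apply within a single document only, so a reference that crosses documents has to be checked by the application or by a query.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance documents shaped like the schema above
# (namespaces omitted; the id is an arbitrary 24-character string).
PUBLISHER_XML = (
    '<Publisher id="507f1f77bcf86cd799439011" '
    'name="Example Press" location="Boston"/>'
)
BOOK_XML = '<Book Title="Example Title" publisherId="507f1f77bcf86cd799439011"/>'


def index_by_id(documents):
    """Parse each document and map its id attribute to its root element."""
    roots = (ET.fromstring(doc) for doc in documents)
    return {root.get("id"): root for root in roots}


def resolve_publisher(book, publishers):
    """Follow the manual reference; return None if the target is missing."""
    return publishers.get(book.get("publisherId"))


publishers = index_by_id([PUBLISHER_XML])
book = ET.fromstring(BOOK_XML)
publisher = resolve_publisher(book, publishers)

# A book whose publisherId matches no stored publisher resolves to None.
dangling = ET.fromstring('<Book Title="Orphan" publisherId="000000000000000000000000"/>')
missing = resolve_publisher(dangling, publishers)
```

The schema validates the shape of each identifier, while the lookup is what detects a dangling reference.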

So three questions:

  1. Would this suffice to model the document reference between a book and its publisher? Is there a better approach for schema-based XML documents?

  2. Would this approach introduce difficulties when writing XQuery inside MarkLogic (or any other XML database such as eXist-db, Sedna, or BaseX)?

  3. MarkLogic states that it can use "Modular documents", which hold a special type of document reference using XPointer and XInclude:

    http://docs.marklogic.com/guide/app-dev/mod-docs

Are there any advantages in using that approach instead of manual document references? Are there any working Java API examples of this feature?

I apologize in advance if these are too many questions but I believe they're all related to the overall question stated here. Thanks.

Update:

I think I will resort to some data denormalization wherever appropriate and use plain old document URI attributes to reference other documents where needed. Not the best approach, I guess, but I think it may be good enough down the road. I'll keep updating with my findings. Thanks!


Solution

  • As David and WST have pointed out, MarkLogic emphasizes denormalization over joins. Storing data structure trees or structured textual content makes it possible to retrieve documents with high performance at scale.

    That said, MarkLogic does support joins. You can use XInclude to aggregate or just use an element or attribute whose value is the document URI for a related document. (The linking approach is comparable to linking in HTML.) Such links can be resolved by XQuery on the server or resolved on the client by retrieving the related documents with a single query.
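
The two linking styles mentioned above can be compared in a small client-side sketch. It runs against a hypothetical in-memory store keyed by document URI rather than a real MarkLogic server; the URIs, the `publisherUri` attribute, and the helpers `fetch` and `load_for_xinclude` are assumptions made for illustration, and the XInclude expansion uses Python's standard `xml.etree.ElementInclude` module, not MarkLogic's own XInclude support.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory store standing in for the database, keyed by URI.
STORE = {
    "/publishers/p1.xml": '<Publisher name="Example Press" location="Boston"/>',
    "/books/b1.xml": '<Book Title="Linked" publisherUri="/publishers/p1.xml"/>',
    "/books/b2.xml": (
        '<Book Title="Included" xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="/publishers/p1.xml"/>'
        "</Book>"
    ),
}


def fetch(uri):
    """Return the parsed root element of the document stored at uri."""
    return ET.fromstring(STORE[uri])


def load_for_xinclude(href, parse, encoding=None):
    """Loader callback for ElementInclude; only XML inclusion is handled."""
    if parse != "xml":
        raise ValueError("only parse='xml' is supported in this sketch")
    return fetch(href)


# Approach 1: a plain URI attribute, followed like a hyperlink.
linked = fetch("/books/b1.xml")
linked_publisher = fetch(linked.get("publisherUri"))

# Approach 2: XInclude, expanded in place so the book contains its publisher.
included = fetch("/books/b2.xml")
ElementInclude.include(included, loader=load_for_xinclude)
included_publisher = included.find("Publisher")
```

With the URI attribute, the book document stays small and the caller decides when to fetch the publisher; with XInclude, the expanded document already contains the publisher element.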