Search code examples
marklogic

Query in Marklogic DB to filter documents for the last 24 hours


Is there a way to filter out documents in marklogic based on a date field?

Something like:

select * from data-hub-STAGING where load_date > '2022-01-12'


Solution

  • It is possible to issue such a query, as long as there is a date or dateTime being tracked and indexed.

    For any MarkLogic database, there is a database option to enable "maintain last modified", which will set and maintain a document property prop:last-modified that can be indexed and range-queries applied.

    Your example table was "data-hub-STAGING". If you are using a MarkLogic Data Hub, the datahubCreatedOn field is available with a dateTime range index.

    https://docs.marklogic.com/datahub/5.6/flows/about-flows.html#about-flows__pre_ay4_frh_ypb

    For every content object outputted by a Data Hub step, regardless of the step type, Data Hub will add the following document metadata keys and values to the document wrapped by the content object:

    • datahubCreatedOn = the date and time at which the document is written

    Of course, any other date/dateTime element, JSON field, or document property can also be indexed and then used to filter docs as well.

    Then the appropriate range query can be applied to a search in JavaScript, XQuery, Optic, SQL, etc.

    For example, in JavaScript searching by field datahubCreatedOn with cts.fieldRangeQuery():

    cts.search(cts.fieldRangeQuery("datahubCreatedOn", ">", 
      new Date(Date.now() - 86400 * 1000).toISOString()))
    

    and in XQuery searching by prop:last-modified with cts:element-range-query():

    cts:search(doc(), cts:element-range-query(xs:QName("prop:last-modified"), ">",
     current-dateTime() + xs:dayTimeDuration("-PT24H")))
    

    You should even be able to execute the SQL in your question, if you were to create a SQL Template View.