xquery marklogic-8

Search a text document on MarkLogic Server and return only the text matching a search pattern


I have uploaded a text document to MarkLogic Server in the collection "calling-returning". The text document is shown below:

    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,884 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -7703835814759006134 - Returning from WorkflowContentDao.deleteCompletedOrFailedContentList(..) Execution time: 16 ms
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,900 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2561765194895194936 - Calling WorkflowContentDao.getWaitingForContentListToProcess(..) with parameters FTP
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,900 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2561765194895194936 - Returning from WorkflowContentDao.getWaitingForContentListToProcess(..) Execution time: 0 ms
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2041334620910360341 - Calling WorkflowContentDao.getFTPWaitProcessType(..) with parameters ftp://10.103.100.43:21/VARIANTGENERATION/INPUT/30357186.pdf
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2041334620910360341 - Returning from WorkflowContentDao.getFTPWaitProcessType(..) Execution time: 0 ms
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.consumer.WorkflowContentConsumer processWorkflowContent - processWorkflowContent workflow content task: DPC-CENELEC-PUBLISH 01-7915592210 VARIANT_GENERATION
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.schedule.task.ProcessWorkflowContent failWorkflowContentTask - Failing workflow content task using scheduler because its exceeded 30 min since created  DPC-CENELEC-PUBLISH 01-7915592210 VARIANT_GENERATION
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,931 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : 8235148762900748472 - Calling WorkflowContentDao.setPickedBy(..) with parameters com.innodata.bsi.domain.WorkflowContentInfo@5f7839bd
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,931 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : 8235148762900748472 - Returning from WorkflowContentDao.setPickedBy(..) Execution time: 0 ms

I am searching this document for strings like '2561765194895194936 - Calling', where the number could be anything. So I have written the query below:

    let $search := cts:search(collection("calling-returning"), cts:word-query(" - Calling"))
    return $search

But it returns the full document. I want only results like the following:

  2561765194895194936 - Calling
  256176519489514568 - Calling
  568651948951566 - Calling

Solution

  • The unit of search and retrieval in MarkLogic is the document, which is why your cts:search returns the whole document. If you want to search the lines separately, they need to be separate documents. Once you have the matching document, to pull the matching lines out of it you need to tokenize the document into lines and run the match against each individual line, something like tokenize($doc, "\n")[cts:contains(text {.}, $query)]

    That isn't going to be very efficient. You might be better off preprocessing the text document to add some markup (i.e. a root element and a line element around each line). Then at least you don't have to tokenize the whole document at query time, although you still have to walk all the lines and match each one after the fact: $doc//line[cts:contains(., $query)]
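
The tokenize-then-match approach above returns whole lines. To get only the "number - Calling" part, one option is to filter and trim each line with a regular expression. The following is a minimal sketch, not a tested solution. It assumes the collection holds plain-text documents, and the pattern held in $pattern is a hypothetical guess at the shape of the wanted output. It also swaps the per-line cts:contains check for fn:matches, because the numeric part varies from line to line.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumes the collection holds plain-text documents, so that
   fn:string() of each document is the raw log text. :)
let $query := cts:word-query("Calling")

(: Hypothetical pattern: a run of digits followed by " - Calling". :)
let $pattern := "[0-9]+ - Calling"

(: Step 1: let the index find the documents that contain the word at all. :)
for $doc in cts:search(fn:collection("calling-returning"), $query)
(: Step 2: split each matching document into lines. :)
for $line in fn:tokenize(fn:string($doc), "\r?\n")
(: Step 3: keep only the lines that contain the pattern. :)
where fn:matches($line, $pattern)
return
  (: Step 4: strip everything around the matched part of the line. :)
  fn:replace($line, fn:concat("^.*?(", $pattern, ").*$"), "$1")
```

The cts:search call still does the indexed work of narrowing down to documents that contain "Calling"; the per-line regular expression runs only on those documents. As noted above, this still walks every line of each matching document, so it is unlikely to scale well to very large log files.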