Tags: java, marklogic, marklogic-9

MarkLogic Wildcard Search - QConsole vs. Java API


I’m seeing different results from a Java-based query and what I believe is the equivalent cts:search in the query console. There's a lot of information here, so I've tried to organize it. Here are the steps to set up a simple example that reproduces what I’m seeing.

  1. Create new database with default settings
  2. Add new forest with default settings
  3. Enable three character searches (only non-default database setting)
  4. Insert the three json documents below into the database

Query console returns doc2. Java client returns doc2 AND doc1. Why? I would expect the same results from each. I want to get the results in Java that the query console is returning. Am I writing the query definition in Java incorrectly?

It looks like the Java client wildcard search is searching the entire document, even though I’ve specified that I only want to do a wildcard search inside the given json-property (city).

Is there a way to see or log the resultant server-side “cts query” given a client-side RawCombinedQueryDefinition? I'd like to see what the Java request gets translated into on the server side.

doc1.json

{
  "state": "OH",
  "city": "Dayton",
  "notes": "not Cincinnati"
}

doc2.json

{
  "state": "OH",
  "city": "Cincinnati",
  "notes": "real city"
}

doc3.json

{
  "state": "OH",
  "city": "Daytona",
  "notes": "this is a made up city"
}

Query console code used to insert documents (shown for doc1.json; repeated for doc2.json and doc3.json)

xquery version "1.0-ml"; 
xdmp:document-load("/some/path/doc1.json",
  <options xmlns="xdmp:document-load">
    <uri>/doc1.json</uri>
  </options>
); 

Query console code used to search

xquery version "1.0-ml";
cts:search(fn:collection(),
  cts:and-query((
    cts:json-property-value-query("state", "OH"),
    cts:json-property-value-query("city", "*Cincinnati*") 
  ))
)

Java QueryManager query in easy-to-read form

{
  "search": {
    "query": {
      "queries": [
        {
          "value-query": {
            "type": "string",
            "json-property": "state",
            "text": "OH"
          }
        },
        {
          "value-query": {
            "type": "string",
            "json-property": "city",
            "text": "*Cincinnati*"
          }
        }
      ]
    }
  }
}

Java code

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.document.DocumentPage;
import com.marklogic.client.document.DocumentRecord;
import com.marklogic.client.document.JSONDocumentManager;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.StringHandle;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.RawCombinedQueryDefinition;
import org.junit.Test;

public class MarkLogicTest
{
    @Test
    public void testWildcardSearch()
    {
        DatabaseClientFactory.SecurityContext securityContext = new DatabaseClientFactory.DigestAuthContext("admin", "admin");
        DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000, "test", securityContext);
        QueryManager queryManager = client.newQueryManager();
        JSONDocumentManager documentManager = client.newJSONDocumentManager();

        String query = "{\n" +
                "  \"search\": {\n" +
                "    \"query\": {\n" +
                "      \"queries\": [\n" +
                "        {\n" +
                "          \"value-query\": {\n" +
                "            \"type\": \"string\",\n" +
                "            \"json-property\": \"state\",\n" +
                "            \"text\": \"OH\"\n" +
                "          }\n" +
                "        },\n" +
                "        {\n" +
                "          \"value-query\": {\n" +
                "            \"type\": \"string\",\n" +
                "            \"json-property\": \"city\",\n" +
                "            \"text\": \"*Cincinnati*\"\n" +
                "          }\n" +
                "        }\n" +
                "      ]\n" +
                "    }\n" +
                "  }\n" +
                "}";

        StringHandle queryHandle = new StringHandle(query).withFormat(Format.JSON);
        RawCombinedQueryDefinition queryDef = queryManager.newRawCombinedQueryDefinition(queryHandle);
        DocumentPage documents = documentManager.search(queryDef, 1);

        while (documents.hasNext())
        {
            DocumentRecord document = documents.next();
            StringHandle resultHandle = document.getContent(new StringHandle());
            String result = resultHandle.get();
            System.out.println(result);
        }

        // Release the client's connection resources once the results have been read.
        client.release();
    }
}

System.out.println() results

{"state":"OH", "city":"Dayton", "notes":"not Cincinnati"} 
{"state":"OH", "city":"Cincinnati", "notes":"real city"}

Why does the Java client return the first result where city = Dayton?

Thanks in advance!


Solution

  • The REST API, and thus the Java API, executes an unfiltered search by default (meaning the matches are based entirely on the indexes). By contrast, cts:search() executes a filtered search by default (meaning the candidate documents are inspected to throw out false positives).

    If you pass the "unfiltered" option to cts:search() in the query console, it also returns both documents.

    The quick fix is to add the "filtered" option to the Java API search, but the better fix for performance at scale is to refine the indexes to support exact matching for the required wildcard queries.

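    Wildcard matches are correlated with their containing element (or JSON property) based on position. Without position indexes, doc1 is a false positive: "Cincinnati" appears in its notes property rather than in city.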

    Thus, for this query, I believe you need to turn on the index configurations for element word positions and for three character word positions.

    Hoping that helps,
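
For the quick fix mentioned in the answer, one way to request a filtered search from the Java API is to add an options section to the combined query. The sketch below only builds that JSON string, so it runs without a MarkLogic server. The class and method names (FilteredQueryExample, buildFilteredCombinedQuery) are made up for illustration, and the "search-option": ["filtered"] form is my understanding of the JSON equivalent of the search-option query option, so verify it against the REST API documentation for your server version.

```java
class FilteredQueryExample {

    // Returns the combined query from the question with an added "options"
    // section. Assumption: "search-option": ["filtered"] is the JSON form of
    // the search-option query option that asks for a filtered search.
    static String buildFilteredCombinedQuery() {
        return String.join("\n",
            "{",
            "  \"search\": {",
            "    \"query\": {",
            "      \"queries\": [",
            "        { \"value-query\": { \"type\": \"string\", \"json-property\": \"state\", \"text\": \"OH\" } },",
            "        { \"value-query\": { \"type\": \"string\", \"json-property\": \"city\", \"text\": \"*Cincinnati*\" } }",
            "      ]",
            "    },",
            "    \"options\": {",
            "      \"search-option\": [\"filtered\"]",
            "    }",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        String query = buildFilteredCombinedQuery();
        System.out.println(query);
        // In the test from the question, this string would take the place of
        // the hand-built `query` variable:
        //   StringHandle queryHandle = new StringHandle(query).withFormat(Format.JSON);
        //   RawCombinedQueryDefinition queryDef = queryManager.newRawCombinedQueryDefinition(queryHandle);
        //   DocumentPage documents = documentManager.search(queryDef, 1);
    }
}
```

Filtering makes the server inspect each candidate document, so this trades speed for accuracy. As the answer notes, enabling the position indexes is the better fix at scale.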