
How to set the parameter dictRemoveWS to true in Apache Ruta


The dictRemoveWS parameter relates to word lists. My custom Ruta script loads a word list from a txt file, and I want whitespace to be removed when matching against it.

The documentation says to use the CONFIGURE action, with the following example:

    ENGINE utils.HtmlAnnotator;
    Document{->CONFIGURE(HtmlAnnotator, "onlyContent" = false)};

But I am still unable to set the dictRemoveWS parameter to true.


Solution

  • Parameters like this are set in the configuration of the Ruta engine itself:

    In Java, using uimaFIT:

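    import org.apache.uima.analysis_engine.AnalysisEngine;
    import org.apache.uima.fit.factory.AnalysisEngineFactory;
    import org.apache.uima.ruta.engine.RutaEngine;

    // typeSystemDescription, scriptsPath and resourcesPath are supplied by the caller
    AnalysisEngine engine = AnalysisEngineFactory.createEngine(RutaEngine.class, typeSystemDescription,
                    RutaEngine.PARAM_SCRIPT_PATHS, scriptsPath,
                    RutaEngine.PARAM_RESOURCE_PATHS, resourcesPath,
                    RutaEngine.PARAM_MAIN_SCRIPT, "Main",
                    RutaEngine.PARAM_DICT_REMOVE_WS, true);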
    

    or in the XML descriptor of the engine:

    https://github.com/apache/uima-ruta/blob/trunk/example-projects/ExampleProject/descriptor/BasicEngine.xml

    See Ruta documentation for more information: https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
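
    For the XML route, the value goes into the configurationParameterSettings element of the engine descriptor. The fragment below is a minimal sketch, not a complete descriptor. It assumes the descriptor already declares dictRemoveWS as a Boolean configuration parameter, which the linked BasicEngine.xml should do, but check your own file:

    ```xml
    <!-- Sketch only: place inside the existing <analysisEngineMetaData> element. -->
    <!-- Assumes dictRemoveWS is already declared under <configurationParameters>. -->
    <configurationParameterSettings>
      <nameValuePair>
        <name>dictRemoveWS</name>
        <value>
          <boolean>true</boolean>
        </value>
      </nameValuePair>
    </configurationParameterSettings>
    ```

    If the descriptor already has a nameValuePair for dictRemoveWS, change its boolean value rather than adding a second entry.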