
How to set the parameter dictRemoveWS to true in Apache Ruta

The dictRemoveWS parameter is related to word lists (WORDLIST). I want to fix a whitespace issue in my custom Ruta script, which loads its word list from a .txt file.

The documentation says to use the CONFIGURE action, with the following example:

    ENGINE utils.HtmlAnnotator;
    Document{->CONFIGURE(HtmlAnnotator, "onlyContent" = false)};

But I'm still unable to set the dictRemoveWS parameter to true.


  • Parameters like this are set in the configuration of the Ruta engine itself:

    In Java:

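    import org.apache.ruta.engine.RutaEngine;
    import org.apache.uima.analysis_engine.AnalysisEngine;
    import org.apache.uima.fit.factory.AnalysisEngineFactory;

    // dictRemoveWS is passed as a configuration parameter when the engine is created
    AnalysisEngine engine = AnalysisEngineFactory.createEngine(RutaEngine.class, typeSystemDescription,
                    RutaEngine.PARAM_SCRIPT_PATHS, scriptsPath,
                    RutaEngine.PARAM_RESOURCE_PATHS, resourcesPath,
                    RutaEngine.PARAM_MAIN_SCRIPT, "Main",
                    RutaEngine.PARAM_DICT_REMOVE_WS, true);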

    or in XML definition:
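
    A minimal sketch of what this could look like, assuming a standard UIMA analysis engine descriptor for the Ruta engine in which dictRemoveWS is already declared under configurationParameters. Only the relevant fragment is shown; the surrounding descriptor elements are omitted:

    ```xml
    <!-- Fragment of an analysis engine descriptor (assumed layout) -->
    <configurationParameterSettings>
      <nameValuePair>
        <!-- Same parameter name that RutaEngine.PARAM_DICT_REMOVE_WS refers to -->
        <name>dictRemoveWS</name>
        <value>
          <boolean>true</boolean>
        </value>
      </nameValuePair>
    </configurationParameterSettings>
    ```

    If the descriptor already contains a configurationParameterSettings element, the nameValuePair entry would go inside the existing one rather than in a second copy.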

    See the Apache Ruta documentation for more information.