
MetaMap java.lang.OutOfMemoryError: Java heap space


We keep encountering a java.lang.OutOfMemoryError: Java heap space error when running MetaMap (with the Java API and the UIMA wrapper).

Unfortunately, the logs are not very informative, so we don't know which file is triggering the error.

In the past, we've had issues with MetaMap creating huge circular annotations when it encounters the pipe (|) symbol. However, the file set we're using (MIMIC notes) doesn't contain any pipe symbols. Are there other characters that trigger similar behavior?
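One way to narrow down candidate characters is to tally the non-alphanumeric symbols in each note and compare the tallies for files that fail against files that process cleanly. The following is a minimal sketch, assuming plain-text input read as UTF-8. The class and method names are illustrative, and nothing here implies that any particular character is known to trouble MetaMap:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

class SymbolTally {

    /** Counts each character that is neither a letter, a digit, nor whitespace. */
    static Map<Character, Integer> countSymbols(String text) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : text.toCharArray()) {
            if (!Character.isLetterOrDigit(c) && !Character.isWhitespace(c)) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical file names): java SymbolTally note1.txt note2.txt
        for (String name : args) {
            String text = new String(Files.readAllBytes(Paths.get(name)), StandardCharsets.UTF_8);
            System.out.println(name + " -> " + countSymbols(text));
        }
    }
}
```

Symbols that show up only in the failing files would be the first ones to try stripping during preprocessing.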

We could increase system RAM to circumvent the heap space issue (the JVM can't actually reach the configured maximum heap of 6 GB because system RAM is limited), but we would prefer to know what is causing the issue, especially since fixing the root cause would also keep the output file size manageable.

* EDIT *

Just to clarify: We have increased memory resources for the JVM and that does help to actually push the data through (this was tested on a local VM). The problem is that MetaMap creates enormous circular annotations that eat up the JVM resources (and on our current system, the available RAM is limited).

As noted in my comment below, we preprocess the files to strip them of any characters that throw errors. The heap space error is annoying, though: other errors we've encountered (e.g., spaces surrounding a lone period, as in text . text) just throw a parsing error that includes the offending text, whereas the heap space error gives no such hint. In the case of the pipe symbol, we found it by increasing RAM (on the VM we were initially testing this on) and then looking at the annotations in the UIMA viewer. We were able to identify the problematic files because the XMI output for files with circular annotations is enormous.
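The file-size check can be automated rather than done by eye. The following is a minimal sketch, assuming the XMI output lands under a single directory tree and uses an .xmi extension (both are assumptions, as are the class and method names):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LargestXmi {

    /** Returns up to n regular files ending in ".xmi" under dir, largest first. */
    static List<Path> largestXmiFiles(Path dir, int n) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".xmi"))
                .sorted(Comparator.comparingLong(LargestXmi::sizeOf).reversed())
                .limit(n)
                .collect(Collectors.toList());
        }
    }

    /** File size in bytes, with the checked exception wrapped for use in streams. */
    static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LargestXmi <output-directory>   (defaults to the current directory)
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : largestXmiFiles(dir, 10)) {
            System.out.printf("%,15d bytes  %s%n", Files.size(p), p);
        }
    }
}
```

The files at the top of the list are the ones worth opening in the UIMA viewer first.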

We are running some tests on the VM again to see if we can identify the issue, but if anyone with MetaMap experience can help us identify any problem characters or character sequences, that would be much appreciated.

* EDIT 2 *

Memory should not be an issue. We are running the app using export JAVA_TOOL_OPTIONS='-Xms2G -Xmx6G -XX:MinHeapFreeRatio=25 -XX:+UseG1GC'

There is a fundamental issue with circular annotations that we are trying to resolve. They gobble up resources until the JVM runs out of heap space.


Solution

  • The solution was twofold:

    First, a UIMA JVM environment variable needed to be set: export UIMA_JVM_OPTS="-Xms128M -Xmx5g"

    Second, there is a MetaMap switch that reduces the recursion depth for creating annotations (it goes in the MetaMapApiAE.xml config file):

    <configurationParameterSettings>
      <!-- ... previous settings omitted ... -->
      <nameValuePair>
        <name>metamap_options</name>
        <value>
          <string>--prune 30</string>
        </value>
      </nameValuePair>
    </configurationParameterSettings>
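
To confirm that a heap setting actually reaches the JVM that runs the pipeline, it can help to print what the JVM itself reports. The following is a minimal sketch using only standard JDK calls; the class name is illustrative, and it only reflects the JVM it runs in, so it would need to be launched the same way as the pipeline (or the same two calls logged from inside it):

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class HeapCheck {

    /** Maximum heap the running JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Arguments the JVM was started with (excluding arguments to main). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM arguments:  " + jvmArguments());
    }
}
```

If the reported maximum is far below the value passed via -Xmx, the setting is probably being supplied through a variable that the launcher does not read.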