Tags: java, hadoop, hbase, apache-phoenix

Apache Phoenix java.lang.OutOfMemoryError: unable to create new native thread


I have a tiny Hadoop cluster with 5 data nodes and 1 name node. All of them are 4-core/4-thread machines with 4 GB of RAM, except for one data node that has 8 GB.

They all run RHEL 6 x86_64. The HBase version is 1.2 and the Phoenix version is 4.14.

I am connecting to Apache Phoenix through the Phoenix Query Server and the "thin" JDBC client. Phoenix Query Server is running on the name node.
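
Roughly, the connection is made like this; the host name and the port below are placeholders for illustration, not my actual setup:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class ThinClientPing {
        public static void main(String[] args) throws Exception {
            // The thin-client JDBC URL points at the Phoenix Query Server (Avatica over HTTP),
            // not at ZooKeeper. "namenode-host" and 8765 (the PQS default port) are placeholders.
            // Requires the Phoenix thin-client jar on the classpath.
            String url = "jdbc:phoenix:thin:url=http://namenode-host:8765;serialization=PROTOBUF";
            try (Connection conn = DriverManager.getConnection(url)) {
                System.out.println("connected: " + !conn.isClosed());
            }
        }
    }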

I am trying to upsert ~2,000 tuples of ~25 columns each, every 10 minutes. The table already has over 2 million tuples, but sometimes I get exceptions of the form:

    Caused by: java.lang.OutOfMemoryError: unable to create new native thread
    [...]
    Caused by: AvaticaClientRuntimeException: Remote driver error: RuntimeException: org.apache.phoenix.execute.CommitException: java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
    -> CommitException: java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
    -> RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
    -> OutOfMemoryError: unable to create new native thread. Error -1 (00000) null
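
For reference, the upserts are shaped roughly like this; the table and column names are placeholders, not my real schema:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class UpsertBatch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:phoenix:thin:url=http://namenode-host:8765;serialization=PROTOBUF";
            try (Connection conn = DriverManager.getConnection(url)) {
                conn.setAutoCommit(false);
                // "METRICS", "ID" and "VAL" are placeholder names; the real table has ~25 columns.
                try (PreparedStatement ps = conn.prepareStatement(
                        "UPSERT INTO METRICS (ID, VAL) VALUES (?, ?)")) {
                    for (int i = 0; i < 2000; i++) {
                        ps.setInt(1, i);
                        ps.setDouble(2, Math.random());
                        ps.addBatch();
                    }
                    ps.executeBatch();
                }
                // With the thin client, the actual Phoenix commit runs inside the Query Server,
                // which is where the CommitException above gets thrown.
                conn.commit();
            }
        }
    }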

I'm not sure what is wrong.

It doesn't look like an actual out-of-memory condition on the heap; it's as if the process were creating many threads and running out of them?

I've tried running ps aux and I can't see the Phoenix Query Server process creating more than ~50 threads, which, as far as I know, is way below the thread limit of a default Linux install.

Maybe it really is running out of memory and failing to create native threads is a symptom?


Solution

  • Turns out the user the Hadoop processes were running as had limits that were too low for creating new processes. I edited

    /etc/security/limits.conf

    With:

    user - nproc 32768
    

    And it worked. I did not see a separate thread-count limit, but on Linux threads count against the per-user process limit (nproc), so raising the process count limit did the trick (see the sketch after this answer).

    I've also read that increasing the open file limit is needed for clusters too:

    user - nofile 32768
    

    I set that limit as well, to avoid issues in the future.
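
For what it's worth, the error itself can be reproduced outside Phoenix. On Linux every Java thread is a native thread and counts against the per-user nproc limit, so exhausting that limit makes Thread.start() fail with exactly this OutOfMemoryError even when the heap is nearly empty. A minimal sketch (only run this under a deliberately low ulimit, not on a live node):

    public class NativeThreadLimitDemo {
        public static void main(String[] args) {
            int started = 0;
            try {
                // Keep starting threads that just sleep; under a low per-user process/thread
                // limit this eventually fails in pthread_create, which the JVM surfaces as
                // "java.lang.OutOfMemoryError: unable to create new native thread".
                while (true) {
                    Thread t = new Thread(() -> {
                        try {
                            Thread.sleep(Long.MAX_VALUE);
                        } catch (InterruptedException ignored) {
                        }
                    });
                    t.setDaemon(true);
                    t.start();
                    started++;
                }
            } catch (OutOfMemoryError e) {
                System.out.println("Failed after " + started + " threads: " + e.getMessage());
            }
        }
    }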