android java-native-interface opengl-es-2.0

Android OpenGL ES: auto-correcting env->self and NvRmChannelSubmit failed


Two questions below.

We have an OpenGL ES 2 graphics application that has worked well for a few years on Windows, Linux, MacOS, iPhones, iPads, and Android phones. In the last few months we started receiving feedback from users of some Android devices (such as the Toshiba Thrive, HTC One X, Nexus 7, and Asus Transformer, on API 15 and 17) about a black or flickering screen or, rarely, an app crash. Our app targets API 9 and up. It is written with the NDK using NativeActivity and is based directly on the nvidia Android examples and demos. It has been thoroughly tested on all platforms, with no memory leaks and no invalid memory accesses, and it only rarely calls a small amount of Java code.

Looking at LogCat, we noticed two kinds of error messages on these devices:

(1) JNI ERROR: env->self != thread-self (0x11734c0 vs. 0xd6d360); auto-correcting

(2) NvRmChannelSubmit failed (err = 196623, SyncPointValue = 0) followed by GL_OUT_OF_MEMORY

Regarding (1), we know about the threads vs. JNI issues, and we hopefully know how to fix this. I have read this information and my question here is: does "auto-correcting" mean that we have to worry about some ERROR, or is it just a warning meaning that the code will behave badly IN THE FUTURE, but now it works perfectly well (corrected!) and this is not related to issue (2)? The reason I'm asking is that sometimes we also see the following lines:

E/libEGL: call to OpenGL ES API with no current context (logged once per thread)
E/NvEGLUtil: Failure: eglSwapBuffers, error = 0x0000300d (swap:422)

which look serious. We have tested our app on an API 17 emulator with JNI checking (CheckJNI) enabled: no issues are reported, and the app works well.
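
For reference, the 0x0000300d in the eglSwapBuffers line can be decoded against the error constants defined in the Khronos EGL headers, where 0x300D is EGL_BAD_SURFACE. Below is a minimal plain-Java sketch of that lookup. The class name EglErrorNames and the method describe are made up for illustration and are not part of any Android or EGL API; only the numeric values and constant names come from the EGL headers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps EGL error codes to the constant names
// used in the Khronos EGL headers. Not an Android or EGL API.
final class EglErrorNames {
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        NAMES.put(0x3000, "EGL_SUCCESS");
        NAMES.put(0x3001, "EGL_NOT_INITIALIZED");
        NAMES.put(0x3002, "EGL_BAD_ACCESS");
        NAMES.put(0x3003, "EGL_BAD_ALLOC");
        NAMES.put(0x3004, "EGL_BAD_ATTRIBUTE");
        NAMES.put(0x3005, "EGL_BAD_CONFIG");
        NAMES.put(0x3006, "EGL_BAD_CONTEXT");
        NAMES.put(0x3007, "EGL_BAD_CURRENT_SURFACE");
        NAMES.put(0x3008, "EGL_BAD_DISPLAY");
        NAMES.put(0x3009, "EGL_BAD_MATCH");
        NAMES.put(0x300A, "EGL_BAD_NATIVE_PIXMAP");
        NAMES.put(0x300B, "EGL_BAD_NATIVE_WINDOW");
        NAMES.put(0x300C, "EGL_BAD_PARAMETER");
        NAMES.put(0x300D, "EGL_BAD_SURFACE");
        NAMES.put(0x300E, "EGL_CONTEXT_LOST");
    }

    // Returns the code in hex followed by its EGL name, or a placeholder
    // when the code is not one of the core EGL errors listed above.
    static String describe(int code) {
        String name = NAMES.get(code);
        return String.format("0x%04X %s", code,
                name != null ? name : "(unknown EGL error)");
    }

    public static void main(String[] args) {
        // The value logged by NvEGLUtil in the LogCat excerpt above.
        System.out.println(describe(0x0000300d));
    }
}
```

So the swap failure is reported against the surface itself, not against memory.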

Now, regarding message (2), I have found a few forums (for example here, here and also this) where people reported this message, and the reasons are unclear. It looks like a firmware or driver issue, GPU memory leaks, or memory fragmentation... Many games are affected by screen flicker, and people try rebooting or resetting the device, clearing the cache, upgrading, etc., but the issue seems to persist. This problem concerns quite a few popular devices. Despite the GL_OUT_OF_MEMORY error code, a genuine lack of memory is not a plausible explanation, because the app we used for tests used small 32x32 textures instead of the 512x512 textures used in the regular version (and these bigger textures work perfectly well on older devices). Does anyone have experience fixing this, and is it fixable on our side at all? Is this an officially confirmed hardware/firmware/OS bug? I am looking for a known reason and a real solution to this problem, not a trial-and-error workaround that happens to help without anyone knowing why.

Thanks!


Solution

  • So, after a few years of trying to identify the problem, it is time for the answer :-) The issue was extremely painful, time-consuming, and difficult (almost impossible) to debug. It was non-deterministic and rare, it affected only some specific devices, and it appeared to be correlated with a specific version of the system or even with whether other programs were running at the same time...

    In our C++ code, at the end of the nvidia framework's bool Engine::initUI() function, we called our own keepScreenOn(getApp()) function, which passed the current activity to our own static Java method:

    //Keep the screen on.
    //Note that flag modification must be done in the UI thread:
    //https://android-developers.googleblog.com/2009/05/painless-threading.html
    static void keepScreenOn(Activity a) {
        final Window w = a.getWindow();
        if (w != null) {
            a.runOnUiThread(new Runnable() {
                public void run() {
                    w.addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
                }
            });
        }
    }
    

    As I understand it, modifying the Window flag causes the window to be destroyed and recreated (anyone please correct me if I'm wrong), which is obviously not a good idea while the app is still starting up. It seems that this is what caused, albeit extremely rarely, a race condition between threads or a problem in some graphics drivers... which resulted in delayed error messages like "NvRmChannelSubmit failed (err = 196623, SyncPointValue = 0)" and then "GL_OUT_OF_MEMORY".
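
To illustrate why the timing is unpredictable: when runOnUiThread is called from a thread other than the UI thread, the Runnable is only queued, and the caller continues immediately. The sketch below is a plain-Java model of that ordering, not Android code. A single-thread executor stands in for the UI thread, the event strings are invented, and the "UI thread is busy" latch is an assumption added purely to make the ordering reproducible. The class name DeferredFlagModel is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model: a single-thread executor plays the role of the UI thread.
final class DeferredFlagModel {
    static List<String> run() {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ExecutorService uiThread = Executors.newSingleThreadExecutor();
        CountDownLatch uiBusy = new CountDownLatch(1);
        CountDownLatch flagApplied = new CountDownLatch(1);
        try {
            // Assumption: the UI thread is still busy with earlier startup work.
            uiThread.execute(() -> {
                try {
                    uiBusy.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // Stand-in for runOnUiThread called from the native thread:
            // the flag change is queued, not executed.
            uiThread.execute(() -> {
                events.add("flag applied on UI thread");
                flagApplied.countDown();
            });

            // The calling thread carries on without waiting.
            events.add("keepScreenOn returned");
            events.add("rendering started");

            // Only now does the UI thread get around to the queued flag change.
            uiBusy.countDown();
            flagApplied.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            uiThread.shutdown();
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this model the flag change lands after rendering has started. On a real device the relative order depends on how busy the UI thread is, which would be consistent with the failure being rare and device-dependent.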

    The fact that setting the window flag causes such delayed GL problems was surprising, and it was not discovered by deduction (we spent a few years trying to find the cause of this problem in our OpenGL code). Instead, it was discovered by desperately commenting out every piece of code that could influence the display... The solution was to introduce our own subclass of NativeActivity, which creates the main application window with the proper flag right from the start:

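    import android.app.NativeActivity;
    import android.os.Bundle;
    import android.view.WindowManager;

    public class OurSubclassOfNativeActivity extends NativeActivity
    {
        @Override
        protected void onCreate(Bundle savedInstanceState)
        {
            // Set the flag before super.onCreate() so it is in place from the start.
            getWindow().addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
            super.onCreate(savedInstanceState);
        }
    }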
    

    We wanted to avoid introducing our own subclass of NativeActivity, but it seems that the need to set FLAG_KEEP_SCREEN_ON forces us to do so.