
lua_pcall error message is missing


I'm using lua_pcall to call a function and I want to catch the error. In some cases, the error message seems to get lost. How can that happen? It happens both when I pass an error handler and when I don't. In both cases, the value on top of the stack is not a string.

The C code:

  lua_getglobal(L, "debug");
  lua_getfield(L, -1, "traceback");
  lua_replace(L, -2);
  lua_rawgeti(L, LUA_REGISTRYINDEX, my_func_index);
  // now push n_in number of values on the stack
  luaT_stackdump(L);
  int pcall_ret = lua_pcall(L, n_in, n_out, -n_in - 2);
  // lua_pcall will consume n_in+1 values from the stack.
  if(pcall_ret != 0) {
    const char* errmsg = lua_tostring(L, -1);
    if(!errmsg) {
      errmsg = "(No Lua error message.)";
      printf("Unexpected Lua stack:\n");
      luaT_stackdump(L);
    }
    printf("Lua error code %i: %s\n", pcall_ret, errmsg);
    lua_pop(L, 2);  // remove error and debug.traceback from the stack
    return ...;
  }
  // now we got n_out values on the stack

The Lua function which is called looks like this (for testing):

    function (x, W, b, index)
        print "hi from Lua func"
        A = torch.rand(15, 12)
        B = torch.rand(12, 23)
        C = torch.dot(A, B)
    end

The call to torch.dot raises an error, but I don't know exactly why, and I don't get a meaningful error message. That is what this question is about.

The output:

  1. Lua object type: function
  2. Lua object type: function
  3. userdata 4165a368 [torch.FloatTensor]
  4. userdata 4165a390 [torch.FloatTensor]
  5. userdata 4165a230 [torch.FloatTensor]
  6. userdata 4165a258 [torch.CharTensor]
---------------------------------------------
hi from Lua func
Unexpected Lua stack:
  1. Lua object type: function
  2. userdata 40ea1230 [torch.DoubleTensor]
---------------------------------------------
Lua error code 2: (No Lua error message.)

Or maybe my code is correct and lua_pcall really should leave the error string on the stack here? In that case, maybe there is some memory corruption when calling torch.dot, i.e. something gets messed up?


Solution

  • It seems I need to call torch.updateerrorhandlers(). Then I get some meaningful output:

    hi from Lua func
    Lua error code 2: inconsistent tensor size at /tmp/luarocks_torch-scm-1-1092/torch7/lib/TH/generic/THTensorMath.c:384
    stack traceback:
            [C]: at 0x7f63cd831360
            [C]: in function 'dot'
            [string "return ..."]:9: in function <[string "return ..."]:2>
    

    However, this only works if torch.updateerrorhandlers() is called inside the Lua function itself.

    I tried calling it from C with this code, and that doesn't work:

        lua_getglobal(L, "torch");
        lua_getfield(L, -1, "updateerrorhandlers");
        lua_replace(L, -2);
        assert(lua_pcall(L, 0, 0, 0) == 0);
    

    I figured out that it works if I make another torch.updateerrorhandlers() call right before the actual lua_pcall on my_func_index. This is unexpected, but the reason may be that this lua_pcall runs in a different thread than the earlier call (which I would not have expected). In the Torch code I then found the function torch.updatethreadlocals(), which is exactly for this purpose, so I now call that one right before my other lua_pcall:

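        lua_getglobal(L, "torch");
        lua_getfield(L, -1, "updatethreadlocals");
        lua_replace(L, -2);  // replace the torch table with the function
        // keep the call outside assert() so it still runs when NDEBUG is defined
        int update_ret = lua_pcall(L, 0, 0, 0);
        assert(update_ret == 0);
        (void)update_ret;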
    

    This works now.