
Pointer issues when mapping native C functions to Java interface with JNA


This is going to be a long post in order to properly explain the issue, so please bear with me. It could also require some knowledge of the internals of the JNA library (v 4.1.0), or ability to examine its source code.

In short, we have issues when obtaining pointers to native functions from a 3rd-party component written in C. The problematic pointers seem to break JNA because the same pointer value is returned for different functions. The issue is reproducible when the JNA bindings run in a child JVM process spawned by another JVM process.

Background

We are integrating with a 3rd party tool for Windows written in C. The tool manufacturer has provided us with the C header files and a dll that we must interoperate with through our Java code. The dll, which I will refer to as the interop.dll, contains structures that expose function pointers, which we map to Java interfaces via JNAerator.

The interop.dll communicates with the 3rd party tool (which is pre-installed on the system), so it is a kind of communication SDK. For testing purposes, we have recently been provided with a stub.dll (again from that manufacturer), which does not require the 3rd party tool to be running, or installed at all. The interop.dll is responsible for deciding whether to use the stub or the real 3rd party tool, and automatically chooses the stub if it is present in the bin directory.

So, in any case, we have to map a fixed number of functions exposed by the interop.dll.
To assist in that, the interop.dll would contain the following function:

void* (__cdecl *ObtainInterface)( const char* interfaceName );

and we would map it in Java like this:

public interface ObtainInterface_callback extends Callback {
    Pointer apply(String interfaceName);
};
public ObtainInterface_callback ObtainInterface;

This function is used to "extract" another function from either the 3rd party tool or the stub.dll and then expose it through a Java interface by using its pointer value. In other words, we use it to dig through the target dll's API and map the other C functions that we need to Java interfaces. The functions we are extracting are declared within their respective C structures in the following manner:

void (__cdecl *SomeName)(Params.....)

to later be automatically mapped by JNAerator in a fashion similar to the above ObtainInterface.

So, here is how we obtain the interfaces in our Java code:

// ObtainInterface is the callback field declared above (an instance, not the interface type)
Pointer interface1Pointer = ObtainInterface.apply("Interface1");
Interface1 interface1 = new Interface1(interface1Pointer);

Pointer interface2Pointer = ObtainInterface.apply("Interface2");
Interface2 interface2 = new Interface2(interface2Pointer);

Pointer interface3Pointer = ObtainInterface.apply("Interface3");
Interface3 interface3 = new Interface3(interface3Pointer);

where the constructor of Interface1 would look like this (same for Interface2 and Interface3):

public Interface1(Pointer peer) {
    super(peer);
    read();
}

Note (in response to technomage's answer): The above code for Interface1, 2 and 3 was automatically generated by JNAerator, in an attempt to map the C struct with functions to a Java object with callbacks.

We have managed to successfully integrate with the interop.dll and the 3rd party tool.


The Problem

When we switch to using the stub dll, we get an IllegalStateException from the JNA code (CallbackReference.java @ line 122). The problem occurs when we attempt to obtain the third interface: Interface3 interface3 = new Interface3(interface3Pointer);

We downloaded the JNA sources and started debugging through the code to see what exactly is causing the issue.

The read() method (see constructor of Interface1 above) internally calls a readField() method for all members of the mapped structure. Because all structure members are function pointers, readField produces a Callback instance (as in Pointer.java @line 419), which later results in a call to the native method long _getPointer(long addr). For those interested, the native method looks like this (I am not sure how relevant it is):

dispatch.c, @line 2359

/*
 * Class:     Native
 * Method:    _getPointer
 * Signature: (J)Lcom/sun/jna/Pointer;
 */
JNIEXPORT jlong JNICALL Java_com_sun_jna_Native__1getPointer
    (JNIEnv *env, jclass UNUSED(cls), jlong addr)
{
    void *ptr = NULL;
    MEMCPY(env, &ptr, L2A(addr), sizeof(ptr));
    return A2L(ptr);
}

What we identified there was an issue with the address returned by the above _getPointer call, while running with the stub.dll. Here are the details we captured when debugging:

  • interface2Pointer has value 402394304 (0x17FC0CC0), (the pointer of the C struct)
  • The readField method discovers 10 function pointers within that struct, the last residing at offset 36
    • function10 -> interface2Pointer + offset = 402394304 + 36 = 402394340 (0x17FC0CE4).
    • Finally, there is a call to _getPointer(interface2Pointer.function10) = _getPointer(402394340) which would return the address of the callback within the struct, currently 401814304 (0x17F33320).

The same is repeated for interface3Pointer

  • interface3Pointer -> 402397356 (0x17FC18AC)
  • there are two inner functions with offsets, respectively 0 and 4, which are retrieved by readField method:
    • function1 -> 402397356 + 0 = 402397356 (0x17FC18AC)
      • _getPointer(interface3Pointer.function1) = _getPointer(402397356) then returns 402087408 (0x17F75DF0)
    • function2 -> 402397356 + 4 = 402397360 (0x17FC18B0)
      • _getPointer(interface3Pointer.function2) = _getPointer(402397360) then returns 401814304 (0x17F33320) (!)

As you can see, the interface3Pointer.function2 is being assigned the same pointer as interface2Pointer.function10.
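
The coincidence above can also be checked mechanically. The following is a minimal sketch that only uses the addresses captured in this debugging session; the class name `DuplicatePointerCheck` and its helpers are hypothetical and are not part of JNA or the vendor SDK.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic (Java 6 compatible): groups struct fields by the
// function pointer value read from them, so shared targets stand out.
final class DuplicatePointerCheck {

    // Returns only those target addresses that more than one field points to.
    static Map<Long, List<String>> findShared(Map<String, Long> targetByField) {
        Map<Long, List<String>> fieldsByTarget = new LinkedHashMap<Long, List<String>>();
        for (Map.Entry<String, Long> e : targetByField.entrySet()) {
            List<String> fields = fieldsByTarget.get(e.getValue());
            if (fields == null) {
                fields = new ArrayList<String>();
                fieldsByTarget.put(e.getValue(), fields);
            }
            fields.add(e.getKey());
        }
        Map<Long, List<String>> shared = new LinkedHashMap<Long, List<String>>();
        for (Map.Entry<Long, List<String>> e : fieldsByTarget.entrySet()) {
            if (e.getValue().size() > 1) {
                shared.put(e.getKey(), e.getValue());
            }
        }
        return shared;
    }

    // Values returned by _getPointer while running against the stub.dll.
    static Map<String, Long> stubObservations() {
        Map<String, Long> observed = new LinkedHashMap<String, Long>();
        observed.put("interface2.function10", 0x17F33320L);
        observed.put("interface3.function1", 0x17F75DF0L);
        observed.put("interface3.function2", 0x17F33320L);
        return observed;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, List<String>> e : findShared(stubObservations()).entrySet()) {
            System.out.println(String.format("0x%08X", e.getKey()) + " <- " + e.getValue());
        }
        // prints: 0x17F33320 <- [interface2.function10, interface3.function2]
    }
}
```

Logging every value returned for every field in this way, in both the standalone and the child process, would show whether the overlap is limited to this one pair.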

Now, CallbackReference.java internally uses a weak hash map to keep track of callback pointers that have already been assigned a Java representation. The IllegalStateException is thrown because that map still holds a reference to the already mapped pointer (interface2Pointer.function10 @ 401814304), so the pointer cannot be inserted again and mapped to another interface.
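
To make the failure mode concrete, here is a simplified stand-in for that bookkeeping. It is a sketch of the behavior described above, not JNA's actual code: the class `CallbackRegistrySketch`, the `Wrapper` type and the two callback interfaces are hypothetical, and the weak references only model the fact that an entry can vanish after garbage collection.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Simplified model of an address-to-callback registry; NOT JNA's implementation.
final class CallbackRegistrySketch {

    interface Callback {}
    interface Function10Callback extends Callback {}
    interface Function2Callback extends Callback {}

    // Stands in for the Java object that wraps a native function pointer.
    static final class Wrapper {
        final Class<? extends Callback> type;
        final long address;

        Wrapper(Class<? extends Callback> type, long address) {
            this.type = type;
            this.address = address;
        }
    }

    // Weakly referenced values: an entry is effectively gone once its wrapper is collected.
    private final Map<Long, WeakReference<Wrapper>> wrapperByAddress =
            new HashMap<Long, WeakReference<Wrapper>>();

    Wrapper lookup(Class<? extends Callback> type, long address) {
        WeakReference<Wrapper> ref = wrapperByAddress.get(address);
        Wrapper existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            // Same address requested under an incompatible interface: refuse.
            if (!type.isAssignableFrom(existing.type)) {
                throw new IllegalStateException("Address 0x" + Long.toHexString(address)
                        + " already mapped to " + existing.type.getSimpleName());
            }
            return existing;
        }
        Wrapper created = new Wrapper(type, address);
        wrapperByAddress.put(address, new WeakReference<Wrapper>(created));
        return created;
    }

    public static void main(String[] args) {
        CallbackRegistrySketch registry = new CallbackRegistrySketch();
        long sharedAddress = 0x17F33320L;
        Wrapper function10 = registry.lookup(Function10Callback.class, sharedAddress);
        try {
            registry.lookup(Function2Callback.class, sharedAddress);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        // Using the first wrapper here keeps it strongly reachable during the second lookup.
        System.out.println("Still mapped as " + function10.type.getSimpleName());
    }
}
```

In this model the second lookup only succeeds if the first wrapper has already been collected, which matches the GC-dependent behavior we see in the debugger.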

I can observe three problems from this point:

  1. Is it normal for different functions to result in the same pointer? Maybe the stub.dll uses the same callback for both operations? This is rather surprising, as interface2Pointer.function10 has a different signature than interface3Pointer.function2.
  2. The use of a weak hash map makes the above code non-deterministic. If we halt the debugger long enough for a GC to occur, we can bypass the exception, so the behavior may not always be reproducible.
  3. I am unable to determine whether we get the desired behavior when the GC does occur. What if that same pointer is wrong in the first place? In case of successful assignments, I fear we might end up invoking the wrong callback.

The above observations are consistent across retries, even after restarting both the process and the host OS. We even get the same pointer addresses as the ones mentioned here on subsequent executions.

To make things worse, the 3rd party tool manufacturer claims there are no issues with either the interop.dll or the stub.dll that could cause the above behavior.

Update: In response to comments, I am adding the signatures of the native functions here:

interface2.function10:

void (__cdecl *function10)( CallbackWithFunction10EventInfo cb, void* userData );

interface3.function1:

void (__cdecl *function1)(CallbackWithNoData cb, void* userData, int value );

interface3.function2:

void (__cdecl *function2)(CallbackWithNoData cb, void* userData);

Signature Note

While the two methods obviously have different types for their first parameter cb, it is possible that CallbackWithFunction10EventInfo is "hierarchically" related to CallbackWithNoData (some sort of faked inheritance, which is possible in certain circumstances in C). Could something like this impact the returned pointer values?


Some Assertions

We also debugged the pointer values that are returned when we remove the stub dll and use the working integration, with the interop.dll and the real tool. Our Java code is still the same.

  • interface2Pointer -> 401508620 (0x17EE890C)

  • function10 -> interface2Pointer + offset = 401508620 + 36 = 401508656 (0x17EE8930).

  • _getPointer(interface2Pointer.function10) = _getPointer(401508656) = 400857536 (0x17E499C0).

  • interface3Pointer -> 401508920 (0x17EE8A38)

  • function1 -> interface3Pointer + offset1 = 401508920 + 0 = 401508920 (0x17EE8A38).

  • _getPointer(interface3Pointer.function1) = _getPointer(401508920) = 401018032 (0x17E70CB0).

  • function2 -> interface3Pointer + offset2 = 401508920 + 4 = 401508924 (0x17EE8A3C).

  • _getPointer(interface3Pointer.function2) = _getPointer(401508924) = 401017424 (0x17E70A50)

Obviously, the non-stub addresses are unique, and we get the inter-operation working.


Our Setup

The code is being executed on a virtual machine with Microsoft Windows XP, and resides in a shaded jar. We use JDK/JRE 1.6 and JNA version 4.1.0.

Our test and execution scenarios provide 3 means of executing the Java process that does the interop binding:

  1. Standalone process - works well with the real tool, silently fails with the stub.dll
  2. Child process of another JVM process - works well with the real tool, throws the discussed IllegalStateException with the stub.dll.
  3. Child process of another JVM process, with the interface2 and interface3 bindings commented out - works correctly.

The command line we use to start the child Java process in steps 2 and 3 is:

java -cp our-shaded.jar main.class.package.Application

and when debugging, we add -Xdebug -Xrunjdwp:transport=dt_socket,address=8998,server=y
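
For reference, the command line above could be assembled and started from a parent JVM roughly as follows. This is only a sketch: the class `ChildJvmLauncher` and its methods are hypothetical, and spawning via `ProcessBuilder` is an assumption, since the post does not say how the parent actually starts the child.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher (Java 6 compatible); the real parent process may differ.
final class ChildJvmLauncher {

    // Builds the command line quoted above, optionally with the JDWP debug flags.
    static List<String> buildCommand(String jar, String mainClass, boolean debug) {
        List<String> command = new ArrayList<String>();
        command.add("java");
        if (debug) {
            command.add("-Xdebug");
            command.add("-Xrunjdwp:transport=dt_socket,address=8998,server=y");
        }
        command.add("-cp");
        command.add(jar);
        command.add(mainClass);
        return command;
    }

    // Starts the child; by default it inherits the parent's environment and working directory.
    static Process launch(List<String> command) throws IOException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout for easier logging
        return builder.start();
    }

    public static void main(String[] args) {
        // Only prints the command; call launch(...) to actually start the child.
        System.out.println(buildCommand("our-shaded.jar", "main.class.package.Application", false));
    }
}
```

Because a child started this way inherits the parent's environment and working directory, those are two things worth comparing between the standalone and the child runs.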

Update

While performing some additional checks, we examined the pointers returned by the stub.dll when running as a standalone process (as in point 1 above). The result was confusing, but it also gave us some direction: the standalone process obtained unique pointers, just as if it were working with the real tool. Thus, the cause might lie with the child process and some shared memory, or with limits on the memory exposed between the native code and the child Java process...


The Question

I would appreciate any clarity on whether the issue is caused by our usage or by the stub dll itself (I would blame the latter). We may need to convince the third party manufacturer that there is indeed a problem with their code; otherwise we might not get a new version of the stub, meaning we should look for a workaround. So any help in that direction, or workaround tips, are welcome.


Solution

  • The intent of uniquely mapping function pointers to Callback references is to expose programmatic errors in callback mapping, as well as provide a method for automatically disposing of memory when native pointers go out of scope. Generally a C function pointer has a single acceptable signature (with the exception of varargs semantics and casting). The cleanup also becomes a bit more complex if a single native pointer maps to multiple Java objects.

    It's possible that your native code is dynamically allocating the function pointers, in which case a particular pointer might end up being reused (especially if the native code is using an explicit memory pool). If that's the case, you would probably just need to purge the weak hash map (JNA does not expose this, but it would be trivial to call .clear() on the map in a bit of customized code).

    The native code may also be using placeholder functions, where a placeholder or common function is reused (usually where the method signature is the same). If this is the case the error would be deterministic (which doesn't appear to be your case).

    Alternatively, the native code may be using a single dispatch function (this doesn't sound like the case or you'd be seeing the error on every function pointer after the first).

    I'd like to note that it would probably be much easier for you if you actually mapped the native struct into a JNA Structure. That would spare you the manual extraction and initialization of interface pointers. JNA is perfectly capable of initializing a host of function pointers (i.e. Callbacks) within a Structure.

    UPDATE

    Given that function10 and function2 have effectively the same signature, ((*)(), void*), your stub library may well be using a placeholder function (e.g. "_not_implemented"). If you're not actively using these functions, you can simply change them both to have the same interface (either an existing one or one you write). That would get around the JNA restriction.

    Arguably JNA could drop this restriction, or provide a way around it, but this requires code changes within JNA. Even if it's a matter of the native code re-using a function pointer in a later (in time) context, you'd need to tweak JNA to be able to purposely flush the older mapping (assuming it's truly no longer in use).
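
The common-interface workaround from the UPDATE can be illustrated with a small model. This is a sketch, not JNA code: it assumes that a pointer may be looked up again whenever the requested interface is assignable from the one it is already mapped to, and all type names (including `StubPlaceholderCallback`) are hypothetical.

```java
// Simplified model of the compatibility check; NOT JNA's actual implementation.
final class SharedCallbackTypeSketch {

    interface Callback {}
    interface Function10Callback extends Callback {}
    interface Function2Callback extends Callback {}
    // Hypothetical common interface declared for both struct fields.
    interface StubPlaceholderCallback extends Callback {}

    // Assumed rule: a second lookup passes if the requested type accepts the mapped type.
    static boolean canShareAddress(Class<? extends Callback> mappedType,
                                   Class<? extends Callback> requestedType) {
        return requestedType.isAssignableFrom(mappedType);
    }

    public static void main(String[] args) {
        // Two unrelated interfaces for the same address: conflict.
        System.out.println(canShareAddress(Function10Callback.class, Function2Callback.class)); // false
        // Both fields declared with one common interface: no conflict.
        System.out.println(canShareAddress(StubPlaceholderCallback.class, StubPlaceholderCallback.class)); // true
    }
}
```

In the generated bindings this would amount to declaring both function10 and function2 with the same callback interface, which is only reasonable while neither function is actually invoked against the stub.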