This is going to be a long post in order to properly explain the issue, so please bear with me. It may also require some knowledge of the internals of the JNA library (v4.1.0), or the ability to examine its source code.
In a few words, we have issues when obtaining pointers to native functions from a 3rd-party component written in C. The problematic pointers seem to break JNA because their values repeat. The issue occurs consistently when we execute the JNA bindings in a child JVM process started from another JVM process.
We are integrating with a 3rd-party tool for Windows written in C. The tool manufacturer has provided us with the C header files and a dll that we must interoperate with through our Java code; I will refer to it as the interop.dll. The dll contains structures that expose function pointers, which we are mapping to Java interfaces via JNAerator.
The interop.dll communicates with the 3rd-party tool (which is pre-installed on the system), so it is a kind of communication SDK. For testing purposes, we have recently been provided with a stub.dll (again from that manufacturer), which does not require the 3rd-party tool to be running, or installed at all. The interop.dll is responsible for deciding whether to use the stub or the real 3rd-party tool, and automatically chooses the stub if it is present in the bin directory.
So, in any case, we have to map a fixed number of functions exposed by the interop.dll.
To assist in that, the interop.dll contains the following function:
void* (__cdecl *ObtainInterface)( const char* interfaceName );
and we would map it in Java like this:
public interface ObtainInterface_callback extends Callback {
    Pointer apply(String interfaceName);
};
public ObtainInterface_callback ObtainInterface;
This function is used to "extract" another function from either the 3rd-party tool or the stub.dll and then export it to a Java interface by using its pointer value. In other words, we use it to dig through the target dll's API and map the other C functions that we need to Java interfaces. The functions we are extracting are declared within their respective C structures in the following manner:
void (__cdecl *SomeName)(Params.....)
to later be automatically mapped by JNAerator in a fashion similar to the above ObtainInterface.
So, here is how we obtain the interfaces in our Java code:
Pointer interface1Pointer = ObtainInterface_callback.apply("Interface1");
Interface1 interface1 = new Interface1(interface1Pointer);
Pointer interface2Pointer = ObtainInterface_callback.apply("Interface2");
Interface2 interface2 = new Interface2(interface2Pointer);
Pointer interface3Pointer = ObtainInterface_callback.apply("Interface3");
Interface3 interface3 = new Interface3(interface3Pointer);
where the constructor of Interface1 would look like this (same for Interface2 and Interface3):
public Interface1(Pointer peer) {
    super(peer);
    read();
}
Note (in response to technomage's answer): The above code for Interface1, 2 and 3 was automatically generated by JNAerator, in an attempt to map the C struct with functions to a Java object with callbacks.
We have managed to successfully integrate with the interop.dll and the 3rd-party tool.
When we switch to using the stub.dll, we get an IllegalStateException coming from the JNA code (CallbackReference.java @ line 122). The problem occurs when we attempt to obtain the third interface: Interface3 interface3 = new Interface3(interface3Pointer);
We downloaded the JNA sources and started debugging through the code to see what exactly is causing the issue.
The read() method (see the constructor of Interface1 above) internally calls a readField() method for all members of the mapped structure. Because all structure members are function pointers, readField produces a Callback instance (as in Pointer.java @ line 419), and later results in a call to the native method long _getPointer(long addr). For those interested, the native method looks like this (I am not really sure this is relevant enough):
dispatch.c, @line 2359
/*
* Class: Native
* Method: _getPointer
* Signature: (J)Lcom/sun/jna/Pointer;
*/
JNIEXPORT jlong JNICALL Java_com_sun_jna_Native__1getPointer
(JNIEnv *env, jclass UNUSED(cls), jlong addr)
{
void *ptr = NULL;
MEMCPY(env, &ptr, L2A(addr), sizeof(ptr));
return A2L(ptr);
}
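To make the read step easier to follow, here is a minimal Java sketch of what reading a pointer-sized field out of a struct amounts to. It is only a model: a ByteBuffer stands in for native memory, the class name StructPointerReader is hypothetical, and the 4-byte pointer size and little-endian order are assumptions based on the 32-bit Windows setup and the offsets quoted in this post. It is not how JNA itself accesses memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Simplified model of reading 32-bit function pointers out of a C struct.
// The ByteBuffer stands in for native memory; this is NOT JNA's implementation.
final class StructPointerReader {
    // Assumption: 32-bit process, so each function pointer occupies 4 bytes.
    static final int POINTER_SIZE = 4;

    // Returns the pointer-sized value stored at the given byte offset,
    // widened to long and masked so it is treated as unsigned.
    public static long readPointer(ByteBuffer struct, int offset) {
        return struct.getInt(offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // Simulated struct with two function-pointer fields at offsets 0 and 4.
        ByteBuffer struct = ByteBuffer.allocate(2 * POINTER_SIZE)
                                      .order(ByteOrder.LITTLE_ENDIAN);
        struct.putInt(0, 0x17F75DF0);
        struct.putInt(4, 0x17F33320);

        for (int i = 0; i < 2; i++) {
            int offset = i * POINTER_SIZE;
            System.out.printf("offset %d -> 0x%08X%n", offset, readPointer(struct, offset));
        }
    }
}
```

The field address is simply the struct's base address plus the field offset, and the value stored there is the address of the target function.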
What we identified there was an issue with the address returned by the above _getPointer call, while running with the stub.dll. Here are the details we captured when debugging:
- interface2Pointer has value 402394304 (0x17FC0CC0) (the pointer of the C struct).
- The readField method discovers 10 function pointers within that struct, the last residing at offset 36.
- function10 -> interface2Pointer + offset = 402394304 + 36 = 402394340 (0x17FC0CE4).
- _getPointer(interface2Pointer.function10) = _getPointer(402394340), which would return the address of the callback within the struct, currently 401814304 (0x17F33320).
The same is repeated for interface3Pointer:
- interface3Pointer -> 402397356 (0x17FC18AC)
- Its function pointers reside at offsets 0 and 4, which are retrieved by the readField method:
- function1 -> 402397356 + 0 = 402397356 (0x17FC18AC)
- _getPointer(interface3Pointer.function1) = _getPointer(402397356) then returns 402087408 (0x17F75DF0)
- function2 -> 402397356 + 4 = 402397360 (0x17FC18B0)
- _getPointer(interface3Pointer.function2) = _getPointer(402397360) then returns 401814304 (0x17F33320) (!)
As you can see, interface3Pointer.function2 is being assigned the same pointer as interface2Pointer.function10.
Now, CallbackReference.java internally uses a weak hash map to keep track of callback pointers that have already been assigned a Java representation. The IllegalStateException is thrown because that map still holds a reference to the already matched pointer (interface2Pointer.function10 @ 401814304), so the pointer cannot be inserted again and mapped to another interface.
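The conflict described above can be illustrated with a small model. This is a sketch of the behaviour as described in this post, not JNA's actual code: the class CallbackRegistryModel and the two marker interfaces are hypothetical, and a plain HashMap replaces the weak map so the example behaves deterministically.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a one-to-one "native pointer -> callback type" registry.
// NOT JNA's implementation: a HashMap replaces the weak map for determinism.
final class CallbackRegistryModel {
    // Hypothetical stand-ins for two generated callback interfaces.
    public interface Function10Callback {}
    public interface Function2Callback {}

    private final Map<Long, Class<?>> boundTypes = new HashMap<>();

    // Succeeds when the address is new or already bound to the same type;
    // throws when the address is already bound to a different type.
    public void bind(long address, Class<?> callbackType) {
        Class<?> existing = boundTypes.get(address);
        if (existing != null && existing != callbackType) {
            throw new IllegalStateException(String.format(
                "Pointer 0x%08X already mapped to %s, cannot map to %s",
                address, existing.getSimpleName(), callbackType.getSimpleName()));
        }
        boundTypes.put(address, callbackType);
    }

    public static void main(String[] args) {
        CallbackRegistryModel registry = new CallbackRegistryModel();
        // interface2.function10 is read first and bound to its callback type.
        registry.bind(0x17F33320L, Function10Callback.class);
        try {
            // interface3.function2 yields the same address with another type.
            registry.bind(0x17F33320L, Function2Callback.class);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In this model, binding the same address to the same type a second time is accepted; only a second, different type is rejected.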
I can observe three problems from this point:
- Is it possible that the stub.dll uses the same callback for both operations? This is rather surprising, as interface2Pointer.function10 has a different signature than interface3Pointer.function2.
The above observations are consistent across retries after restarting both the process and the host OS. We even get the same pointer addresses as the ones mentioned here on subsequent executions.
To make things worse, the 3rd-party tool manufacturer claims there are no issues with either the interop.dll or the stub.dll that could cause the above behavior.
Update: In response to comments, I am adding the signatures of the native functions here:
interface2.function10:
void (__cdecl *function10)( CallbackWithFunction10EventInfo cb, void* userData );
interface3.function1:
void (__cdecl *function1)(CallbackWithNoData cb, void* userData, int value );
interface3.function2:
void (__cdecl *function2)(CallbackWithNoData cb, void* userData);
Signature Note
While the two methods obviously have different types for their first parameter cb, it is not impossible for CallbackWithFunction10EventInfo to be "hierarchically" related to CallbackWithNoData (like some sort of faked inheritance, which is possible in certain circumstances in C). Could something like this impact the returned pointer values?
We also debugged the pointer values that are returned when we remove the stub dll and use the working integration, with the interop.dll and the real tool. Our Java code is still the same.
- interface2Pointer -> 401508620 (0x17EE890C)
- function10 -> interface2Pointer + offset = 401508620 + 36 = 401508656 (0x17EE8930).
- _getPointer(interface2Pointer.function10) = _getPointer(401508656) = 400857536 (0x17E499C0).
- interface3Pointer -> 401508920 (0x17EE8A38)
- function1 -> interface3Pointer + offset1 = 401508920 + 0 = 401508920 (0x17EE8A38).
- _getPointer(interface3Pointer.function1) = _getPointer(401508920) = 401018032 (0x17E70CB0).
- function2 -> interface3Pointer + offset2 = 401508920 + 4 = 401508924 (0x17EE8A3C).
- _getPointer(interface3Pointer.function2) = _getPointer(401508924) = 401017424 (0x17E70A50)
Obviously, the non-stub addresses are unique, and we get the inter-operation working.
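For anyone who wants to repeat this comparison, here is a small sketch that groups the captured values by address and reports the ones shared by more than one field. The class name DuplicatePointerCheck and the field labels are hypothetical; the addresses are the ones listed in this post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Groups function-pointer values by address and keeps only the shared ones.
final class DuplicatePointerCheck {
    public static Map<Long, List<String>> findShared(Map<String, Long> pointersByField) {
        Map<Long, List<String>> byAddress = new TreeMap<>();
        for (Map.Entry<String, Long> e : pointersByField.entrySet()) {
            byAddress.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Drop every address that is used by exactly one field.
        byAddress.values().removeIf(names -> names.size() < 2);
        return byAddress;
    }

    public static void main(String[] args) {
        Map<String, Long> stub = new LinkedHashMap<>();
        stub.put("interface2.function10", 0x17F33320L);
        stub.put("interface3.function1", 0x17F75DF0L);
        stub.put("interface3.function2", 0x17F33320L);

        Map<String, Long> real = new LinkedHashMap<>();
        real.put("interface2.function10", 0x17E499C0L);
        real.put("interface3.function1", 0x17E70CB0L);
        real.put("interface3.function2", 0x17E70A50L);

        System.out.println("stub shared addresses: " + findShared(stub));
        System.out.println("real shared addresses: " + findShared(real));
    }
}
```

With the values above, only the stub run reports a shared address, and it is the one used by both interface2.function10 and interface3.function2.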
The code is being executed on a virtual machine with Microsoft Windows XP, and resides in a shaded jar. We use JDK/JRE 1.6 and JNA version 4.1.0.
Our test and execution scenarios provide 3 means of executing the Java process that does the interop binding:
1. [...] stub.dll
2. [...] IllegalStateException with the stub.dll.
3. [...] interface2 and interface3 bindings. The thing is working correctly.
The command line we use to start the child Java process in steps 2 and 3 is:
java -cp our-shaded.jar main.class.package.Application
and when debugging, we add -Xdebug -Xrunjdwp:transport=dt_socket,address=8998,server=y
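As an illustration of how a parent JVM could assemble that command, here is a sketch using ProcessBuilder. How the parent actually spawns the child is not shown in this post, so ProcessBuilder and the class name ChildJvmCommand are assumptions; the arguments are the ones quoted above. The sketch only builds the command and does not start the process.

```java
import java.util.ArrayList;
import java.util.List;

// Builds the child JVM command line quoted above; the process is not started.
final class ChildJvmCommand {
    public static List<String> build(boolean debug) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (debug) {
            // JVM options have to come before the main class name.
            cmd.add("-Xdebug");
            cmd.add("-Xrunjdwp:transport=dt_socket,address=8998,server=y");
        }
        cmd.add("-cp");
        cmd.add("our-shaded.jar");
        cmd.add("main.class.package.Application");
        return cmd;
    }

    public static void main(String[] args) {
        ProcessBuilder builder = new ProcessBuilder(build(true));
        // Merge stderr into stdout so native-side diagnostics are not lost.
        builder.redirectErrorStream(true);
        System.out.println(String.join(" ", builder.command()));
    }
}
```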
Update
While performing some additional assertions, we thought it worth examining the pointers returned by the stub.dll in the case of a standalone process execution (as in point 1 above). The result was confusing, but it also gave us some direction. The standalone process obtained unique pointers, just as if it were working with the real tool. Thus, the cause might lie with the child process and some shared memory or limits on the memory exposed between the native code and the child Java process...
I would appreciate any clarity on whether the issue is caused by our usage or by the stub dll itself (I would blame the latter). We may need to convince the third-party manufacturer that there is indeed a problem with their code; otherwise we might not get a new version of the stub, meaning we would have to look for a workaround. So, any help in that direction, or any workaround tips, are welcome.
The intent of uniquely mapping function pointers to Callback references is to expose programmatic errors in callback mapping, as well as provide a method for automatically disposing of memory when native pointers go out of scope. Generally a C function pointer has a single acceptable signature (with the exception of varargs semantics and casting). The cleanup also becomes a bit more complex if a single native pointer maps to multiple Java objects.
It's possible that your native code is dynamically allocating the function pointers, in which case a particular pointer might end up being reused (especially if the native code is using an explicit memory pool). If that's the case, you would probably just need to purge the weak hash map (JNA does not expose this, but it would be trivial to call .clear() on the map in a bit of customized code).
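As a rough idea of what such customized code could look like, here is a generic sketch that empties a private static map through reflection. HiddenRegistry and its ENTRIES field are hypothetical stand-ins; the name of the actual field inside JNA's CallbackReference is not given here and would have to be looked up in the JNA sources for the version in use.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.WeakHashMap;

// Generic sketch: empty a private static Map that a class does not expose.
final class PurgePrivateMapSketch {
    // Hypothetical stand-in for a library class that hides its registry.
    public static final class HiddenRegistry {
        private static final Map<Object, Object> ENTRIES = new WeakHashMap<>();
        public static void register(Object key, Object value) { ENTRIES.put(key, value); }
        public static int size() { return ENTRIES.size(); }
    }

    // Clears the private static Map field with the given name on the given class.
    public static void purge(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            field.setAccessible(true);
            Map<?, ?> map = (Map<?, ?>) field.get(null); // null receiver: static field
            map.clear();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot purge " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Object key = new Object(); // held strongly so the weak entry survives
        HiddenRegistry.register(key, "callback");
        System.out.println("before purge: " + HiddenRegistry.size());
        purge(HiddenRegistry.class, "ENTRIES");
        System.out.println("after purge: " + HiddenRegistry.size());
    }
}
```

Purging is only safe if none of the previously mapped callbacks are still in use, since their bookkeeping is discarded along with the map entries.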
The native code may also be using placeholder functions, where a placeholder or common function is reused (usually where the method signature is the same). If this is the case the error would be deterministic (which doesn't appear to be your case).
Alternatively, the native code may be using a single dispatch function (this doesn't sound like the case or you'd be seeing the error on every function pointer after the first).
I'd like to note that it would probably be much easier for you if you actually mapped the native struct into a JNA Structure. That would spare you the manual extraction and initialization of interface pointers. JNA is perfectly capable of initializing a host of function pointers (i.e. Callbacks) within a Structure.
UPDATE
Given that function10 and function2 have effectively the same signature, ((*)(), void*), your stub library may well be using a placeholder function (e.g. "_not_implemented"). If you're not actively using these functions, you can simply change them both to have the same interface (either an existing one or one you write). That would get around the JNA restriction.
Arguably JNA could drop this restriction, or provide a way around it, but this requires code changes within JNA. Even if it's a matter of the native code re-using a function pointer in a later (in time) context, you'd need to tweak JNA to be able to purposely flush the older mapping (assuming it's truly no longer in use).