Does anybody know of a Java implementation of the DRMAA API that is known to work with the PBS/Torque cluster software?
Background: I would like to submit jobs from Java to a newly set-up Linux cluster using a DRMAA-compliant API. The cluster is managed by PBS/Torque. Torque includes a PBS DRMAA 1.0 library that contains a DRMAA-C binding and ships as libdrmaa.so and libdrmaa.a binaries. I know that Sun Grid Engine includes a drmaa.jar providing a Java DRMAA API. In fact, I argued for SGE, but it was decided to try PBS first.
The reasoning behind that decision was:
'DRMAA is a standard, and therefore a Java API needs only a standards-compliant DRMAA-C binding.' However, I couldn't find such a general DRMAA-C-to-Java API, and I now think that this assumption is wrong and that the Java libraries are engine-specific.
Edit: I just experimented with the drmaa.jar from the Sun Grid Engine package and tried to cross-use it with the PBS libdrmaa.so. Not surprisingly, that failed (JNI UnsatisfiedLinkError).
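For anyone who wants to reproduce this kind of check, here is a minimal sketch of how such a link failure surfaces in Java. The class name NativeLibProbe and the default library name "drmaa" are my own choices for illustration; they are not part of any DRMAA package. System.loadLibrary throws UnsatisfiedLinkError when the library cannot be found on java.library.path, and the JVM throws the same error type later if a native method's symbol is missing from a library that did load.

```java
public class NativeLibProbe {

    /** Returns true if the JVM can locate and load the named native library. */
    static boolean canLoad(String libName) {
        try {
            // On Linux this looks for lib<libName>.so on java.library.path.
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Could not load '" + libName + "': " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // "drmaa" is only a default for illustration; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "drmaa";
        System.out.println(name + " loadable: " + canLoad(name));
    }
}
```

Note that a successful load only shows that the file was found. It does not show that the library exports the JNI entry points a particular drmaa.jar expects, so a cross-vendor combination can still fail on the first native call.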
Conclusion: It does not work that way! After some searching, I see only these options:
Implement a DRMAA binding myself. Way too complex...
Switch to Grid Engine. In my opinion, GE is superior to PBS with respect to language bindings.
I tend to prefer option 2. or 4. Any recommendations?
After some more searching, it looks like I have to write something myself. There seems to be no optimal answer yet, but this can serve as a warning for those attempting the same.
The best place to ask these questions is possibly the Torque mailing list: www.clusterresources.com/resources/mailing-lists.php
First of all, the reason you cannot take just any DRMAA Java library and combine it with any DRMAA-C implementation is this: DRMAA describes the interface for resource control, not how it is implemented. A vendor could build its Java binding on top of a DRMAA-C implementation and use only those functions, but it does not have to; it can use whatever the engine offers. So one important message is: if you need particular language bindings, make sure the engine provides them for every language you require.
Regarding the options mentioned:
Using GridWay/Globus Toolkit: http://www.gridway.org/doku.php?id=start Advantage: GridWay is a meta-scheduler that supports many resource management systems (SGE, PBS, ...). Possibly the only way to get a DRMAA interface working with PBS at the moment. Disadvantage: it seems like an inflation of layers and complexity. I have no experience with it.
Using system commands (qsub, qstat, qdel). Advantage: quick hack. Disadvantages: dirty hack; need to implement parsers for the output; might not notice if something goes wrong; have to pass around messages from stdin/stdout/stderr; not portable.
Using JNI, it should be possible to create a binding for each C function in drmaa.c. Advantage: would provide a full DRMAA implementation (hopefully). Disadvantages: involves compiled code; a lot of manual wrapping of C functions (maybe this can be automated).
Switch to another grid engine. Arguably, we should have done this analysis earlier. However, we already have another Torque cluster and experience with it; operating two different systems would make the infrastructure more heterogeneous.
Changing an existing DRMAA library from a different vendor. No idea whether that is feasible... We will look into that too.
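To make the "system commands" option more concrete, here is a minimal sketch of wrapping qsub and qdel with ProcessBuilder. The class TorqueCli and its methods are hypothetical names of mine, not an existing library. It also assumes that qsub prints the job id as the first non-empty line of its output and that a non-zero exit code means failure; check both against your Torque installation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class TorqueCli {

    /** Exit code and captured output of one command invocation. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Runs a command, merging stderr into stdout, and waits for it to finish. */
    static Result run(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // one stream to drain, so the child cannot block on stderr
        Process p = pb.start();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return new Result(p.waitFor(), sb.toString());
    }

    /** Assumption: the job id is the first non-empty line of qsub's output. */
    static String parseJobId(String qsubOutput) {
        for (String line : qsubOutput.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                return trimmed;
            }
        }
        throw new IllegalArgumentException("no job id found in qsub output");
    }

    /** Submits a job script and returns the job id reported by qsub. */
    static String submit(String scriptPath) throws IOException, InterruptedException {
        Result r = run("qsub", scriptPath);
        if (r.exitCode != 0) {
            throw new IOException("qsub failed (exit " + r.exitCode + "): " + r.output);
        }
        return parseJobId(r.output);
    }

    /** Deletes a job; throws if qdel reports a failure. */
    static void delete(String jobId) throws IOException, InterruptedException {
        Result r = run("qdel", jobId);
        if (r.exitCode != 0) {
            throw new IOException("qdel failed (exit " + r.exitCode + "): " + r.output);
        }
    }
}
```

This shows exactly the disadvantages listed above: the job id comes from parsing text, and because stderr is merged into stdout, a warning printed before the id would be mistaken for it. Monitoring jobs would additionally need a parser for qstat output, which is not attempted here.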