java command-line runtime weka processbuilder

ProcessBuilder/Runtime.exec() with Weka Command Line Demonstrating Peculiar Behavior


Below is basically an MCVE of my full problem, which is much messier. What you need to know is that the following line runs fine when entered directly in a terminal:

java -classpath /path/to/weka.jar weka.filters.MultiFilter \
    -F "weka.filters.unsupervised.attribute.ClusterMembership -I first" \
    -i /path/to/in.arff

This is relatively straightforward. All I am doing is trying to cluster the data from in.arff using the default settings for the ClusterMembership filter, while ignoring the first attribute. The MultiFilter is there because my actual project uses other filters as well, so it needs to stay. As mentioned, this works fine. However, when I try to run the same line with ProcessBuilder, I get a "quote parse error", and it seems like the whole structure of nested quotes breaks down. One way of demonstrating this is trying to get the following to work:

List<String> args = new ArrayList<String>();
args.add("java");
args.add("-cp"); 
args.add("/path/to/weka.jar");
args.add("weka.filters.MultiFilter");
args.add("-F");
args.add("\"weka.filters.unsupervised.attribute.ClusterMembership"); 
args.add("-I"); 
args.add("first\"");
args.add("-i"); 
args.add("/path/to/in.arff");
ProcessBuilder pb = new ProcessBuilder(args);

// ... Run the process below

At first glance, you might think this is identical to the above line (that's certainly what my naive self thought). In fact, if I just print args with a space between each element, the resulting string is identical to the command above and runs perfectly if copied and pasted directly into the terminal. However, for whatever reason, the program doesn't work: I get the message Quote parse error from Weka. I tried googling and found this question about how ProcessBuilder adds extra quotes to the command line (this led me to try numerous combinations of escape sequences, none of which worked), and read this article about how ProcessBuilder and Runtime.exec() work (I tried both, and the same problem persisted), but couldn't find anything relevant to what I needed. Weka already had bad documentation, and then their Wikispaces page went down a couple of weeks ago due to Wikispaces shutting down, so I have found very little info on the Weka side.

My question then is this: Is there a way to get something like the second example I put above to run such that I can group arguments together for much larger commands? I understand it may require some funky escape sequences (or maybe not?), or perhaps something else I have not considered. Any help here is much appreciated.

Edit: I updated the question to hopefully give more insight into what my problem is.


Solution

  • You don't need to group arguments together. It doesn't even work, as you've already noted. Take a look at what happens when I call my Java program like this:

    java -jar Test.jar -i -s "-t 500"
    

    This is my "program":

    public class Test {
      public static void main(String[] args) {
        for( String arg : args ) {
          System.out.println(arg);
        }      
      }
    }
    

    And this is the output:

    -i
    -s
    -t 500
    

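    The quotes are not included in the arguments; the shell uses them to group words into a single argument and then removes them. ProcessBuilder does not run the command through a shell, so every list element is handed to the program as its own argument, exactly as written. With your list, Weka receives "weka.filters.unsupervised.attribute.ClusterMembership (including the literal quote character) as the value of -F, followed by -I and first" as separate arguments, which confuses the parser.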

    The quotes are only necessary when you have nested components, e.g. FilteredClassifier. Maybe my answer on another Weka question can help you with those nested components. (I recently changed the links to their wiki to point to the Google cache until they establish a new wiki.)

    Since you didn't specify what case exactly caused you to think about grouping, you could try to get a working command line for Weka and then use that one as input for a program like mine. You can then see how you would need to pass them to a ProcessBuilder.
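
    A small variation on my test program can do that translation for you. This is only a sketch: the class name ArgsToJava and the helpers escape and toAddLines are made up for illustration, and the escaping only covers backslashes and double quotes, not newlines or other control characters.

```java
import java.util.ArrayList;
import java.util.List;

class ArgsToJava {
    /** Escapes backslashes and double quotes so the text fits inside a Java string literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Returns one args.add("...") line per argument, in the order received. */
    static List<String> toAddLines(String[] args) {
        List<String> lines = new ArrayList<>();
        for (String arg : args) {
            lines.add("args.add(\"" + escape(arg) + "\");");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Each element of args is exactly what the shell handed over after removing its own quotes.
        for (String line : toAddLines(args)) {
            System.out.println(line);
        }
    }
}
```

    Call it from the shell with the same options you would give Weka, for example java ArgsToJava -F "weka.filters.unsupervised.attribute.ClusterMembership -I first" -i /path/to/in.arff, and it prints one args.add(...) line per argument as the shell delivered it.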

    For your example I'd guess the following will work:

    List<String> args = new ArrayList<String>();
    args.add("java");
    args.add("-cp"); 
    args.add("/path/to/weka.jar");
    args.add("weka.filters.MultiFilter");
    args.add("-F");
    args.add("weka.filters.unsupervised.attribute.ClusterMembership -I first");
    args.add("-i"); 
    args.add("/path/to/in.arff");
    ProcessBuilder pb = new ProcessBuilder(args);
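
    To actually run a command list like this and see what the process prints (including any error message from Weka), something along these lines should do. It is a minimal sketch: RunCommandSketch and run are hypothetical names, and main uses java -version as a stand-in command because the Weka paths above are placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RunCommandSketch {
    /** Starts the command, collects its merged stdout/stderr lines, and appends the exit code. */
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so a single reader drains both
        Process process = pb.start();

        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exitCode = process.waitFor(); // wait until the child process has finished
        lines.add("exit code: " + exitCode);
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in command: the java launcher of the running JVM, asked for its version.
        String java = System.getProperty("java.home") + "/bin/java";
        for (String line : run(Arrays.asList(java, "-version"))) {
            System.out.println(line);
        }
    }
}
```

    Passing the args list from the snippet above to run(...) instead of the stand-in command would start Weka the same way. Reading the output before calling waitFor() keeps the child from blocking on a full output buffer.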
    

    Additional details

    What happens inside Weka is basically the following: The options from the arguments are first processed by weka.filters.Filter, then all non-general filter options are processed by weka.filters.MultiFilter, which contains the following code in setOptions(...):

    filters = new Vector<Filter>();
    while ((tmpStr = Utils.getOption("F", options)).length() != 0) {
        options2 = Utils.splitOptions(tmpStr);
        filter = options2[0];
        options2[0] = "";
        filters.add((Filter) Utils.forName(Filter.class, filter, options2));
    }
    

    Here, tmpStr is the value of the -F option and is processed by Utils.splitOptions(tmpStr) (source code). That is where all the quoting and unquoting magic happens, so that the next component receives an options array that looks just like it would if it were a first-level component.
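
    To give a rough idea of what such a quote-aware split does, here is a simplified sketch. It is not Weka's implementation: the class SplitSketch, the method split and the exception message are made up for illustration, and the real method handles more escape sequences.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    /** Splits on spaces; a double-quoted run becomes one token with the quotes removed. */
    static List<String> split(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == ' ') { // skip separators between tokens
                i++;
                continue;
            }
            StringBuilder token = new StringBuilder();
            if (s.charAt(i) == '"') {
                i++; // skip the opening quote
                while (i < s.length() && s.charAt(i) != '"') {
                    if (s.charAt(i) == '\\' && i + 1 < s.length()) {
                        i++; // a backslash keeps the next character literally
                    }
                    token.append(s.charAt(i));
                    i++;
                }
                if (i >= s.length()) {
                    throw new IllegalArgumentException("unterminated quote");
                }
                i++; // skip the closing quote
            } else {
                while (i < s.length() && s.charAt(i) != ' ') {
                    token.append(s.charAt(i));
                    i++;
                }
            }
            tokens.add(token.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The -F value without extra quotes splits into the class name and its own options.
        System.out.println(split("weka.filters.unsupervised.attribute.ClusterMembership -I first"));
    }
}
```

    With the unquoted -F value, the sketch yields the filter class name followed by -I and first, which matches what the loop above expects in options2. A value that starts with a quote but never closes it makes the sketch throw, which is presumably the same kind of input that triggers Weka's Quote parse error in the question (an assumption on my part, not something taken from the Weka source shown here).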