Parsing picocli-based CLI usage output into structured data

I have a set of picocli-based applications whose usage output I'd like to parse into structured data. I've written three different output parsers so far and I'm not happy with any of them (fragility, complexity, difficulty in extending, etc.). Any thoughts on how to cleanly parse this type of semi-structured output?

The usage output generally looks like this:

Usage: taker-mvo-2 [-hV] [-C=file] [-E=file] [-p=payoffs] [-s=millis] PENALTY
                    (ASSET SPREAD)...
Submits liquidity-taking orders based on mean-variance optimization of multiple
assets.
      PENALTY             risk penalty for payoff variance
      (ASSET SPREAD)...   Spread for creating market above fundamental value
                            for assets
  -C, --credential=file   credential file
  -E, --endpoint=file     marketplace endpoint file
  -h, --help              display this help message
  -p, --payoffs=payoffs   payoff states and probabilities (default: .fm/payoffs)
  -s, --sleep=millis      sleep milliseconds before acting (default: 2000)
  -V, --version           print product version and exit

I want to capture the program name and description, options, parameters, and parameter-groups, along with their descriptions, into an Agent object:

public class Agent {
    private String name;
    private String description = "";
    private List<Option> options;
    private List<Parameter> parameters;
    private List<ParameterGroup> parameterGroups;
}

The program name is taker-mvo-2, and the (possibly multi-line) description follows the (possibly multi-line) argument list:

Submits liquidity-taking orders based on mean-variance optimization of multiple assets.

Options (in square brackets) should be parsed into:

public class Option {
    private String shortName;
    private String parameter;
    private String longName;
    private String description;

}

The parsed options' JSON is:

"options": [ {
  "shortName": "h",
  "parameter": null,
  "longName": "help",
  "description": "display this help message"
}, {
  "shortName": "V",
  "parameter": null,
  "longName": "version",
  "description": "print product version and exit"
}, {
  "shortName": "C",
  "parameter": "file",
  "longName": "credential",
  "description": "credential file"
}, {
  "shortName": "E",
  "parameter": "file",
  "longName": "endpoint",
  "description": "marketplace endpoint file"
}, {
  "shortName": "p",
  "parameter": "payoffs",
  "longName": "payoffs",
  "description": "payoff states and probabilities (default: ~/.fm/payoffs)"
}]

Similarly, the parameters should be parsed into:

public class Parameter {
    private String name;
    private String description;

}

and parameter-groups, which are surrounded by ( and )..., should be parsed into:

public class ParameterGroup {
    private List<String> parameters;
    private String description;

}

The first hand-written parser I wrote walks the buffer, capturing the data as it goes. It works pretty well, but it looks horrible and it's horrible to extend. The second hand-written parser uses regular expressions while walking the buffer. It looks better than the first but is still ugly and difficult to extend. The third parser uses regular expressions. It is probably the best looking of the bunch but is still ugly and unmanageable.

I thought this text would be pretty simple to parse manually but now I'm wondering if ANTLR might be a better tool for this. Any thoughts or alternative ideas?

Solution

Model

It sounds like what you need is an object model: one that describes the command and its options (names, parameter types, and descriptions), and likewise its positional parameters, argument groups, and potentially subcommands.

Then, once you have an object model of your application, it is relatively straightforward to render this as JSON or as some other format.
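To illustrate that rendering step, here is a minimal sketch that turns a list of option objects into JSON using only the JDK. The Opt record is a hypothetical stand-in for the Option class in the question, and the hand-rolled quote helper only escapes backslashes and double quotes; in a real application a JSON library would be the safer choice.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ModelToJson {
    // Hypothetical stand-in for the Option class shown in the question.
    record Opt(String shortName, String parameter, String longName, String description) {}

    // Emit a JSON string literal, or the JSON literal null for a missing value.
    // Assumption: values only need backslash and double-quote escaping.
    static String quote(String s) {
        return s == null ? "null" : "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Render one option as a JSON object.
    static String optionToJson(Opt o) {
        return "{\"shortName\": " + quote(o.shortName())
                + ", \"parameter\": " + quote(o.parameter())
                + ", \"longName\": " + quote(o.longName())
                + ", \"description\": " + quote(o.description()) + "}";
    }

    // Render a list of options as a JSON array.
    static String optionsToJson(List<Opt> options) {
        return options.stream()
                .map(o -> optionToJson(o))
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<Opt> options = List.of(
                new Opt("h", null, "help", "display this help message"),
                new Opt("C", "file", "credential", "credential file"));
        System.out.println(optionsToJson(options));
    }
}
```

The point is only that once the data lives in objects, the output format becomes a small, separate concern.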

Picocli has an object model

You could build this yourself, but if you are using picocli anyway, why not leverage picocli's strengths and use picocli's built-in model?

Accessing picocli's object model

Commands can access their own model

Within a picocli-based application, a @Command-annotated class can access its own picocli object model by declaring a @Spec-annotated field. Picocli will inject the CommandSpec into that field.

For example:

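import java.io.File;
import picocli.CommandLine.Command;
import picocli.CommandLine.Model.CommandSpec;
import picocli.CommandLine.Model.OptionSpec;
import picocli.CommandLine.Option;
import picocli.CommandLine.Spec;

@Command(name = "taker-mvo-2", mixinStandardHelpOptions = true, version = "taker-mvo-2 0.2")
class TakerMvo2 implements Runnable {
    // ...

    @Option(names = {"-C", "--credential"}, description = "credential file")
    File file;

    @Spec CommandSpec spec; // injected by picocli

    @Override
    public void run() {
        // print each option's longest name and its current value
        for (OptionSpec option : spec.options()) {
            System.out.printf("%s=%s%n", option.longestName(), option.getValue());
        }
    }
}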

The picocli user manual has a more detailed example that uses the CommandSpec to loop over all options in a command to see if the option was defaulted or whether a value was specified on the command line.
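As a rough sketch of that idea (not the manual's example itself), the code below parses some arguments and asks the ParseResult which options were actually matched on the command line. The Demo command and its two options are hypothetical and only loosely modelled on the usage output in the question; it assumes picocli 4.x on the classpath.

```java
import picocli.CommandLine;
import picocli.CommandLine.Command;
import picocli.CommandLine.Model.OptionSpec;
import picocli.CommandLine.Option;
import picocli.CommandLine.ParseResult;

import java.util.LinkedHashMap;
import java.util.Map;

final class MatchedDemo {
    // Hypothetical command, used only for this illustration.
    @Command(name = "demo")
    static class Demo {
        @Option(names = {"-s", "--sleep"}, defaultValue = "2000",
                description = "sleep milliseconds before acting")
        int sleep;

        @Option(names = {"-C", "--credential"}, description = "credential file")
        String credential;
    }

    // Map each option's longest name to whether it was given on the command line.
    static Map<String, Boolean> matched(String... args) {
        ParseResult result = new CommandLine(new Demo()).parseArgs(args);
        Map<String, Boolean> out = new LinkedHashMap<>();
        for (OptionSpec option : result.commandSpec().options()) {
            out.put(option.longestName(), result.hasMatchedOption(option));
        }
        return out;
    }

    public static void main(String[] args) {
        // Only --sleep is specified here; --credential is left unset.
        matched("-s", "500").forEach((name, wasMatched) ->
                System.out.printf("%s specified on command line: %s%n", name, wasMatched));
    }
}
```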

Creating a model of any picocli command

An alternative way to access picocli's object model is to construct a CommandLine instance with the @Command-annotated class (or an object of that class). You can do this outside of your picocli application.

For example:

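import java.util.List;
import java.util.Map;
import picocli.CommandLine;
import picocli.CommandLine.Model.ArgGroupSpec;
import picocli.CommandLine.Model.CommandSpec;
import picocli.CommandLine.Model.OptionSpec;

class Agent {
    public static void main(String... args) {
        CommandLine cmd = new CommandLine(new TakerMvo2());
        CommandSpec spec = cmd.getCommandSpec();

        // get subcommands
        Map<String, CommandLine> subCmds = spec.subcommands();

        // get options as a list
        List<OptionSpec> options = spec.options();

        // get argument groups
        List<ArgGroupSpec> argGroups = spec.argGroups();

        // ...
    }
}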
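From there, pulling out the fields the question asks for could look roughly like the sketch below. The Demo command, the SpecDump class, and the flat "key=value" output are all made up for illustration, and it assumes picocli 4.x. Note that picocli's option names include the leading dashes, so the sketch strips them, and that it treats an option whose maximum arity is 0 as having no parameter.

```java
import picocli.CommandLine;
import picocli.CommandLine.Command;
import picocli.CommandLine.Model.CommandSpec;
import picocli.CommandLine.Model.OptionSpec;
import picocli.CommandLine.Model.PositionalParamSpec;
import picocli.CommandLine.Option;
import picocli.CommandLine.Parameters;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class SpecDump {
    // Hypothetical cut-down version of the command from the question.
    @Command(name = "taker-mvo-2", mixinStandardHelpOptions = true,
            description = "Submits liquidity-taking orders based on mean-variance optimization of multiple assets.")
    static class Demo implements Runnable {
        @Option(names = {"-C", "--credential"}, paramLabel = "file", description = "credential file")
        File credential;

        @Parameters(index = "0", paramLabel = "PENALTY", description = "risk penalty for payoff variance")
        double penalty;

        @Override
        public void run() {}
    }

    // "-C" becomes "C", "--credential" becomes "credential".
    static String stripDashes(String name) {
        return name.replaceFirst("^-+", "");
    }

    // Flatten the parts of the model that the question wants to capture.
    static List<String> describe(CommandSpec spec) {
        List<String> lines = new ArrayList<>();
        lines.add("name=" + spec.name());
        lines.add("description=" + String.join(" ", spec.usageMessage().description()));
        for (OptionSpec o : spec.options()) {
            // Flags such as --help take no parameter (maximum arity 0).
            String parameter = o.arity().max() == 0 ? null : o.paramLabel();
            lines.add("option " + stripDashes(o.shortestName()) + "/" + stripDashes(o.longestName())
                    + " parameter=" + parameter + " : " + String.join(" ", o.description()));
        }
        for (PositionalParamSpec p : spec.positionalParameters()) {
            lines.add("parameter " + p.paramLabel() + " : " + String.join(" ", p.description()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(new CommandLine(new Demo()).getCommandSpec()).forEach(System.out::println);
    }
}
```

Instead of building strings, the same loops could populate the Agent, Option, and Parameter classes from the question, with no parsing of the usage text at all.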