Tags: java, scala, cygwin, command-line-arguments, glob

Trailing asterisks in Windows JVM command-line args are globbed in a Cygwin bash shell


UPDATE: this problem occurs when running JVM-based command-line tools in a Cygwin bash shell. Although I originally thought it was related to Scala, it's specific to the Windows JVM. It might be the result of breaking changes in MSDN libraries; see comments below.

I'm writing a Scala utility script that takes a literal Java classpath entry and analyzes it. I'd like my main method to be able to receive command-line arguments with a trailing asterisk, e.g. "/*", but there seems to be no way to do this when running in a Cygwin bash session.

Here's my Scala test script, which displays its command-line arguments:

// saved to a file called "dumpargs.sc"
args.foreach { printf("[%s]\n",_) }

I'd like to be able to call it with an asterisk as an argument, like this:

scala -howtorun:script dumpargs.sc "*"

When I run this in a CMD.EXE shell, it does what I expect:

c:\cygwin> scala.bat -howtorun:script dumpargs.sc "*"
[*]
c:\cygwin>

Likewise, when tested in a Linux bash shell, the sole command line argument consists of a single bare asterisk, again as expected.

A comparable command-line args dumper program written in C prints a single bare asterisk, regardless of which shell it is run from (CMD.EXE or bash).

But when the same test is run in a Cygwin bash shell, the asterisk is globbed and the script lists all the files in the current directory. The globbing happens downstream of bash; otherwise the C dumper program would have failed too.

The problem is subtle: it happens somewhere in the JVM after it receives the asterisk argument and before it calls the main method. And whether the JVM globs the asterisk depends on something in the running shell environment.

In some ways this behaviour is a good thing, since it supports script portability by hiding differences between runtime environments (Windows versus Linux/OS X, etc.): Unix-like shells tend to glob, whereas CMD.EXE doesn't.

All efforts to work around the problem so far have failed:

Even allowing for OS-dependent tricks, I've tried all of the following (from a bash session):

"*" '*' '\*' '\\*'

The following almost works, but the single quotes arrive as part of the argument value and must then be stripped away by my program:

"'*'"

Same problem, but a different kind of unwanted quote gets through:

'"*"' or \"*\"
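If one of these quoted forms is used anyway, the program has to remove the extra layer of quotes itself. A minimal sketch in Scala, assuming each argument arrives wrapped in at most one pair of matching single or double quotes; `stripOuterQuotes` and `unquoteAll` are hypothetical helper names, not library functions:

```scala
// Hypothetical helper: remove one pair of matching outer quotes, if present.
// Assumes the shell delivered e.g. "'*'" as the 3-character string '*'.
def stripOuterQuotes(arg: String): String = {
  val quoted = arg.length >= 2 &&
    (arg.head == '\'' || arg.head == '"') &&
    arg.last == arg.head
  if (quoted) arg.substring(1, arg.length - 1) else arg
}

// Apply it to every argument before analysing the classpath entries.
def unquoteAll(args: Seq[String]): List[String] =
  args.map(stripOuterQuotes).toList
```

The trade-off is that an argument which legitimately starts and ends with the same quote character loses those quotes too, which is why this remains a workaround rather than a fix.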

What's needed is a system property, or some other mechanism to disable globbing.

By the way, one variation of this problem is that I can't take advantage of the classpath wildcard available since Java 6, which adds every jar file in a directory to the classpath when you specify "-classpath 'lib/*'".
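For a script that analyzes a literal classpath entry, the wildcard can be expanded by hand once the entry arrives intact. The following is a sketch under stated assumptions, not how the JVM implements it: it follows the documented Java 6 rule that an entry whose last segment is `*` stands for every file in that directory with a .jar or .JAR extension, without searching subdirectories. The function name `expandClasspathEntry` is hypothetical, and the results are sorted only to make the output deterministic (the JVM does not specify an order).

```scala
import java.io.File

// Hypothetical sketch of the Java 6+ classpath wildcard rule for one entry.
def expandClasspathEntry(entry: String): List[String] = {
  val isWildcard = entry == "*" || entry.endsWith("/*") || entry.endsWith("\\*")
  if (!isWildcard) List(entry)
  else {
    // Drop the trailing '*', keeping the separator so "/*" still means the root.
    val dirPath = entry.dropRight(1)
    val dir = new File(if (dirPath.isEmpty) "." else dirPath)
    // listFiles returns null when the directory is missing or unreadable.
    val files = Option(dir.listFiles()).map(_.toList).getOrElse(Nil)
    files
      .filter(f => f.isFile && (f.getName.endsWith(".jar") || f.getName.endsWith(".JAR")))
      .map(_.getPath)
      .sorted
  }
}
```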

There needs to be a system property I can set to disable this behavior when running in a shell environment that provides its own globbing.


Solution

  • This problem is caused by a known bug in the JVM, documented here:

    https://bugs.openjdk.java.net/browse/JDK-8131329

    In the meantime, to get around the problem, I'm passing arguments via an environment variable.

    Here's what happens inside my "myScalaScript":

    #!/usr/bin/env scala
    for( arg <- args.toList ::: cpArgs ){
      printf("[%s]\n",arg)
    }
    
    lazy val cpArgs = System.getenv("CP_ARGS") match {
      case null => Nil
      case text => text.split("[;|]+").toList
    }
    

    Here's how the script is invoked from bash:

    CP_ARGS=".|./lib/*" myScalaScript [possibly other non-problematic args]

    and here's what it prints in all tested environments:

    [.]
    [./lib/*]
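    The environment-variable workaround amounts to a tiny serialization protocol: the caller joins the arguments with '|' (or ';'), and the script splits them back apart with the regex "[;|]+". A sketch of both halves, assuming no argument is empty or contains ';' or '|'; `encodeCpArgs` and `decodeCpArgs` are hypothetical names:

```scala
// Hypothetical encoder for the calling side: join arguments with '|'.
// Assumes no argument contains ';' or '|', since the decoder treats both as separators.
def encodeCpArgs(args: Seq[String]): String = {
  require(args.forall(a => !a.exists(c => c == ';' || c == '|')), "argument contains a separator")
  args.mkString("|")
}

// Decoder based on the script's regex: runs of ';' or '|' separate entries,
// empty entries are dropped, and a missing (null) variable yields no arguments.
def decodeCpArgs(text: String): List[String] =
  Option(text).toList.flatMap(_.split("[;|]+").toList).filter(_.nonEmpty)
```

    Because an empty argument disappears on decoding, this protocol cannot carry empty strings; for classpath entries that is usually acceptable.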
    

    Here's a better fix that hides all the nastiness inside the script and keeps the main loop more conventional.

    The new script:

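    #!/bin/bash
    # Join all arguments with '|' (the first character of IFS), so that the
    # Scala code below can split them apart again.
    IFS='|'
    export CP_ARGS="$*"
    # Re-run this same file ("$0") as a Scala script. The arguments travel only
    # through CP_ARGS, so the JVM never receives a bare '*' it could glob.
    exec scala "$0"
    !#
    // vim: ft=scala
    
    for( arg <- cpArgs ){
      printf("[%s]\n",arg)
    }
    
    lazy val cpArgs = System.getenv("CP_ARGS") match {
      case null | "" => Nil
      case text => text.split("[;|]+").toList
    }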