Trick to use file paths with spaces in Mallet (Terminal, OS X)?


Is there a trick to be able to use file paths with spaces in Mallet through the terminal on mac?

For example, all of the following give me errors:

escaping the space

./bin/mallet import-dir --input /Volumes/Macintosh\ HD/Users/MY_NAME/Desktop/en --output /Users/MY_NAME/Desktop/en.mallet --remove-stopwords TRUE --keep-sequence TRUE

double quotes, no escapes

./bin/mallet import-dir --input "/Volumes/Macintosh HD/Users/MY_NAME/Desktop/en" --output /Users/MY_NAME/Desktop/en.mallet --remove-stopwords TRUE --keep-sequence TRUE

and, with double quotes and an escaped space

./bin/mallet import-dir --input "/Volumes/Macintosh\ HD/Users/MY_NAME/Desktop/en" --output /Users/MY_NAME/Desktop/en.mallet --remove-stopwords TRUE --keep-sequence TRUE

and finally with single quotes

./bin/mallet import-dir --input '/Volumes/Macintosh\ HD/Users/MY_NAME/Desktop/en' --output /Users/MY_NAME/Desktop/en.mallet --remove-stopwords TRUE --keep-sequence TRUE

They all want to treat the folder as multiple folders, split on the space:

Labels = 
   /Volumes/Macintosh\
   HD/Users/MY_NAME/Desktop/en
Exception in thread "main" java.lang.IllegalArgumentException: /Volumes/Macintosh\ is not a directory.
    at cc.mallet.pipe.iterator.FileIterator.<init>(FileIterator.java:108)
    at cc.mallet.pipe.iterator.FileIterator.<init>(FileIterator.java:145)
    at cc.mallet.classify.tui.Text2Vectors.main(Text2Vectors.java:322)

Is there any way around this, other than renaming all of my files with spaces to underscores? (I understand that I don't need to type /Volumes/Macintosh\ HD/... but can just start at /Users. This was just an example.)


Solution

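  • The issue is that import-dir is designed to take multiple directories as input, one per label, so the --input value ends up split on spaces (hence the two entries under "Labels =" in the output above). The argument parser would need a way to distinguish this use case from the "escaped space" use case, keeping in mind that Windows paths can end in \.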

    The best way to support both cases might be to add a --single-input option that would take its argument as a single string.

    I also find that the spreadsheet-style import-file command is almost always preferable to working with directories.
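
As a rough sketch of the import-file route mentioned above, the helper below flattens a directory of plain-text files into a single file with one document per line. It assumes import-file's default layout of name, label, then text on each line. The function name build_import_file, the label "en", and the en.tsv path are made up for illustration and are not part of Mallet. Because the directory is only ever handled by the shell (always quoted), the space in its path never reaches Mallet.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mallet): write one line per file in the
# form name<TAB>label<TAB>text, which is assumed to match import-file's
# default one-instance-per-line input.
build_import_file() {
    input_dir=$1    # may contain spaces; it is quoted everywhere below
    label=$2        # single token used as the label column
    output_file=$3  # best kept at a path without spaces

    : > "$output_file"   # create or truncate the output file
    for f in "$input_dir"/*; do
        [ -f "$f" ] || continue   # skip subdirectories and unmatched globs
        # Instance name: the file name with spaces and tabs turned into underscores.
        name=$(basename "$f" | tr ' \t' '__')
        # Document text: newlines, carriage returns, and tabs become spaces
        # so the whole document stays on one line.
        text=$(tr '\n\r\t' '   ' < "$f")
        printf '%s\t%s\t%s\n' "$name" "$label" "$text" >> "$output_file"
    done
}

# Example use (paths are the placeholders from the question):
#   build_import_file "/Volumes/Macintosh HD/Users/MY_NAME/Desktop/en" en /Users/MY_NAME/Desktop/en.tsv
#   ./bin/mallet import-file --input /Users/MY_NAME/Desktop/en.tsv --output /Users/MY_NAME/Desktop/en.mallet --remove-stopwords TRUE --keep-sequence TRUE
```

This trades the per-directory labels of import-dir for a single label passed on the command line; for several labelled folders, the function could be called once per folder and the resulting files concatenated.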