Mallet works in Linux but not Windows


OK, I'm trying to use Mallet to classify some documents in Windows.

I've achieved it in Linux; I just can't get it to do the job in Windows (the target environment).

I've imported the data into a .mallet file.

I then created a classifier from this input data. The classifier file is the same size on Linux and on Windows:

-rw-r--r-- 1 henry henry 15197116 Feb 23 15:56 nntp.classifier

07/03/2014  21:28        15,197,116 nntp.classifier

When I run this in Linux:

bin/mallet classify-dir --input ./testfolder --output - --classifier nntp.classifier

it iterates over the files in testfolder and prints the class it assigns to each one.

But if I run the same command in Windows:

bin\mallet classify-dir --input ./testfolder --output - --classifier nntp.classifier

It just dumps out the command list:

Mallet 2.0 commands:
  import-dir        load the contents of a directory into mallet instances (one per file)
  import-file       load a single file into mallet instances (one per line)
  import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
  train-classifier  train a classifier from Mallet data files
  train-topics      train a topic model from Mallet data files
  infer-topics      use a trained topic model to infer topics for new documents
  estimate-topics   estimate the probability of new documents given a trained model
  hlda              train a topic model using Hierarchical LDA
  prune             remove features based on frequency or information gain
  split             divide data into testing, training, and validation portions
Include --help with any option for more information

Something else I noticed: if I run bin/mallet classify-dir --help in Linux I get the help text, i.e. a description of each option, but running bin\mallet classify-dir --help in Windows does not produce the same result; it just prints the command list above (it does the same thing if you enter junk as the command).

By contrast, an earlier command such as import-dir behaves the same on both systems: bin/mallet import-dir --help and bin\mallet import-dir --help both produce the full help output.


Solution

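  • There's a problem with the mallet.bat file in the bin directory: it has no entries for classify-dir and classify-file, so those names fall through to the generic command list, exactly like junk input. You should modify it as follows: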

    @echo off
    
    rem This batch file serves as a wrapper for several
    rem  MALLET command line tools.
    
    if not "%MALLET_HOME%" == "" goto gotMalletHome
    
    echo MALLET requires an environment variable MALLET_HOME.
    goto :eof
    
    :gotMalletHome
    
    set MALLET_CLASSPATH=%MALLET_HOME%\class;%MALLET_HOME%\lib\mallet-deps.jar
    set MALLET_MEMORY=1G
    set MALLET_ENCODING=UTF-8
    
    set CMD=%1
    shift
    
    set CLASS=
    if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
    if "%CMD%"=="import-file" set CLASS=cc.mallet.classify.tui.Csv2Vectors
    if "%CMD%"=="import-svmlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
    if "%CMD%"=="train-classifier" set CLASS=cc.mallet.classify.tui.Vectors2Classify
    rem classify-dir and classify-file are the entries missing from the original wrapper
    if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
    if "%CMD%"=="classify-file" set CLASS=cc.mallet.classify.tui.Csv2Classify
    if "%CMD%"=="train-topics" set CLASS=cc.mallet.topics.tui.Vectors2Topics
    if "%CMD%"=="infer-topics" set CLASS=cc.mallet.topics.tui.InferTopics
    if "%CMD%"=="estimate-topics" set CLASS=cc.mallet.topics.tui.EvaluateTopics
    if "%CMD%"=="hlda" set CLASS=cc.mallet.topics.tui.HierarchicalLDATUI
    if "%CMD%"=="prune" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
    if "%CMD%"=="split" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
    if "%CMD%"=="bulk-load" set CLASS=cc.mallet.util.BulkLoader
    if "%CMD%"=="run" set CLASS=%1 & shift
    
    if not "%CLASS%" == "" goto gotClass
    
    echo Mallet 2.0 commands:
    echo   import-dir        load the contents of a directory into mallet instances (one per file)
    echo   import-file       load a single file into mallet instances (one per line)
    echo   import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
    echo   train-classifier  train a classifier from Mallet data files
    echo   classify-dir      classify the contents of a directory with a saved classifier
    echo   classify-file     classify a file with a saved classifier
    echo   train-topics      train a topic model from Mallet data files
    echo   infer-topics      use a trained topic model to infer topics for new documents
    echo   estimate-topics   estimate the probability of new documents given a trained model
    echo   hlda              train a topic model using Hierarchical LDA
    echo   prune             remove features based on frequency or information gain
    echo   split             divide data into testing, training, and validation portions
    echo Include --help with any option for more information
    
    
    goto :eof
    
    :gotClass
    
    set MALLET_ARGS=
    
    :getArg
    
    if "%1"=="" goto run
    set MALLET_ARGS=%MALLET_ARGS% %1
    shift
    goto getArg
    
    :run
    
    rem Quote the classpath so a MALLET_HOME containing spaces still works
    java -Xmx%MALLET_MEMORY% -ea -Dfile.encoding=%MALLET_ENCODING% -classpath "%MALLET_CLASSPATH%" %CLASS% %MALLET_ARGS%
    
    :eof
    

    With this change, classify-dir and classify-file also work in Windows.

    I hope this helps.

    Ignazio
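To see why a missing entry produces exactly the command list, here is a minimal sketch of the same name-to-class dispatch that mallet.bat performs, written as POSIX shell functions. It is hypothetical: the names resolve_class, print_usage and dispatch are made up for illustration, this is not MALLET's own bin/mallet script, only a few table entries are kept (class names copied from mallet.bat), and it does a dry run, printing the java command line instead of executing it.

```shell
#!/bin/sh
# Hypothetical sketch, NOT MALLET's real bin/mallet script.
# Mirrors the mallet.bat idea: map a command name to a Java main class.

resolve_class() {
  case "$1" in
    import-dir)       echo cc.mallet.classify.tui.Text2Vectors ;;
    train-classifier) echo cc.mallet.classify.tui.Vectors2Classify ;;
    classify-dir)     echo cc.mallet.classify.tui.Text2Classify ;;
    classify-file)    echo cc.mallet.classify.tui.Csv2Classify ;;
    *)                return 1 ;;  # unknown name: no class resolved
  esac
}

print_usage() {
  echo "Mallet 2.0 commands:"
  echo "  import-dir        load the contents of a directory into mallet instances (one per file)"
  echo "  classify-dir      classify the contents of a directory with a saved classifier"
  echo "Include --help with any option for more information"
}

dispatch() {
  cmd=${1-}
  [ "$#" -gt 0 ] && shift
  if class=$(resolve_class "$cmd"); then
    # Dry run: print the command. Remaining arguments such as --help are
    # passed through; in the real wrapper the Java class prints its own help.
    echo "java $class $*"
  else
    # Any name missing from the table (a typo, or a command the wrapper
    # simply does not list) falls through to the same generic usage text.
    print_usage
  fi
}
```

For example, dispatch classify-dir --input ./testfolder prints java cc.mallet.classify.tui.Text2Classify --input ./testfolder, whereas dispatch classify-dirx --help prints the generic list. Deleting the classify-dir line from the case table would make dispatch classify-dir --help behave the same way, which is the symptom described in the question.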