OK, I'm trying to use Mallet to classify some documents on Windows.
I've achieved it on Linux; I just can't get it to do the job on Windows (the target environment).
I've imported the data into a .mallet file.
I then created a classifier from this input data. The classifier file is the same size on both systems:
-rw-r--r-- 1 henry henry 15197116 Feb 23 15:56 nntp.classifier
and
07/03/2014 21:28 15,197,116 nntp.classifier
However when I run in Linux:
bin/mallet classify-dir --input ./testfolder --output - --classifier nntp.classifier
it iterates over the files in testfolder and prints the class it assigns to each one.
But if I run the same command on Windows:
bin\mallet classify-dir --input ./testfolder --output - --classifier nntp.classifier
It just dumps out the command list:
Mallet 2.0 commands:
import-dir load the contents of a directory into mallet instances (one per file)
import-file load a single file into mallet instances (one per line)
import-svmlight load a single SVMLight format data file into mallet instances (one per line)
train-classifier train a classifier from Mallet data files
train-topics train a topic model from Mallet data files
infer-topics use a trained topic model to infer topics for new documents
estimate-topics estimate the probability of new documents given a trained model
hlda train a topic model using Hierarchical LDA
prune remove features based on frequency or information gain
split divide data into testing, training, and validation portions
Include --help with any option for more information
Something that I did notice: if I run
bin/mallet classify-dir --help
on Linux I get the help text, i.e. descriptions of each option, but the same thing on Windows,
bin\mallet classify-dir --help
does not produce the same result, just the command list above (it does the same thing if you enter junk as the command).
Whereas one of the earlier commands, e.g.
bin/mallet import-dir --help
and
bin\mallet import-dir --help
produces the same full help output on both systems.
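There's a problem with the mallet.bat file in the bin directory: it seems to have no entries for classify-dir and classify-file, so those commands fall through to the command list just like an unknown command. You should change it to: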
@echo off
rem This batch file serves as a wrapper for several
rem MALLET command line tools.
if not "%MALLET_HOME%" == "" goto gotMalletHome
echo MALLET requires an environment variable MALLET_HOME.
goto :eof
:gotMalletHome
set MALLET_CLASSPATH=%MALLET_HOME%\class;%MALLET_HOME%\lib\mallet-deps.jar
set MALLET_MEMORY=1G
set MALLET_ENCODING=UTF-8
set CMD=%1
shift
set CLASS=
if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
if "%CMD%"=="import-file" set CLASS=cc.mallet.classify.tui.Csv2Vectors
if "%CMD%"=="import-svmlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
if "%CMD%"=="train-classifier" set CLASS=cc.mallet.classify.tui.Vectors2Classify
if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
if "%CMD%"=="classify-file" set CLASS=cc.mallet.classify.tui.Csv2Classify
if "%CMD%"=="train-topics" set CLASS=cc.mallet.topics.tui.Vectors2Topics
if "%CMD%"=="infer-topics" set CLASS=cc.mallet.topics.tui.InferTopics
if "%CMD%"=="estimate-topics" set CLASS=cc.mallet.topics.tui.EvaluateTopics
if "%CMD%"=="hlda" set CLASS=cc.mallet.topics.tui.HierarchicalLDATUI
if "%CMD%"=="prune" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
if "%CMD%"=="split" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
if "%CMD%"=="bulk-load" set CLASS=cc.mallet.util.BulkLoader
if "%CMD%"=="run" set CLASS=%1 & shift
if not "%CLASS%" == "" goto gotClass
echo Mallet 2.0 commands:
echo import-dir load the contents of a directory into mallet instances (one per file)
echo import-file load a single file into mallet instances (one per line)
echo import-svmlight load a single SVMLight format data file into mallet instances (one per line)
echo train-classifier train a classifier from Mallet data files
echo classify-dir classify the contents of a directory with a saved classifier
echo classify-file classify a file with a saved classifier
echo train-topics train a topic model from Mallet data files
echo infer-topics use a trained topic model to infer topics for new documents
echo estimate-topics estimate the probability of new documents given a trained model
echo hlda train a topic model using Hierarchical LDA
echo prune remove features based on frequency or information gain
echo split divide data into testing, training, and validation portions
echo Include --help with any option for more information
goto :eof
:gotClass
set MALLET_ARGS=
:getArg
if "%1"=="" goto run
set MALLET_ARGS=%MALLET_ARGS% %1
shift
goto getArg
:run
java -Xmx%MALLET_MEMORY% -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%
:eof
With this version of the file you should be able to classify on Windows.
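If you would rather not edit the batch file, a rough workaround is to check what the installed wrapper knows about and then call the Java class directly. This is only a sketch: it assumes MALLET_HOME points at your install (C:\mallet below is a made-up path), and it reuses the memory, encoding, classpath and class name from the batch file above together with the arguments from the question.

```bat
rem Assumption: MALLET is unpacked at C:\mallet (hypothetical path, adjust to yours).
set MALLET_HOME=C:\mallet

rem Check whether the installed wrapper has a classify-dir line at all.
rem If nothing is printed, CLASS stays empty and the command list is shown.
findstr /C:"classify-dir" "%MALLET_HOME%\bin\mallet.bat"

rem Run the classifier class directly, bypassing the wrapper.
java -Xmx1G -ea -Dfile.encoding=UTF-8 -classpath "%MALLET_HOME%\class;%MALLET_HOME%\lib\mallet-deps.jar" cc.mallet.classify.tui.Text2Classify --input ./testfolder --output - --classifier nntp.classifier
```

The classpath is quoted here so that an install path containing spaces does not split the argument.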
I hope this can help.
Ignazio