
Can't figure out error while running CVB0Driver in Mahout


I've spent the last few hours trying to get CVB0Driver working, and after a lot of trial and error I've hit the following error, which I can't figure out (using mahout-integration 0.7):

java.lang.Error: Unresolved compilation problem: 
at org.apache.mahout.math.function.Functions.mult(Functions.java:770)
at org.apache.mahout.clustering.lda.cvb.TopicModel.<init>(TopicModel.java:139)
at org.apache.mahout.clustering.lda.cvb.TopicModel.<init>(TopicModel.java:113)
at org.apache.mahout.clustering.lda.cvb.TopicModel.<init>(TopicModel.java:108)
at org.apache.mahout.clustering.lda.cvb.TopicModel.<init>(TopicModel.java:92)
at org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper.setup(CachingCVB0Mapper.java:103)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
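A `java.lang.Error: Unresolved compilation problem` is what the Eclipse compiler writes into a class file when it compiles source that still has errors, so at runtime it usually means the class at the top of the trace (here `Functions`) came from a broken build. A quick way to check which jar or directory that class was actually loaded from is a small diagnostic like the one below. It is a minimal sketch run with the same classpath as the failing job; `ClassSourceLocator` and `locate` are hypothetical names, and only standard JDK calls are used.

```java
import java.security.CodeSource;

// Reports which jar or directory a class was actually loaded from.
public class ClassSourceLocator {

    // Returns the code-source location of the class, "<bootstrap>" for JDK core
    // classes (they have no code source), or "<not on classpath>" if it cannot be found.
    public static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClassSourceLocator.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "<bootstrap>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class at the top of the stack trace above.
        String name = args.length > 0 ? args[0] : "org.apache.mahout.math.function.Functions";
        System.out.println(name + " loaded from " + locate(name));
    }
}
```

If the printed location points at a locally built jar or an Eclipse output folder rather than the expected mahout-math artifact, that copy is the likely culprit.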

Here's the code I'm using. Since I haven't got it working yet, I'm not sure I'm on the right path, so feel free to point out any mistakes you see.

String[] args = {"-c", "UTF-8", "-i", input, "-o", output};

//create the seq file from the directory of text documents
ToolRunner.run(new SequenceFilesFromDirectory(),args);

//tokenize the documents
DocumentProcessor.tokenizeDocuments(new Path(inputDir), analyzer.getClass().asSubclass(Analyzer.class), tokenizedPath, conf);

//create tf vectors
DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,new Path(outputDir), DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER, conf, minSupport, maxNGramSize, minLLRValue, -1.0f, true, reduceTasks, chunkSize, sequentialAccessOutput, true);

//calculate the document frequencies
Pair<Long[], List<Path>> dfData = TFIDFConverter.calculateDF(new Path(outputDir, DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER), new Path(outputDir), conf, chunkSize);

//create tfidf vectors
TFIDFConverter.processTfIdf(new Path(outputDir, DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER), new Path(outputDir), conf, dfData, minDf, maxDFPercent, norm, true, sequentialAccessOutput, true, reduceTasks);

args = new String[]{"-i","tfidf-vectors/part-r-00000","-o","cvb"};

//create the matrix for cvb
RowIdJob.main(args);

CVB0Driver.run(conf, new Path("cvb/matrix"), mto, numTopics, numTerms, alpha, eta, maxIterations, iterationBlockSize, convergenceDelta, dictionaryPath, dto, msto, randomSeed, testFraction, numTrainThreads, numUpdateThreads, maxItersPerDoc, numReduceTasks, backfillPerplexity);
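The RowIdJob step exists because CVB0Driver expects `SequenceFile<IntWritable, VectorWritable>` input, while the TF-IDF vectors are keyed by `Text` document names. RowIdJob re-keys each vector with a consecutive integer row id, writing a `matrix` (which is why `cvb/matrix` is passed to CVB0Driver) plus a `docIndex` that maps each id back to the document name. The sketch below only illustrates that re-keying with in-memory maps; it is not Mahout code, and `RowIdSketch`, `assignRowIds` and the sample document names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory illustration (not Mahout code) of the re-keying RowIdJob performs.
public class RowIdSketch {

    // The two outputs: an int-keyed "matrix" and an id -> name "docIndex".
    public static final class Result<V> {
        public final Map<Integer, V> matrix = new LinkedHashMap<>();
        public final Map<Integer, String> docIndex = new LinkedHashMap<>();
    }

    // Assigns consecutive row ids in iteration order of the input map.
    public static <V> Result<V> assignRowIds(Map<String, V> docs) {
        Result<V> result = new Result<>();
        int rowId = 0;
        for (Map.Entry<String, V> doc : docs.entrySet()) {
            result.matrix.put(rowId, doc.getValue());
            result.docIndex.put(rowId, doc.getKey());
            rowId++;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical document names and vector values.
        Map<String, double[]> tfidf = new LinkedHashMap<>();
        tfidf.put("/docs/a.txt", new double[] {0.0, 1.2});
        tfidf.put("/docs/b.txt", new double[] {0.7, 0.0});
        Result<double[]> r = assignRowIds(tfidf);
        System.out.println(r.docIndex); // prints {0=/docs/a.txt, 1=/docs/b.txt}
    }
}
```

Keeping `docIndex` alongside the matrix is what lets the per-document topic output from CVB be mapped back to the original files.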

Any help would be much appreciated.


Solution

  • Okay, it seems this was a conflict between Maven and Eclipse projects.

    I had recently imported the mahout-integration 0.7 source into Eclipse and somehow built it badly. There were problems with mahout-math, and my other project may have started referencing the badly built jar. I'm not very familiar with Maven, so I don't know whether that was the cause or Eclipse just went a bit crazy.

    After I deleted that project from Eclipse, everything ran fine.

    This question helped me resolve this one: java-unresolved-compilation-problem
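To confirm this kind of conflict before deleting projects, you can ask the class loader for every copy of the failing class visible on the classpath; more than one hit means a stale or badly built jar may be shadowing the real one. A minimal sketch, assuming it runs with the same classpath as the failing job; `DuplicateClassFinder` and `findCopies` are hypothetical names, and only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Lists every classpath entry that provides a given class file.
public class DuplicateClassFinder {

    public static List<URL> findCopies(String className) {
        // e.g. org.apache.mahout.math.function.Functions -> org/apache/mahout/math/function/Functions.class
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = DuplicateClassFinder.class.getClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.mahout.math.function.Functions";
        List<URL> copies = findCopies(name);
        System.out.println(copies.size() + " copies of " + name + ":");
        for (URL url : copies) {
            System.out.println("  " + url);
        }
        if (copies.size() > 1) {
            // With parent-first delegation the first entry is usually the one that gets loaded.
            System.out.println("More than one copy found; check which one comes first.");
        }
    }
}
```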