
Errors for running Mahout example


I downloaded the latest version of the examples for chapter 9 of “Mahout in Action”. Most of them run successfully, but three do not: NewsKMeansClustering.java, ReutersToSparseVectors.java, and NewsFuzzyKMeansClustering.java. Running any of these three programs gives similar error messages:

Aug 3, 2011 2:03:54 PM org.apache.hadoop.metrics.jvm.JvmMetrics init INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=

Aug 3, 2011 2:03:54 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

Aug 3, 2011 2:03:54 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/user1/workspaceMahout1/recommender/inputDir

at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)

at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)

at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)

at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)

at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)

at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)

at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)

at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)

at mia.clustering.ch09.NewsKMeansClustering.main(NewsKMeansClustering.java:54)

I do not quite understand what the two warnings above mean. Moreover, it looks like the “input path” should already exist before the program runs. How can I create this kind of input? Thanks.


Solution

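  • You can ignore the warnings. The actual error is that the input directory you specified does not exist: the job is looking for file:/home/user1/workspaceMahout1/recommender/inputDir on the local filesystem. Does that directory exist? What is your command line?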
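
To see where a relative input path ends up before submitting the job, a small standalone check can help. This is only a sketch under assumptions: the class name InputPathCheck and its helper methods are hypothetical and not part of Mahout or Hadoop, and it assumes the job runs in local mode, where a relative path such as inputDir appears to be resolved against the working directory (as the file:/home/user1/workspaceMahout1/recommender/inputDir in the exception suggests). It uses only the Java standard library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Mahout or Hadoop.
final class InputPathCheck {

    // Resolve a possibly relative path against the current working directory.
    static Path resolve(String inputDir) {
        return Paths.get(inputDir).toAbsolutePath().normalize();
    }

    // True only if the path exists and is a directory.
    static boolean isUsableInput(Path dir) {
        return Files.isDirectory(dir);
    }

    public static void main(String[] args) {
        // "inputDir" is the relative name that appears in the exception message.
        String inputDir = args.length > 0 ? args[0] : "inputDir";
        Path resolved = resolve(inputDir);
        System.out.println("Working directory: " + Paths.get("").toAbsolutePath());
        System.out.println("Input resolves to: " + resolved);
        if (!isUsableInput(resolved)) {
            System.err.println("Input directory is missing; create and populate it before running the job.");
        }
    }
}
```

If the resolved path is not where your data lives, either pass the correct path to the example or run it from the directory that contains inputDir.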