I downloaded the latest version of the examples for chapter 9 of "Mahout in Action". Several examples run successfully, but three of them (NewsKMeansClustering.java, ReutersToSparseVectors.java, and NewsFuzzyKMeansClustering.java) fail with similar error messages. Here is the output from NewsKMeansClustering:
Aug 3, 2011 2:03:54 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
Aug 3, 2011 2:03:54 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Aug 3, 2011 2:03:54 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/user1/workspaceMahout1/recommender/inputDir
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
	at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)
	at mia.clustering.ch09.NewsKMeansClustering.main(NewsKMeansClustering.java:54)
I don't quite understand what the two warnings mean. Also, it looks like the "input path" should already have been created; how can I create this kind of input? Thanks.
You can ignore the warnings. The error means that the input directory you specified, file:/home/user1/workspaceMahout1/recommender/inputDir, does not exist. Does it exist on your machine? What command line are you using?
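A quick way to confirm this before launching the job is to test the path yourself. The sketch below uses only the JDK (java.nio.file); the class InputDirCheck and its methods are hypothetical helpers, not part of Mahout or Hadoop, and they only look at the local file system (plain paths or file: URIs like the one in the exception), not HDFS.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not a Mahout/Hadoop API): checks that a local job
 * input directory exists and is non-empty before a job is submitted.
 * Only the local file system is inspected, never HDFS.
 */
final class InputDirCheck {

    /** Converts a "file:/..." URI (as printed by Hadoop) or a plain path to a Path. */
    static Path toLocalPath(String location) {
        if (location.startsWith("file:")) {
            return Paths.get(URI.create(location));
        }
        return Paths.get(location);
    }

    /** Returns true if the location is an existing directory with at least one entry. */
    static boolean isUsableInputDir(String location) throws IOException {
        Path dir = toLocalPath(location);
        if (!Files.isDirectory(dir)) {
            return false; // missing, or a regular file rather than a directory
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            // Treat an empty directory as unusable too: there would be nothing to process.
            return entries.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        // Default is the path from the exception above; pass your own as the first argument.
        String location = args.length > 0
                ? args[0]
                : "file:/home/user1/workspaceMahout1/recommender/inputDir";
        if (isUsableInputDir(location)) {
            System.out.println("OK: " + location);
        } else {
            System.out.println("Missing or empty input directory: " + location);
        }
    }
}
```

Run it with the same path you pass to the example, e.g. `java InputDirCheck /home/user1/workspaceMahout1/recommender/inputDir`. Note that the stack trace shows the job reading its input through SequenceFileInputFormat, so even once the directory exists it needs to contain Hadoop SequenceFiles rather than plain text files.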