Vectorizing a Solr index with Mahout using lucene.vector

I'm trying to run a clustering job on Amazon EMR using Mahout. I have a Solr index that I uploaded to S3, and I want to vectorize it using Mahout's lucene.vector. (This is the first step in the job flow.)

The parameters for the step are:

Jar: s3n://mahout-bucket/jars/mahout-core-0.6-job.jar
MainClass: org.apache.mahout.driver.MahoutDriver
Args: lucene.vector --dir s3n://mahout-input/solr_index/ --field name --dictOut /test/solr-dict-out/dict.txt --output /test/solr-vectors-out/vectors

The error in the log is:

Unknown program 'lucene.vector' chosen.

I've run the same process locally with Hadoop and Mahout and it worked fine. How should I invoke lucene.vector on EMR?

Solution

I eventually figured out the answer. The problem was that I was using the wrong MainClass value. Instead of

org.apache.mahout.driver.MahoutDriver

I should have used:

org.apache.mahout.utils.vectors.lucene.Driver

Therefore the correct parameters are as follows (note that lucene.vector is also dropped from Args, since the driver class is now named directly):

Jar: s3n://mahout-bucket/jars/mahout-core-0.6-job.jar
MainClass: org.apache.mahout.utils.vectors.lucene.Driver
Args: --dir s3n://mahout-input/solr_index/ --field name --dictOut /test/solr-dict-out/dict.txt --output /test/solr-vectors-out/vectors
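
For anyone submitting the step from a script rather than the console, here is a minimal sketch of how the corrected parameters could be assembled in Python. The helper name build_emr_step, the step name, and the ActionOnFailure value are my own assumptions for illustration. The nested Jar / MainClass / Args layout is meant to mirror the step parameters above; whether your EMR client accepts exactly this dictionary should be checked against its documentation.

```python
import shlex


def build_emr_step(name, jar, main_class, arg_string):
    """Assemble a step definition with the Jar / MainClass / Args fields.

    Hypothetical helper: it only builds a dictionary and does not
    contact AWS or submit anything.
    """
    return {
        "Name": name,
        # Assumed failure policy; pick whatever suits your job flow.
        "ActionOnFailure": "TERMINATE_JOB_FLOW",
        "HadoopJarStep": {
            "Jar": jar,
            "MainClass": main_class,
            # Split the argument string into a list, one token per entry.
            "Args": shlex.split(arg_string),
        },
    }


step = build_emr_step(
    name="vectorize-solr-index",  # hypothetical step name
    jar="s3n://mahout-bucket/jars/mahout-core-0.6-job.jar",
    # The Lucene driver is named directly, so "lucene.vector"
    # does not appear in the arguments.
    main_class="org.apache.mahout.utils.vectors.lucene.Driver",
    arg_string=(
        "--dir s3n://mahout-input/solr_index/ --field name "
        "--dictOut /test/solr-dict-out/dict.txt "
        "--output /test/solr-vectors-out/vectors"
    ),
)

print(step["HadoopJarStep"]["MainClass"])
print(step["HadoopJarStep"]["Args"])
```

Keeping the arguments as a list makes it easy to check that the program name is not passed by accident when the main class is the Lucene driver itself.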