I am successfully running the Parallel FPGrowth algorithm of Apache Mahout on top of Hadoop, but the generated output files are not readable, as you can see below:
SEQorg.apache.hadoop.io.TextDorg.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns������3G9��y'����e�����1���2�����������1���������t�5�1���������t�4�1�����������1�4227�����������3�1�����������1�3476���������t�1�1340���������h�1�5795���������N�1�2701���������K�1�3610���������@�1�2106�������� ...
Running RecommenderJob and ItemSimilarityJob with the same input file generates correct output files.
Any ideas?
These output files are sequence files, not text files. They contain key/value pairs of type <Text, TopKStringPatterns>.
You can get Hadoop to read the sequence files and print the textual versions of these objects (their toString() output) using the fs shell command with the -text option, adding -libjars so that Hadoop can load the Mahout value class:
hadoop fs -libjars /path/to/mahout/lib.jar -text /path/to/hdfs/output/part*
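The readable fragments at the start of the dump are the SequenceFile header: the magic bytes SEQ, a version byte, then the key and value class names, each stored as a variable-length integer length followed by UTF-8 bytes. Below is a minimal JDK-only sketch that decodes just that header, assuming a file written with SequenceFile version 4 or later. The class and method names (SeqHeaderSniffer, readHeader, sampleHeader) are hypothetical and not part of Hadoop or Mahout. Under this encoding the value class name is 68 bytes long, and byte 68 is the ASCII character D, which would account for the D between the two class names in the dump.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Sketch: read only the header of a Hadoop SequenceFile (assumes version >= 4). */
public class SeqHeaderSniffer {

    /** Decodes Hadoop's variable-length long encoding (same scheme as WritableUtils.readVLong). */
    public static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first; // values in [-112, 127] are stored in a single byte
        }
        boolean negative = first < -120;
        int extraBytes = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF); // big-endian payload
        }
        return negative ? ~value : value;
    }

    /** Reads a Text-style string: vint byte length, then UTF-8 bytes. */
    public static String readText(DataInputStream in) throws IOException {
        long len = readVLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) {
            throw new IOException("bad string length " + len);
        }
        byte[] bytes = new byte[(int) len];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Returns {version, keyClassName, valueClassName}. */
    public static String[] readHeader(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile (missing SEQ magic)");
        }
        int version = in.readUnsignedByte();
        if (version < 4) {
            throw new IOException("SequenceFile version " + version + " is not handled by this sketch");
        }
        String keyClass = readText(in);
        String valueClass = readText(in);
        return new String[] {Integer.toString(version), keyClass, valueClass};
    }

    /** Builds the first bytes of a version-6 header; only class names under 128 bytes are supported. */
    public static byte[] sampleHeader(String keyClass, String valueClass) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S');
        out.write('E');
        out.write('Q');
        out.write(6);
        for (String name : new String[] {keyClass, valueClass}) {
            byte[] b = name.getBytes(StandardCharsets.UTF_8);
            if (b.length > 127) {
                throw new IllegalArgumentException("name too long for a one-byte vint");
            }
            out.write(b.length); // a one-byte vint is valid for lengths 0..127
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // With an argument: a part file copied out of HDFS. Without: an in-memory sample header.
        InputStream source = args.length > 0
            ? new FileInputStream(args[0])
            : new ByteArrayInputStream(sampleHeader(
                  "org.apache.hadoop.io.Text",
                  "org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns"));
        try (InputStream in = source) {
            String[] h = readHeader(in);
            System.out.println("version=" + h[0] + " key=" + h[1] + " value=" + h[2]);
        }
    }
}
```

This only confirms what kind of file you have; to dump the actual records, hadoop fs -text is still the practical route, because the values need Mahout's TopKStringPatterns class to deserialize.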
If you want these files to be text rather than sequence files, you'll need to amend the driver that runs the job and change the job to use TextOutputFormat rather than SequenceFileOutputFormat:
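// Old API (org.apache.hadoop.mapred.JobConf), with org.apache.hadoop.mapred.TextOutputFormat:
// job.setOutputFormat(SequenceFileOutputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
// New API (org.apache.hadoop.mapreduce.Job): job.setOutputFormatClass(TextOutputFormat.class), using org.apache.hadoop.mapreduce.lib.output.TextOutputFormat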
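With TextOutputFormat each record becomes one line: the key's string form, a separator (a tab by default, configurable), the value's string form, and a newline; a null or NullWritable key or value is left out together with the separator. The JDK-only sketch below mimics that layout to show roughly what the part files would contain. It is an illustration under those assumptions, not Hadoop's own record writer; the class and method names (TextLineLayout, formatRecord) are hypothetical, and Java null stands in for NullWritable. What the value half of each line looks like depends on TopKStringPatterns.toString(), which this sketch does not reproduce.

```java
/** Sketch of the per-record line layout produced by TextOutputFormat. */
public class TextLineLayout {

    /** Returns the text written for one record, or "" when both key and value are null. */
    public static String formatRecord(Object key, Object value, String separator) {
        boolean nullKey = key == null;     // stands in for a null or NullWritable key
        boolean nullValue = value == null; // stands in for a null or NullWritable value
        if (nullKey && nullValue) {
            return ""; // nothing at all is written for this record
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key); // uses key.toString()
        }
        if (!nullKey && !nullValue) {
            line.append(separator); // separator only between two present fields
        }
        if (!nullValue) {
            line.append(value); // uses value.toString()
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        // Hypothetical placeholder records, not real FPGrowth output.
        String[][] records = {
            {"itemA", "patterns-for-itemA"},
            {"itemB", "patterns-for-itemB"},
        };
        for (String[] r : records) {
            System.out.print(formatRecord(r[0], r[1], "\t"));
        }
    }
}
```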