Tags: hadoop, hive, hdfs, mahout, k-means

Moving clustered data from HDFS to Hive


I have been experimenting with Mahout in the Cloudera demo VM and have successfully clustered the sample synthetic control data (https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html) using the k-Means algorithm. I have used ClusterDumper and can view the Mahout output, but now I want to put the output into a Hive table. How would I go about doing this?


Solution

  • There is no direct integration between Mahout and Hive. Your best bet is to modify ClusterDumper so that it writes a delimited text representation of the clusters (for example, one row per clustered point), which Hive can then load as tabular data.
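
To make the target format concrete, here is a minimal sketch of the kind of delimited text a modified ClusterDumper could emit, written in Python for brevity rather than in Mahout's Java. It is not Mahout's API: the function name `to_hive_rows`, the (cluster_id, point) input shape, the table name `clustered_points`, and the HDFS path are all assumptions made for illustration. The Hive statements assume a tab-delimited text table with the point stored as a comma-separated array.

```python
# Sketch only: turns (cluster_id, point) pairs into tab-separated lines
# that a delimited Hive text table could load. The input shape is a
# hypothetical stand-in for whatever a modified ClusterDumper would expose.

def to_hive_rows(assignments):
    """Yield one line per point: cluster id, a tab, then comma-joined values."""
    for cluster_id, point in assignments:
        values = ",".join(repr(float(v)) for v in point)
        yield f"{cluster_id}\t{values}"


# Hypothetical Hive statements matching the layout above. The table name
# and the HDFS path are placeholders, not values from the original post.
HIVE_DDL = r"""
CREATE TABLE clustered_points (cluster_id INT, point ARRAY<DOUBLE>)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA INPATH '/user/cloudera/clustered_points.tsv'
INTO TABLE clustered_points;
"""


if __name__ == "__main__":
    # Made-up sample points, only to show the line format.
    sample = [(0, [28.7, 34.4, 31.9]), (1, [24.8, 25.1, 26.3])]
    for line in to_hive_rows(sample):
        print(line)
```

Keeping the cluster id in its own column means the points can be grouped or counted per cluster with ordinary Hive queries, while the array column preserves the full vector without fixing the number of dimensions in the schema.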