I'm trying to work with Cascading to create and execute complex data processing workflows on a local Hadoop cluster.
I want to build a TF-IDF vector so I can apply machine learning algorithms such as Naive Bayes to it using the Apache Spark framework.
The problem is that after I create the jar and launch it using the following commands, the program freezes. Here is the log file.
You can find the sources here. The related source code is in part6.
Thanks!
I have found the problem. The nodes of the cluster were unhealthy, but the log doesn't show that, and Cascading freezes because its task stays UNASSIGNED.
To solve the problem you have to fix the nodes' health. In my case I just had to correct the hadoop-yarn containers directory and also its local namenode directory.
You might run into other errors, so I suggest that you check your Hadoop log files and the admin web UI for the Hadoop nodes.
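If you prefer to check node health from a script rather than the web UI, here is a minimal sketch in Python that asks the YARN ResourceManager REST API (`/ws/v1/cluster/nodes`) for its node list and keeps only the nodes that are not in the RUNNING state. The address `http://localhost:8088` is an assumption (the usual default ResourceManager web port on a local setup), and the function names `not_running_nodes` and `fetch_nodes` are just illustrative, so adjust them to your cluster.

```python
import json
from urllib.request import urlopen


def not_running_nodes(payload):
    """Return (id, state, healthReport) for every node whose state is not RUNNING.

    `payload` is the decoded JSON from /ws/v1/cluster/nodes. The "nodes" value
    can be null when no node is registered, so that case is handled too.
    """
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [
        (n.get("id"), n.get("state"), n.get("healthReport", ""))
        for n in nodes
        if n.get("state") != "RUNNING"
    ]


def fetch_nodes(rm_url="http://localhost:8088"):
    """Fetch the node list from the ResourceManager (rm_url is an assumed default)."""
    with urlopen(rm_url + "/ws/v1/cluster/nodes", timeout=10) as resp:
        return json.load(resp)
```

Calling `not_running_nodes(fetch_nodes())` on the machine that runs the ResourceManager should give one tuple per problematic node. The health report text often says which local or log directory is bad, which is what pointed me to the directories in my case.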