mysql apache-spark hadoop hive cloudera-quickstart-vm

Hive Metastore Failure on Cloudera QuickStart VM 5.12 with Cloudera Manager

Cloudera claims to offer a Quick Start approach, but it is not working for me.

When I invoke spark-shell I get:

... WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version

I find this confusing: this is supposed to be a Quick Start, and the warning looks odd.

So:

I see that MySQL is running with a metastore database, and I can access it fine.
Do I need to start the Hive metastore service if MySQL is used as the Hive metastore? I think so, but ...
Do I need HiveServer2 to run locally, or can I run without it?
Cloudera Manager's Hive tab tells me I am using MySQL, and I see an auto-generated hive-site.xml.

I am not sure how to proceed to fix this. One of the logs reports a failure to create a Derby database, e.g.:

Caused by: java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
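A Derby error is surprising when MySQL is supposed to be the backend. As far as I understand it, Spark creates a local Derby metastore_db in the working directory when it does not pick up a hive-site.xml, so one thing worth checking is what the generated file actually points at. Below is a minimal Python sketch of that check, not a Cloudera tool. The function names and the sample XML (including the thrift host) are made up for illustration; hive.metastore.uris and javax.jdo.option.ConnectionURL are the standard Hive property names.

```python
import xml.etree.ElementTree as ET


def read_hive_properties(xml_text):
    """Return the <property> entries of a hive-site.xml document as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def describe_metastore(props):
    """Summarise which metastore backend the given properties point at."""
    uris = props.get("hive.metastore.uris", "")
    jdbc = props.get("javax.jdo.option.ConnectionURL", "")
    if uris:
        return "remote metastore service at " + uris
    if jdbc.startswith("jdbc:mysql:"):
        return "direct MySQL connection: " + jdbc
    if jdbc.startswith("jdbc:derby:"):
        return "embedded Derby: " + jdbc
    return "nothing configured; expect a local Derby metastore_db"


# Hypothetical sample; in practice, read the real hive-site.xml from disk.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://quickstart.cloudera:9083</value>
  </property>
</configuration>"""

print(describe_metastore(read_hive_properties(SAMPLE)))
```

If the file seen by spark-shell reports "nothing configured" while the one shown in Cloudera Manager points at MySQL, the two are probably not the same file.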

In short I am seeking guidance on how to fix this.

Before one of the numerous crashes I have had, I had an sbt assembly of a Spark/Scala application working fine against a remote MySQL database, so I am wondering whether that is the way to go and whether spark-shell and the local Cloudera VM are simply too unstable.

Seeking guidance amidst frustration. Databricks works like a dream.

Thanks in advance.

Solution

Install 5.13. There are other problems, but these ones disappear. I did, however, note what the cause is.

After a clean install, running

sudo jps

shows that all Hadoop services are up and working. I checked this.
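For repeated checks, the jps listing can be compared against a list of expected daemons. This is only a rough Python sketch: the helper names, the sample listing, and the EXPECTED list are my own illustrative assumptions, not output captured from the VM. It relies only on jps printing one "pid name" pair per line.

```python
def running_daemons(jps_output):
    """Collect the process names from 'pid name' lines of jps output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected):
    """Return the expected daemon names that do not appear in the jps output."""
    return sorted(set(expected) - running_daemons(jps_output))


# Made-up sample listing; paste the real output of `sudo jps` here instead.
SAMPLE_JPS = """4321 NameNode
4410 DataNode
5120 ResourceManager
6001 Jps"""

# Assumed list of daemons to look for; adjust to the services you care about.
EXPECTED = ["NameNode", "DataNode", "ResourceManager", "NodeManager"]

print(missing_daemons(SAMPLE_JPS, EXPECTED))  # ['NodeManager']
```

Running the same check again after invoking CM Express would show which services have dropped out.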

What I then noticed is that the Cloudera Manager Console (CMS) never shows up. The advice on the Internet is to run the command that invokes CM Express.

Once you do that, the CMS shows up, but many Hadoop services need to be (re)started. The catch is that spark-shell then goes haywire and the metastore is no longer accessible. All in all, a sorry mess for which the solution is not obvious.

A manual install of Hadoop may well be the best option, but a definitive integrated spec is needed. There are then also issues with Spark 2.x not being supported, Kudu not being there, and parcels vs. packages.