I use Spark 1.6.
We have an HDFS write method that wrote to HDFS using SQLContext. Now we need to switch over to using HiveContext. Since we did that, the existing unit tests no longer run and fail with the error:
Error XSDB6: Another instance of Derby may have already booted the database <local path>\metastore_db
This happens whether I run a single test via the IntelliJ test runner or via Maven on the command line.
As I understand it, the issue happens when multiple HiveContexts or multiple processes try to access metastore_db. However, I am running a single test and no other jobs on my local machine, so I don't understand where the multiple processes are coming from.
I figured out why I was getting the error. In the unit test, we were writing data as ORC to the local file system and then reading it back to verify that the write was done properly.
The write and read methods were each creating their own HiveContext in the same process, which resulted in the lock on the metastore. I am guessing that with SQLContext this wasn't a problem because no local metastore was needed.
We have now moved to creating the HiveContext once, when we construct our persistence service, which also makes more sense semantically. We chose this over creating and destroying a new SparkContext (and thereby a new HiveContext) for every test, since that would add considerable overhead to our test suite without providing much benefit (please do correct me if you have a different opinion).
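A minimal sketch of that arrangement, assuming Spark 1.6 with the `spark-hive` module on the classpath. `PersistenceService`, `writeOrc`, `readOrc`, `PersistenceServiceRoundTrip` and `roundTripCount` are hypothetical names for illustration, not our actual code. The single HiveContext is passed into the service's constructor, so the write and the read-back share one HiveContext instead of each creating its own (which is what triggered the Derby lock above).

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical persistence service: the HiveContext is injected once at
// construction time and reused by both methods, so writing and reading
// back do not each create a new HiveContext.
class PersistenceService(val hiveContext: HiveContext) {

  // Write a DataFrame as ORC (in Spark 1.x, ORC support comes from the
  // Hive module and needs a HiveContext).
  def writeOrc(df: DataFrame, path: String): Unit =
    df.write.mode(SaveMode.Overwrite).format("orc").save(path)

  // Read ORC data back using the same, shared HiveContext.
  def readOrc(path: String): DataFrame =
    hiveContext.read.format("orc").load(path)
}

object PersistenceServiceRoundTrip {

  // Creates one local SparkContext and one HiveContext, writes two rows as
  // ORC to a temporary directory, reads them back and returns the row count.
  def roundTripCount(): Long = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("orc-round-trip")
    val sc = new SparkContext(conf)
    try {
      val hiveContext = new HiveContext(sc)
      import hiveContext.implicits._

      val service = new PersistenceService(hiveContext)
      val input = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "value")
      val path = Files.createTempDirectory("orc-test").resolve("data").toString

      service.writeOrc(input, path)
      service.readOrc(path).count()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"rows read back: ${roundTripCount()}")
}
```

In a real test suite the SparkContext and HiveContext would more likely be created once per suite (for example in a `beforeAll` hook) and handed to every service under test, rather than inside a helper like `roundTripCount`; that wiring is left out of this sketch.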