
Metamodel hadoop - HdfsResource issue with ExcelDataContext


A little background:
My program originally supported only the normal file system: it read an Excel file with ExcelDataContext and performed some actions on the data. Everything worked fine.

Now:
My program now has to be modified to work together with Hadoop Oozie. Using MetaModel-hadoop version 4.3.5, I managed to read the Excel file from Hadoop using HdfsResource. ExcelDataContext accepts a Resource, so it does the same job as before.

The Problem:
However, my Oozie workflow job did not end/complete even after my program had finished running. After a few hours of debugging, I found that the cause was using HdfsResource to initialize ExcelDataContext.

Here is how I define the HdfsResource:

...
Resource hdfsResource = new HdfsResource(hdfsExcelFilePath);
ExcelDataContext dc = new ExcelDataContext(hdfsResource, excelConfiguration);
...

If I comment out the hdfsResource line and use the local file system instead, the program completes without a problem.

I suspect the resource is not being closed properly, but I cannot find a way to close it, not even by setting it to null. Is there any way to solve this? There is no close function.

hdfsResource = null;


Solution

  • After two days of investigation and research, I found that I had been going in the wrong direction. The issue was not that the Resource failed to close properly, but the opposite: HdfsResource closes HDFS properly, and that causes the Oozie workflow to fail when it tries to access HDFS, because the connection is already closed.

    The solution: I made a copy of the HdfsResource class from MetaModel-Hadoop and commented out every FileHelper.safeClose(fs) call in it. I am not sure this is the right way to do it, but my intention is to keep the HDFS connection available until Oozie closes it itself when its job finishes.
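
    To illustrate why closing the handle can hurt the host process, here is a minimal plain-Java sketch. It does not use Hadoop or MetaModel: Connection, ConnectionCache, and the "hdfs://example" URI are hypothetical stand-ins I made up for the example. It assumes that the program and the Oozie launcher end up sharing one cached file system handle, which is what the symptoms above suggest (as far as I understand, Hadoop's FileSystem.get returns cached, shared instances by default).

```java
import java.util.HashMap;
import java.util.Map;

public class SharedHandleDemo {

    /** Hypothetical stand-in for a file system connection (not a Hadoop class). */
    static final class Connection {
        private boolean open = true;

        void close() {
            open = false;
        }

        String read(String path) {
            if (!open) {
                throw new IllegalStateException("connection closed");
            }
            return "contents of " + path;
        }
    }

    /** Hypothetical cache that hands the same Connection to every caller of a URI. */
    static final class ConnectionCache {
        private final Map<String, Connection> cache = new HashMap<>();

        Connection get(String uri) {
            return cache.computeIfAbsent(uri, key -> new Connection());
        }
    }

    /** Mimics a resource that closes the shared connection after reading. */
    static String readAndClose(ConnectionCache cache, String uri, String path) {
        Connection connection = cache.get(uri);
        try {
            return connection.read(path);
        } finally {
            connection.close();
        }
    }

    /** Mimics the patched resource: it reads and leaves the connection open. */
    static String readAndLeaveOpen(ConnectionCache cache, String uri, String path) {
        return cache.get(uri).read(path);
    }

    /** Returns true if the host can still use its handle after the resource has read. */
    static boolean hostSurvives(boolean closeAfterRead) {
        ConnectionCache cache = new ConnectionCache();
        String uri = "hdfs://example"; // hypothetical URI
        Connection hostView = cache.get(uri); // the host (Oozie, in this analogy) holds the same handle
        if (closeAfterRead) {
            readAndClose(cache, uri, "/data/input.xlsx");
        } else {
            readAndLeaveOpen(cache, uri, "/data/input.xlsx");
        }
        try {
            hostView.read("/workflow/status");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("close after read, host survives: " + hostSurvives(true));
        System.out.println("leave open, host survives: " + hostSurvives(false));
    }
}
```

    If the shared-handle assumption holds, a less invasive alternative to patching the class may be to give the program its own non-shared handle (Hadoop offers FileSystem.newInstance for that), but I have not verified whether HdfsResource can be made to use one.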