
How to expose Hadoop job and workflow metadata using Hive


I would like to make workflow and job metadata, such as start date, end date and status, available in a Hive table so that a BI tool can consume it for visualization. For example, I would like to monitor whether a certain workflow fails at certain hours, what its success rate is, and so on.

For this purpose I need access to the same data that Hue shows in the job browser and the Oozie dashboard. For workflows, for example, I am specifically looking for the name, submitter, status, and start and end time. I want this because, in my opinion, Hue lacks a general overview and good search. The idea is that once I locate this data I will load it into Hive, either directly or through some processing steps.

Questions that I would like to see answered:

  1. Is this data stored in HDFS, or is it scattered across the local data nodes?
  2. If it is stored in HDFS, where can I find it? If it is stored on the local data nodes, how does Hue find and show it?
  3. Assuming I can access the data, what format should I expect? Is it stored in general log files, or can I expect somewhat structured data?

I am using CDH 5.8.


Solution

  • If your jobs are submitted by any means other than Oozie, my approach won't help.

    We collected all the logs from the Oozie server through the Oozie Java API and iterated over the coordinator information to extract the fields we needed.

    First decide what kind of information you need to retrieve.

    1. If all your jobs are submitted through a bundle, start from the bundle, go to its coordinators, and then to their workflows to find the info.
    2. If you want all the coordinator info, simply call the API with the number of coordinators to fetch and pick out the required fields.
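As an illustration of the second option, here is a minimal Python sketch. It is not the Java API code we used: it swaps in the Oozie REST endpoint `/v1/jobs`, which exposes the same job listings over HTTP. The host name, the function names and the choice of output keys are hypothetical, the JSON field names (`workflows`, `appName`, `user`, `startTime`, `endTime`) are taken from the Oozie Web Services API documentation and should be checked against your version, and the sketch assumes an unsecured (non-Kerberos) cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: hypothetical host; 11000 is the usual default Oozie port.
OOZIE_URL = "http://oozie-host.example.com:11000/oozie"


def jobs_url(base_url, jobtype="wf", offset=1, length=50, status=None):
    """Build a /v1/jobs listing URL; jobtype is 'wf', 'coord' or 'bundle'."""
    params = {"jobtype": jobtype, "offset": offset, "len": length}
    if status:
        # Oozie filters are name=value pairs, e.g. status=KILLED
        params["filter"] = "status=" + status
    return base_url.rstrip("/") + "/v1/jobs?" + urlencode(params)


def extract_workflows(payload):
    """Keep only name, submitter, status and start/end time per workflow."""
    return [
        {
            "id": wf.get("id"),
            "name": wf.get("appName"),
            "submitter": wf.get("user"),
            "status": wf.get("status"),
            "start_time": wf.get("startTime"),
            "end_time": wf.get("endTime"),
        }
        for wf in payload.get("workflows", [])
    ]


def fetch_workflows(base_url=OOZIE_URL, **kwargs):
    """Call the Oozie server and return the trimmed workflow records."""
    with urlopen(jobs_url(base_url, **kwargs)) as resp:
        return extract_workflows(json.load(resp))
```

For example, `fetch_workflows(status="KILLED", length=100)` would request up to 100 killed workflows. Paging through all jobs means repeating the call with an increasing `offset`.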

    We then loaded the fetched results into a Hive table, where one can filter for failed or timed-out coordinators and by various other parameters.
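One way to prepare such records for Hive is to write them as tab-delimited text, which a table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'` can read. The sketch below is an assumption-laden example, not our actual loader: the column list and function names are hypothetical, and it assumes Oozie reports times in the form `Thu, 01 Jan 2009 00:00:00 GMT`, as shown in the Oozie REST documentation.

```python
from email.utils import parsedate_to_datetime

# Hypothetical column order; it must match the Hive table definition.
COLUMNS = ["id", "name", "submitter", "status", "start_time", "end_time"]
HIVE_NULL = r"\N"  # default NULL marker in Hive delimited text files


def to_hive_timestamp(value):
    """Convert an Oozie GMT time string to Hive's 'yyyy-MM-dd HH:mm:ss'."""
    if not value:
        return HIVE_NULL  # e.g. end time of a job that is still running
    return parsedate_to_datetime(value).strftime("%Y-%m-%d %H:%M:%S")


def to_tsv_line(record):
    """Render one job record as a single tab-separated line."""
    fields = []
    for col in COLUMNS:
        value = record.get(col)
        if col.endswith("_time"):
            fields.append(to_hive_timestamp(value))
        elif value is None:
            fields.append(HIVE_NULL)
        else:
            # Tabs and newlines would break the row layout, so replace them.
            fields.append(str(value).replace("\t", " ").replace("\n", " "))
    return "\t".join(fields)


def write_tsv(records, path):
    """Write all records to a local file that can be copied into HDFS."""
    with open(path, "w", encoding="utf-8") as out:
        for record in records:
            out.write(to_tsv_line(record) + "\n")
```

The resulting file can be placed in the directory of an external Hive table, after which filters such as `status = 'KILLED'` or grouping by the hour of `start_time` become ordinary queries.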

    You can start with the example on the Oozie site: https://oozie.apache.org/docs/3.2.0-incubating/DG_Examples.html#Java_API_Example