Logging & Monitoring for Hive Batch Jobs


This is my first question in this forum. I am writing Hive batch job logs to a Hive log table as soon as each step completes, using INSERT INTO TABLE. This creates multiple records in Hive for each batch job ID, so I am creating a view to combine the collected log data before using it in a monitoring tool. Can you suggest a better way to achieve this?

Notes:

  1. My batch job has multiple steps, and I would like to collect logs from each step
  2. I don't want to use UPDATE
  3. I am unable to upload an image. The flow is: Batch Job -> Logs -> Hive -> Monitoring

Solution

  • Here is one reference architecture I can suggest. You can still use Hive for logging, but back the log table with HBase through Hive's HBase storage handler, using SERDEPROPERTIES to map the Hive columns to HBase.

    Benefits:

    • Data is stored in HBase, which lets you choose a row key that determines which data gets overwritten (for example, the batch job ID)
    • HBase keeps earlier versions of each cell, up to the limit configured for the column family
    • You can still query the data through Hive the way you normally query Hive tables
    • A real-time dashboard can read directly from the HBase data

    High-Level Diagram: (image not available)
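As a minimal sketch of the HBase-backed log table suggested above: the Hive table name batch_job_status, the HBase table name, the column family log, and the columns are all assumptions made for illustration. How many earlier versions HBase retains depends on the VERSIONS setting of the column family, so check that setting if you rely on history.

```sql
-- Hypothetical names throughout; adjust to your own schema.
-- Some newer Hive releases may require CREATE EXTERNAL TABLE for HBase-backed tables.
CREATE TABLE batch_job_status (
  job_id      STRING,   -- mapped to the HBase row key
  last_step   STRING,
  status      STRING,
  updated_at  STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ':key,log:last_step,log:status,log:updated_at'
)
TBLPROPERTIES ('hbase.table.name' = 'batch_job_status');

-- Every step writes with the same job_id. A later write replaces the
-- current cell values for that row key, so no UPDATE statement is needed.
INSERT INTO TABLE batch_job_status
VALUES ('job-001', 'extract', 'SUCCESS', '2020-01-01 10:00:00');

INSERT INTO TABLE batch_job_status
VALUES ('job-001', 'load', 'SUCCESS', '2020-01-01 10:05:00');

-- One row per job_id, holding the most recently written values.
SELECT job_id, last_step, status, updated_at
FROM batch_job_status;
```

With the batch job ID as the row key, the monitoring tool reads a single current row per job instead of going through a view that aggregates many rows. If you need the per-step history as well, one option is to use a composite key such as job ID plus step name, at the cost of getting multiple rows per job again.
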