Search code examples
apachehadoophivehbasekylin

What are Apache Kylin Use Cases?


I recently came across Apache Kylin, and was curious what it's use cases are. From what I can tell, it seems to be a tool designed to solve very specific problems related to upwards of 10+ billion rows, aggregating, caching and querying data from other sources (HBase, Hadoop, Hive). Am I correct in this assumption?


Solution

  • Apache Kylin's use case is interactive big data analysis on Hadoop. It lets you query big Hive tables at sub-second latency in 3 simple steps.

    1. Identify a set of Hive tables in star schema.
    2. Build a cube from the Hive tables in an offline batch process.
    3. Query the Hive tables using SQL and get results in sub-seconds, via Rest API, ODBC, or JDBC.

    The use case is pretty general that it can fast query any Hive tables as long as you can define star schema and model cubes from the tables. Check out Kylin terminologies if you are not sure what is star schema and what is cube.

    Kylin provides ANSI SQL interface, so you can query the Hive tables pretty much the same way you used to. One limitation however is Kylin provides only aggregated results, or in other word, SQL should contain a "group by" clause to yield correct result. This is usually fine because big data analysis focus more on the aggregated results rather than individual records.