My idea is to use Spark Streaming + Kafka to get the events from the Kafka bus. After retrieving a batch of Avro-encoded events, I would like to transform them with Spark Avro into Spark SQL DataFrames and then write the DataFrames to a Hive table.
Is this approach feasible? I am new to Spark, and I am not entirely sure whether I can use the Spark Avro package to decode the Kafka events, since the documentation only mentions Avro files. My understanding so far is that it should be possible.
The next question is: if this works, my understanding is that I end up with a regular Spark SQL DataFrame, which I could then write to a Hive table. Are my assumptions correct?
Thanks in advance for any hints and tips.
Yes, you would be able to do that. See http://aseigneurin.github.io/2016/03/04/kafka-spark-avro-producing-and-consuming-avro-messages.html for a walkthrough of producing and consuming Avro messages with Kafka and Spark.
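A minimal sketch of the consuming side, assuming Spark 1.x with the spark-streaming-kafka (0.8) integration and the plain Avro library for decoding the message payloads (the Spark Avro package is aimed at Avro files, so per-message decoding is usually done with Avro's `GenericDatumReader` directly). The schema, topic name, broker address, and the `UserEvent` case class are all assumptions for illustration:

    import kafka.serializer.{DefaultDecoder, StringDecoder}
    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
    import org.apache.avro.io.DecoderFactory
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object AvroKafkaToDataFrame {
      // Hypothetical event type; replace with fields matching your Avro schema
      case class UserEvent(userId: String, action: String, timestamp: Long)

      val schemaJson =
        """{"type":"record","name":"UserEvent","fields":[
          |{"name":"userId","type":"string"},
          |{"name":"action","type":"string"},
          |{"name":"timestamp","type":"long"}]}""".stripMargin

      def main(args: Array[String]): Unit = {
        val sc  = new SparkContext(new SparkConf().setAppName("avro-kafka-to-hive"))
        val ssc = new StreamingContext(sc, Seconds(10))

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // assumption
        val stream = KafkaUtils.createDirectStream[String, Array[Byte], StringDecoder, DefaultDecoder](
          ssc, kafkaParams, Set("events"))                              // topic name is an assumption

        stream.map(_._2).foreachRDD { rdd =>
          val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
          import sqlContext.implicits._

          // Decode each Avro-encoded payload into a GenericRecord, then into a case class
          val events = rdd.mapPartitions { iter =>
            val schema = new Schema.Parser().parse(schemaJson)
            val reader = new GenericDatumReader[GenericRecord](schema)
            iter.map { bytes =>
              val record = reader.read(null, DecoderFactory.get().binaryDecoder(bytes, null))
              UserEvent(record.get("userId").toString,
                        record.get("action").toString,
                        record.get("timestamp").asInstanceOf[Long])
            }
          }

          val df = events.toDF() // a regular Spark SQL DataFrame, ready to be written to Hive
          df.show()
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }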
It is possible to save datasets as Hive tables or to write the data in ORC format. You can also write the data in the required format to HDFS and create an external Hive table on top of it.
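A short sketch of both options, assuming a Spark 1.x `HiveContext` and that `df` is the DataFrame built from the decoded events; the table names and the HDFS path are assumptions:

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // Option 1: append directly into a managed Hive table
    df.write.mode(SaveMode.Append).saveAsTable("events")

    // Option 2: write ORC files to HDFS and define an external table on top
    df.write.mode(SaveMode.Append).orc("hdfs:///data/events_orc")
    hiveContext.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS events_ext (
        |  userId STRING, action STRING, `timestamp` BIGINT)
        |STORED AS ORC
        |LOCATION 'hdfs:///data/events_orc'""".stripMargin)

With the external-table approach, the streaming job only writes files; Hive (and other tools) can query the data without Spark-specific table metadata.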