
Insert streaming data into HAWQ


How can I insert streaming data into HAWQ and run queries on the live (online) data?

  1. I tested JDBC inserts, and the performance was very poor.

  2. Next, I tested writing the data to HDFS with Flume and creating an external table in HAWQ, but HAWQ can't read the data until Flume closes the file. The problem is that if I set Flume's file rolling interval very low (1 min), the number of files keeps growing, and after a few days there are so many small files that it becomes a burden for HDFS.

  3. The third option is HBase, but most of my queries are aggregations over large amounts of data, and HBase is better suited to fetching individual rows.
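To put the small-files concern from point 2 in numbers: with time-based rolling (Flume's HDFS sink controls this with `hdfs.rollInterval`, in seconds), every sink produces roughly one file per interval. The helper below is a hypothetical sketch that assumes each sink rolls exactly once per interval and always has data to write, so it gives an upper bound rather than a measured count.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def files_created(roll_interval_s: int, days: int, sinks: int = 1) -> int:
    """Upper bound on HDFS files produced by time-based rolling.

    Hypothetical helper: assumes each sink rolls once per interval
    and never has an empty interval.
    """
    if roll_interval_s <= 0:
        raise ValueError("roll interval must be positive")
    rolls_per_day = SECONDS_PER_DAY // roll_interval_s
    return rolls_per_day * days * sinks


# A 1-minute roll yields 1440 files per sink per day and 43200 after 30 days;
# an hourly roll over the same 30 days yields only 720.
print(files_created(60, 1), files_created(60, 30), files_created(3600, 30))
```

The trade-off is direct: shortening the roll interval makes data visible to HAWQ sooner but multiplies the file count by the same factor.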

Given these constraints, what is a good way to query streaming data online with HAWQ?


Solution

  • If your source data is not on HDFS, you can use gpfdist with a named pipe as a buffer and read it through a gpfdist external table, or use a web external table that runs other Linux scripts. Another option is the Spring XD gpfdist module: http://docs.spring.io/spring-xd/docs/1.3.1.RELEASE/reference/html/#gpfdist
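The named-pipe buffer idea can be sketched as a producer that writes incoming records as CSV lines into a FIFO. The assumptions here are illustrative, not taken from the answer: the pipe (hypothetically `events.pipe`) lives in a directory that gpfdist is serving, and a readable external table in HAWQ points its `LOCATION` at that file via a `gpfdist://<host>:<port>/events.pipe` URL with CSV format. Note that opening a FIFO for writing blocks until a reader (gpfdist, when the external table is queried) opens the other end.

```python
import csv
import io
import os
import stat


def ensure_fifo(path: str) -> None:
    """Create a named pipe at `path` unless one already exists there."""
    if os.path.exists(path):
        if not stat.S_ISFIFO(os.stat(path).st_mode):
            raise ValueError(f"{path} exists and is not a named pipe")
        return
    os.mkfifo(path)


def to_csv_line(record) -> str:
    """Serialize one record as a single CSV line (quoting fields as needed)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(record)
    return buf.getvalue()


def stream_to_fifo(path: str, records) -> int:
    """Write records into the FIFO and return how many were written.

    open() blocks until a reader opens the pipe; the reader sees
    end-of-file once this function closes the pipe.
    """
    ensure_fifo(path)
    written = 0
    with open(path, "w", newline="") as pipe:
        for record in records:
            pipe.write(to_csv_line(record))
            written += 1
    return written
```

Because the pipe holds no data at rest, each query only sees what the producer writes while the pipe is open; if you need the data to be queryable repeatedly, it would typically be loaded from the external table into a regular HAWQ table in small batches.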