Using flume to read IBM MQ data

I want to read data from IBM MQ and put it into HDFs.

Looked into JMS source of flume, seems it can connect to IBM MQ, but I’m not understanding what does “destinationType” and “destinationName” mean in the list of required properties. Can someone please explain?

Also, how I should be configuring my flume agents

flumeAgent1(runs on the machine same as MQ) reads MQ data ---- flumeAgent2(Runs on Hadoop cluster) writes into Hdfs OR only one agent is enough on Hadoop cluster

Can someone help me in understanding how MQs can be integrated with flume

Reference

https://flume.apache.org/FlumeUserGuide.html

Thanks, Chhaya

Solution

Regarding the Flume agent architecture, it is composed in its minimalist form by a source in charge of receiving or polling for events, and converting the events into Flume events that are put in a channel. Then, a sink takes those events in order to persist the data somewhere, or send the data to another agent. All these components (source, channel, sink, i.e. an agent) run in the same machine. Different agents may be distributed, instead.

Being said that, your scenario seems to require a single agent based on a JMS source, a channel, typically Memory Channel, and a HDFS sink.

The JMS source, as stated in the documentation, has only been tested for ActiveMQ, but shoukd work for any other queue systemm. The documentation also provides an example:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = jms
a1.sources.r1.channels = c1
a1.sources.r1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
a1.sources.r1.connectionFactory = GenericConnectionFactory
a1.sources.r1.providerURL = tcp://mqserver:61616
a1.sources.r1.destinationName = BUSINESS_DATA
a1.sources.r1.destinationType = QUEUE

a1 is the name of the single agent. c1 is the name for the channel and its configuration must be still completed; and a sink configuration is totally missing. It can be easily completed by adding:

a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = ...
a1.sinks.k1...
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1...

r1 is the JMS source, and as can be seen, destinationName simply ask for a string name. destinationType can only take two values: queue or topic. I think the important parameters are providerURL and initialContextFactory and connectionFactory, which must be adapted for IBM MQ.