Search code examples
hadoopstreamingibm-mqflume

Using flume to read IBM MQ data


I want to read data from IBM MQ and put it into HDFs.

Looked into JMS source of flume, seems it can connect to IBM MQ, but I’m not understanding what does “destinationType” and “destinationName” mean in the list of required properties. Can someone please explain?

Also, how I should be configuring my flume agents

flumeAgent1(runs on the machine same as MQ) reads MQ data ---- flumeAgent2(Runs on Hadoop cluster) writes into Hdfs OR only one agent is enough on Hadoop cluster

Can someone help me in understanding how MQs can be integrated with flume

Reference

https://flume.apache.org/FlumeUserGuide.html

Thanks, Chhaya


Solution

  • Regarding the Flume agent architecture, it is composed in its minimalist form by a source in charge of receiving or polling for events, and converting the events into Flume events that are put in a channel. Then, a sink takes those events in order to persist the data somewhere, or send the data to another agent. All these components (source, channel, sink, i.e. an agent) run in the same machine. Different agents may be distributed, instead.

    Being said that, your scenario seems to require a single agent based on a JMS source, a channel, typically Memory Channel, and a HDFS sink.

    The JMS source, as stated in the documentation, has only been tested for ActiveMQ, but shoukd work for any other queue systemm. The documentation also provides an example:

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = jms
    a1.sources.r1.channels = c1
    a1.sources.r1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
    a1.sources.r1.connectionFactory = GenericConnectionFactory
    a1.sources.r1.providerURL = tcp://mqserver:61616
    a1.sources.r1.destinationName = BUSINESS_DATA
    a1.sources.r1.destinationType = QUEUE
    

    a1 is the name of the single agent. c1 is the name for the channel and its configuration must be still completed; and a sink configuration is totally missing. It can be easily completed by adding:

    a1.sinks = k1
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = ...
    a1.sinks.k1...
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000
    a1.channels.c1...
    

    r1 is the JMS source, and as can be seen, destinationName simply ask for a string name. destinationType can only take two values: queue or topic. I think the important parameters are providerURL and initialContextFactory and connectionFactory, which must be adapted for IBM MQ.