Flume NoSuchMethodError pulling Twitter data into HDFS

I can't pull Twitter data into HDFS using Flume because of an error I can't get rid of.

command :

bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent

console :

2020-12-14 11:38:08,662 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$] Unhandled error
java.lang.NoSuchMethodError: 'boolean twitter4j.conf.Configuration.isStallWarningsEnabled()'
    at twitter4j.TwitterStreamImpl.<init>(
    at twitter4j.TwitterStreamFactory.<clinit>(
    at org.apache.flume.source.twitter.TwitterSource.configure(
    at org.apache.flume.conf.Configurables.configure(
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$
    at java.base/java.util.concurrent.Executors$
    at java.base/java.util.concurrent.FutureTask.runAndReset(
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.base/java.util.concurrent.ThreadPoolExecutor$
    at java.base/

I manually added the flume-sources-1.0-SNAPSHOT.jar to flume/lib.

 export JAVA_HOME=/usr/lib/jvm/default-java
 export JAVA_OPTS="-Xms500m -Xmx2000m"
# export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "


twitter.conf :

# Naming the components on the current agent. 
TwitterAgent.sources = Twitter 
TwitterAgent.channels = MemChannel 
TwitterAgent.sinks = HDFS
# Describing/Configuring the source 
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = xxx
TwitterAgent.sources.Twitter.consumerSecret = xxx 
TwitterAgent.sources.Twitter.accessToken = xxx 
TwitterAgent.sources.Twitter.accessTokenSecret = xxx
TwitterAgent.sources.Twitter.keywords = tutorials point,java, bigdata, mapreduce, mahout, hbase, nosql
# Describing/Configuring the sink 

TwitterAgent.sinks.HDFS.type = hdfs 
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/Hadoop/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0 
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000 
TwitterAgent.sinks.HDFS.hdfs.minBlockReplicas = 1
# Describing/Configuring the channel 
TwitterAgent.channels.MemChannel.type = memory 
TwitterAgent.channels.MemChannel.capacity = 100 
TwitterAgent.channels.MemChannel.transactionCapacity = 100
# Binding the source and sink to the channel 
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel

OS: Ubuntu
Flume: v1.9.0
Hadoop: v3.3.0
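
Independently of the classpath error, it is worth checking the bindings in a file like twitter.conf: Flume attaches sources with a channels property and sinks with a singular channel property, and a sink without that line never receives events. A minimal sketch of such a check, assuming the agent's sinks are declared on a single "<agent>.sinks = ..." line as in the file above; unbound_sinks is a hypothetical helper name, not part of Flume:

```shell
# unbound_sinks CONF_FILE AGENT
# Prints the name of every sink listed in "<AGENT>.sinks = ..." that has no
# "<AGENT>.sinks.<name>.channel = ..." line in CONF_FILE.
# Hypothetical helper for illustration; not part of Flume.
unbound_sinks() {
  conf_file=$1
  agent=$2
  # Extract the space-separated sink names from the declaration line.
  sinks=$(sed -n "s/^[[:space:]]*${agent}\.sinks[[:space:]]*=[[:space:]]*//p" "$conf_file")
  for sink in $sinks; do
    # Report the sink if no channel binding line exists for it.
    if ! grep -Eq "^[[:space:]]*${agent}\.sinks\.${sink}\.channel[[:space:]]*=" "$conf_file"; then
      echo "$sink"
    fi
  done
}
```

Called as unbound_sinks conf/twitter.conf TwitterAgent, it prints nothing when every sink is bound to a channel.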


  • I managed to make it work. For anyone who wants to know how, read on.

    First, change the Flume version. I now use Flume 1.7.0. A newer version might work too, but I don't want to break my setup :)

    Second, clone this repo. Inside, there is a flume.conf file. I configured it like this:

    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements.  See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership.  The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License.  You may obtain a copy of the License at
    # Unless required by applicable law or agreed to in writing,
    # software distributed under the License is distributed on an
    # KIND, either express or implied.  See the License for the
    # specific language governing permissions and limitations
    # under the License.
    # The configuration file needs to define the sources, 
    # the channels and the sinks.
    # Sources, channels and sinks are defined per agent, 
    # in this case called 'TwitterAgent'
    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS
    TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = xx
    TwitterAgent.sources.Twitter.consumerSecret = xx
    TwitterAgent.sources.Twitter.accessToken = xx
    TwitterAgent.sources.Twitter.accessTokenSecret = xx
    TwitterAgent.sources.Twitter.keywords =  hadoop, bigdata
    TwitterAgent.sources.Twitter.locations = -54.5247541978, 2.05338918702, 9.56001631027, 51.1485061713
    TwitterAgent.sources.Twitter.language = fr
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/Hadoop/twitter_data/%Y/%m/%d/%H/
    #It specifies the File format. File formats that are currently supported are SequenceFile, DataStream or CompressedStream.
    #The DataStream will not compress the output file and please don’t set codeC. The CompressedStream requires set hdfs.codeC with an available codeC
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    # It specifies the suffix to append to file. For  eg, .avro 
    TwitterAgent.sinks.HDFS.hdfs.fileSuffix = .json
    #It specifies the number of events written to file before it is flushed to HDFS.
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
    # It specifies the file size to trigger roll, in bytes. If it is equal to 0 then it means never roll based on file size.
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    #It specifies the number of events written to the file before it rolled. If it is equal to 0 then it means never roll based on the number of events.
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
    #It specifies the number of seconds to wait before rolling the current file. If it is equal to 0 then it means never roll based on the time interval.
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 60
    TwitterAgent.sinks.HDFS.hdfs.callTimeout = 180000
    TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 1000
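
    The %Y/%m/%d/%H escapes in hdfs.path are expanded for each event, and with useLocalTimeStamp = true the sink takes the time from the agent's local clock instead of a timestamp header. These four escapes mean the same thing as in date(1), so one way to preview the directory a given moment maps to is the sketch below. It assumes GNU date (for -d); preview_hdfs_path is a hypothetical helper, not a Flume command:

    ```shell
    # preview_hdfs_path PATTERN [WHEN]
    # Expands the %Y, %m, %d and %H escapes of an hdfs.path value for the
    # moment WHEN (default: now), using the local time zone.
    # Hypothetical helper; other Flume escapes are not handled here.
    preview_hdfs_path() {
      pattern=$1
      when=${2:-now}
      date -d "$when" +"$pattern"
    }

    # Example call with the path from the configuration above:
    preview_hdfs_path 'hdfs://localhost:9000/user/Hadoop/twitter_data/%Y/%m/%d/%H/' '2020-12-18 02:48:38'
    ```

    With rollInterval = 60, the sink closes the current file in that directory after about a minute and starts a new one.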

    Then modify the pom.xml (the version):


    Package it with Maven:

    cd flume-sources
    mvn package

    This creates target/flume-sources-1.0-SNAPSHOT.jar. Copy it to <YOUR_FLUME_HOME>/lib:

    cp ./target/flume-sources-1.0-SNAPSHOT.jar ~/flume/lib

    I changed the CLASSPATH in the file I showed earlier to:


    Copy the conf/flume.conf we just wrote into <YOUR_FLUME_HOME>/conf.

    Third, verify that twitter4j-core.jar, media-support.jar and stream.jar in lib/ are at version 3.0.3. If not, download them.
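
    A NoSuchMethodError like the one in the question generally means a class was compiled against one version of a library and loaded from another, which is why the twitter4j versions in lib/ matter. A minimal sketch of a check, assuming the jars are named twitter4j-<module>-<version>.jar; mismatched_twitter4j_jars is a hypothetical helper, not something shipped with Flume:

    ```shell
    # mismatched_twitter4j_jars LIB_DIR EXPECTED_VERSION
    # Prints the file name of every twitter4j-*.jar in LIB_DIR whose name does
    # not end in "-<EXPECTED_VERSION>.jar". Prints nothing if all match.
    # Assumes the version is encoded in the file name.
    mismatched_twitter4j_jars() {
      lib_dir=$1
      expected=$2
      for jar in "$lib_dir"/twitter4j-*.jar; do
        [ -e "$jar" ] || continue          # the glob matched nothing
        name=$(basename "$jar")
        case "$name" in
          *-"$expected".jar) ;;            # version matches, nothing to report
          *) echo "$name" ;;
        esac
      done
    }
    ```

    For example, mismatched_twitter4j_jars "$FLUME_HOME/lib" 3.0.3 should print nothing once the three jars are in place.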

    And finally:

    cd $FLUME_HOME
    bin/flume-ng agent --conf ./conf/ -f ./conf/flume.conf -Dflume.root.logger=INFO,console -n TwitterAgent

    Hallelujah:

    2020-12-18 02:48:38,805 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(] Processed 100 docs
    2020-12-18 02:48:40,777 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(] Processed 200 docs
    2020-12-18 02:48:42,017 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(] Processed 300 docs
    2020-12-18 02:48:44,772 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(] Processed 400 docs
    2020-12-18 02:48:46,779 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(] Processed 500 docs
    2020-12-18 02:48:47,875 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(] Processed 600 docs
    2020-12-18 02:48:49,852 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(] Processed 700 docs
    2020-12-18 02:48:52,789 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(] Processed 800 docs
    2020-12-18 02:48:54,791 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(] Processed 900 docs
    2020-12-18 02:48:56,805 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(] Processed 1 000 docs
    2020-12-18 02:48:56,805 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(] Total docs indexed: 1 000, total skipped docs: 0
    2020-12-18 02:48:56,805 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(]     47 docs/second
    2020-12-18 02:48:56,805 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(] Run took 21 seconds and processed:
    2020-12-18 02:48:56,806 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(]     0,013 MB/sec sent to index
    2020-12-18 02:48:56,807 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(]     0,266 MB text sent to index
    2020-12-18 02:48:56,807 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(] There were 0 exceptions ignored:
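
    The log above only shows what the source processed. To confirm that the events also reached HDFS, the sink's output directory can be listed. A minimal sketch, assuming the hdfs client is on the PATH and the base path from the configuration above; list_twitter_json is a hypothetical helper:

    ```shell
    # list_twitter_json [BASE_DIR]
    # Recursively lists BASE_DIR in HDFS and keeps only the entries whose name
    # contains ".json", which also matches files still being written
    # (the HDFS sink gives those a temporary suffix until they are rolled).
    list_twitter_json() {
      base=${1:-/user/Hadoop/twitter_data}
      # -R recurses into the %Y/%m/%d/%H subdirectories created by the sink.
      hdfs dfs -ls -R "$base" | grep '\.json'
    }
    ```

    If list_twitter_json prints nothing after a few minutes, the sink side (channel binding, HDFS path, permissions) is the first place I would look.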