I followed the guide at http://stormcrawler.net/getting-started/ to generate the jar file for the topology. When I run the topology with the storm command given in the README file, I get the error below during execution of FetcherBolt. I have Storm 1.1.0.2.6.4.0-91 installed on a Hortonworks cluster, and I get the same exception in both -local and distributed mode.
The exception is:
java.lang.NoSuchMethodError: org.apache.commons.logging.impl.LogFactoryImpl.handleThrowable(Ljava/lang/Throwable;)V
    at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:568)
    at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:292)
    at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:269)
    at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
    at org.apache.http.conn.ssl.AbstractVerifier.(AbstractVerifier.java:61)
    at org.apache.http.conn.ssl.AllowAllHostnameVerifier.(AllowAllHostnameVerifier.java:44)
    at org.apache.http.conn.ssl.AllowAllHostnameVerifier.(AllowAllHostnameVerifier.java:46)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.(SSLConnectionSocketFactory.java:146)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.getDefaultRegistry(PoolingHttpClientConnectionManager.java:115)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.(PoolingHttpClientConnectionManager.java:122)
    at com.digitalpebble.stormcrawler.protocol.httpclient.HttpProtocol.(HttpProtocol.java:76)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at com.digitalpebble.stormcrawler.protocol.ProtocolFactory.(ProtocolFactory.java:60)
    at com.digitalpebble.stormcrawler.bolt.FetcherBolt.prepare(FetcherBolt.java:738)
    at org.apache.storm.daemon.executor$fn__9635$fn__9648.invoke(executor.clj:794)
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:748)
This is probably due to a conflict between the version of commons-logging inherited from the httpclient library and one which is put on the classpath by the Hortonworks version of Apache Storm.
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.3:compile
[INFO] | +- org.apache.httpcomponents:httpcore:jar:4.4.6:compile
[INFO] | +- commons-logging:commons-logging:jar:1.2:compile
[INFO] | \- commons-codec:commons-codec:jar:1.9:compile
You could try using a different protocol implementation by setting
http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
in the crawler-conf.yaml file. Note that this does not guarantee that the call to commons-logging won't happen somewhere else. Ideally, you'd want to resolve the dependency problem e.g. by making sure that Hortonworks uses the same version as the one needed by StormCrawler. We support only the Apache distribution of Storm.
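If you want to confirm the conflict before changing anything, one option is to check which jar actually supplies LogFactoryImpl at runtime. The following is only a minimal sketch: the class name WhichJar and the method locate are made up for illustration and are not part of StormCrawler or Storm. It relies only on standard JDK calls (Class.forName and ProtectionDomain.getCodeSource).

```java
import java.security.CodeSource;

// Hypothetical helper (not part of StormCrawler or Storm): reports where a class is loaded from.
public class WhichJar {

    /** Returns a one-line description of the jar or directory that supplies the named class. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Typical for classes provided by the JVM itself, such as java.lang.String
                return className + ": no code source (loaded by the JVM itself)";
            }
            return className + ": " + src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + ": could not be loaded (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above
        String name = args.length > 0 ? args[0] : "org.apache.commons.logging.impl.LogFactoryImpl";
        System.out.println(locate(name));
    }
}
```

The result is only meaningful when the check runs with the same classpath as the Storm worker, for example by logging the value returned by locate from inside the topology. If the reported location is a jar shipped with the Hortonworks Storm installation rather than the commons-logging 1.2 jar bundled in your topology jar, that would be consistent with the conflict described above.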