
How to integrate a Python bolt into a topology built with the Storm Crawler SDK


I am trying to integrate a bolt written in Python into a topology built with Storm Crawler SDK 1.7 and Apache Storm 1.1.0 components. When the topology runs, it cannot launch the Python program and looks for it in a completely different temporary location. I get this error every time I try to execute the topology:

27238 [Thread-20-classify-executor[2 2]] ERROR o.a.s.util - Async loop died!
java.lang.RuntimeException: Error when launching multilang subprocess

    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:94) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.task.ShellBolt.prepare(ShellBolt.java:150) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.daemon.executor$fn__5030$fn__5043.invoke(executor.clj:793) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482) [storm-core-1.1.1.jar:1.1.1]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
Caused by: java.io.IOException: Cannot run program "python" (in directory "C:\Users\akumar\AppData\Local\Temp\24ed7755-e7c0-42d4-a17f-939082feb1a8\supervisor\stormdist\crawler-1-1528614625\resources"): CreateProcess error=267, The directory name is invalid
    at java.lang.ProcessBuilder.start(Unknown Source) ~[?:1.8.0_171]
    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:87) ~[storm-core-1.1.1.jar:1.1.1]
    ... 5 more
Caused by: java.io.IOException: CreateProcess error=267, The directory name is invalid
    at java.lang.ProcessImpl.create(Native Method) ~[?:1.8.0_171]
    at java.lang.ProcessImpl.<init>(Unknown Source) ~[?:1.8.0_171]
    at java.lang.ProcessImpl.start(Unknown Source) ~[?:1.8.0_171]
    at java.lang.ProcessBuilder.start(Unknown Source) ~[?:1.8.0_171]
    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:87) ~[storm-core-1.1.1.jar:1.1.1]
    ... 5 more
27246 [Thread-20-classify-executor[2 2]] ERROR o.a.s.d.executor - 
java.lang.RuntimeException: Error when launching multilang subprocess

    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:94) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.task.ShellBolt.prepare(ShellBolt.java:150) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.daemon.executor$fn__5030$fn__5043.invoke(executor.clj:793) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482) [storm-core-1.1.1.jar:1.1.1]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
Caused by: java.io.IOException: Cannot run program "python" (in directory "C:\Users\akumar\AppData\Local\Temp\24ed7755-e7c0-42d4-a17f-939082feb1a8\supervisor\stormdist\crawler-1-1528614625\resources"): CreateProcess error=267, The directory name is invalid
    at java.lang.ProcessBuilder.start(Unknown Source) ~[?:1.8.0_171]
    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:87) ~[storm-core-1.1.1.jar:1.1.1]
    ... 5 more
Caused by: java.io.IOException: CreateProcess error=267, The directory name is invalid
    at java.lang.ProcessImpl.create(Native Method) ~[?:1.8.0_171]
    at java.lang.ProcessImpl.<init>(Unknown Source) ~[?:1.8.0_171]
    at java.lang.ProcessImpl.start(Unknown Source) ~[?:1.8.0_171]
    at java.lang.ProcessBuilder.start(Unknown Source) ~[?:1.8.0_171]
    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:87) ~[storm-core-1.1.1.jar:1.1.1]
    ... 5 more
27624 [Thread-30-sitemap-executor[9 9]] INFO  c.d.s.f.URLFilters - Loaded instance of class com.digitalpebble.stormcrawler.filtering.regex.RegexURLFilter
27626 [Thread-30-sitemap-executor[9 9]] INFO  o.a.s.d.executor - Prepared bolt sitemap:(9)
27627 [Thread-40-fetch-executor[4 4]] INFO  c.d.s.f.URLFilters - Loaded instance of class com.digitalpebble.stormcrawler.filtering.regex.RegexURLFilter
27628 [Thread-40-fetch-executor[4 4]] INFO  c.d.s.b.FetcherBolt - [Fetcher #-1] : starting at 2018-06-10 12:40:37
27638 [Thread-24-feed-executor[3 3]] INFO  c.d.s.f.URLFilters - Loaded instance of class com.digitalpebble.stormcrawler.filtering.regex.RegexURLFilter
27639 [Thread-24-feed-executor[3 3]] INFO  o.a.s.d.executor - Prepared bolt feed:(3)
27661 [Thread-20-classify-executor[2 2]] ERROR o.a.s.util - Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) [storm-core-1.1.1.jar:1.1.1]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?]
    at org.apache.storm.daemon.worker$fn__5628$fn__5629.invoke(worker.clj:759) [storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.daemon.executor$mk_executor_data$fn__4848$fn__4849.invoke(executor.clj:276) [storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:494) [storm-core-1.1.1.jar:1.1.1]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]

The topology works fine when the Python bolt is not included in it.

The Python bolt also works as expected in a topology that does not use any Storm Crawler SDK components.

Can anyone help?


Solution

  • The problem was resolved by changing the scope of the storm-core dependency in the Maven pom.xml.

    The <scope> tag was set to provided. I changed it to compile, and that resolved the issue when running in local mode.
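
For illustration, the scope change described above might look like the fragment below. This is a sketch rather than the poster's actual pom.xml: the groupId and artifactId are the standard Maven coordinates for storm-core, and the version is an assumption taken from the jar name in the stack trace (storm-core-1.1.1.jar). Adjust it to whatever your project already declares.

```xml
<!-- Sketch of the storm-core dependency entry in pom.xml.
     The version is assumed from the stack trace; keep your own. -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>1.1.1</version>
  <!-- Previously: <scope>provided</scope> -->
  <scope>compile</scope>
</dependency>
```

The answer only confirms this for local mode. When a topology is submitted to a real cluster, the cluster normally supplies storm-core itself, so the provided scope is usually the one wanted for that build.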