I'm using Hortonworks HDP distro (2.4) on Ubuntu 14
Downloaded kite-dataset
Running this command:
./kite-dataset -v csv-import --delimiter '|' ml-100k/u.item movies
Getting this error:
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
IO error
org.kitesdk.data.DatasetIOException: Cannot add jar path to distributed cache: /usr/hdp/2.4.2.0-258/hive/lib
at org.kitesdk.tools.TaskUtil$ConfigBuilder.addJarPathForClass(TaskUtil.java:129)
at org.kitesdk.tools.TransformTask.run(TransformTask.java:165)
at org.kitesdk.cli.commands.CSVImportCommand.run(CSVImportCommand.java:186)
at org.kitesdk.cli.Main.run(Main.java:184)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.kitesdk.cli.Main.main(Main.java:266)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Jar file: /usr/hdp/2.4.2.0-258/hive/lib/ojdbc6.jar does not exist.
at org.apache.crunch.util.DistCache.addJarToDistributedCache(DistCache.java:115)
at org.apache.crunch.util.DistCache.addJarDirToDistributedCache(DistCache.java:208)
at org.apache.crunch.util.DistCache.addJarDirToDistributedCache(DistCache.java:229)
at org.kitesdk.tools.TaskUtil$ConfigBuilder.addJarPathForClass(TaskUtil.java:127)
... 11 more
What can I do to overcome this issue?
This seems to be the relevant part of the error message:
Caused by: java.io.IOException: Jar file: /usr/hdp/2.4.2.0-258/hive/lib/ojdbc6.jar does not exist
The missing jar seems to be an Oracle JDBC driver.