hadoop, hortonworks-data-platform, kite-dataset

How to avoid an IO error while using kite-dataset to import data?


I'm using the Hortonworks HDP distribution (2.4) on Ubuntu 14.

I downloaded kite-dataset and am running this command:

./kite-dataset -v csv-import --delimiter '|' ml-100k/u.item movies

I get this error:

WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
IO error
org.kitesdk.data.DatasetIOException: Cannot add jar path to distributed cache: /usr/hdp/2.4.2.0-258/hive/lib
    at org.kitesdk.tools.TaskUtil$ConfigBuilder.addJarPathForClass(TaskUtil.java:129)
    at org.kitesdk.tools.TransformTask.run(TransformTask.java:165)
    at org.kitesdk.cli.commands.CSVImportCommand.run(CSVImportCommand.java:186)
    at org.kitesdk.cli.Main.run(Main.java:184)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.kitesdk.cli.Main.main(Main.java:266)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Jar file: /usr/hdp/2.4.2.0-258/hive/lib/ojdbc6.jar does not exist.
    at org.apache.crunch.util.DistCache.addJarToDistributedCache(DistCache.java:115)
    at org.apache.crunch.util.DistCache.addJarDirToDistributedCache(DistCache.java:208)
    at org.apache.crunch.util.DistCache.addJarDirToDistributedCache(DistCache.java:229)
    at org.kitesdk.tools.TaskUtil$ConfigBuilder.addJarPathForClass(TaskUtil.java:127)
    ... 11 more

What can I do to overcome this issue?


Solution

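  • This is the relevant part of the error message:

    Caused by: java.io.IOException: Jar file: /usr/hdp/2.4.2.0-258/hive/lib/ojdbc6.jar does not exist.

    The missing jar, ojdbc6.jar, is an Oracle JDBC driver. According to the stack trace, the import fails while Kite (through Crunch's DistCache.addJarDirToDistributedCache) is adding the jars under /usr/hdp/2.4.2.0-258/hive/lib to the distributed cache, so the problem is in that directory and not in your CSV data.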
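
    One possible explanation, which is an assumption and not something the error output confirms, is that ojdbc6.jar appears in the hive/lib listing as a symlink whose target is missing. The sketch below lists such dangling symlinks. The helper name list_dangling_links is made up for illustration, and the default path is copied from the stack trace; it relies on find's -maxdepth option, which GNU find on Ubuntu provides.

```shell
#!/bin/sh
# Hypothetical helper: print every symlink directly inside directory $1
# whose target does not exist. "test -e" follows symlinks, so it fails
# for a dangling link and the link's path gets printed.
list_dangling_links() {
    find "$1/" -maxdepth 1 -type l ! -exec test -e {} \; -print
}

# Path taken from the stack trace; override HIVE_LIB for another HDP version.
HIVE_LIB="${HIVE_LIB:-/usr/hdp/2.4.2.0-258/hive/lib}"

if [ -d "$HIVE_LIB" ]; then
    list_dangling_links "$HIVE_LIB"
else
    echo "No such directory: $HIVE_LIB" >&2
fi
```

    If ojdbc6.jar shows up in the output, the entry exists but points nowhere. Making the link point at a real Oracle driver jar, or removing the link if you don't use Oracle, would be the things to try next, though I haven't verified either on HDP 2.4.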