Search code examples
javahadoopkite-sdk

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/RecordReader


I am trying to convert my Json file to Parquet format.

Following is my pom file.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mypackage</groupId>
    <artifactId>JSONToParquet</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <repositories>
        <repository>
            <id>wso2</id>
            <url>http://dist.wso2.org/maven2/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.kitesdk</groupId>
            <artifactId>kite-data-core</artifactId>
            <version>1.1.0</version>
        </dependency>

        <dependency>
            <groupId>org.kitesdk</groupId>
            <artifactId>kite-morphlines-all</artifactId>
            <version>1.0.0</version> <!-- or whatever the latest version is -->
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/ua_parser/ua-parser -->
        <dependency>
            <groupId>ua_parser</groupId>
            <artifactId>ua-parser</artifactId>
            <version>1.3.0</version>
            <type>pom</type>
        </dependency>

    </dependencies>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>


</project>

Following is the code for conversion :

Schema jsonSchema = JsonUtil.inferSchema(inputstream, "Movie", 10);
try (JSONFileReader<Movie> reader = new JSONFileReader<>(
        inputstream, jsonSchema, Movie.class)) {

    reader.initialize();

    ParquetWriter parquetWriter
            = new AvroParquetWriter(outputPath, jsonSchema, compressionCodecName, ParquetWriter.DEFAULT_BLOCK_SIZE, ParquetWriter.DEFAULT_PAGE_SIZE);

    for (Movie record : reader) {
        parquetWriter.write(record);
    }

In the above code Movie is my POJO class.

When I run the program I am facing the following exception :

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/RecordReader
    at com.mypackage.jsontoparquet.JsonToParquet.main(JsonToParquet.java:34)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.RecordReader
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

I am using JDK : 8.

I don't have any background of hadoop, so I am unable to understand it's root cause.

What is the issue ?


Solution

  • Based on Kite-SDK Documentation, JSONFileReader,ParquetWriter and AvroParquetWriter use Hadoop libraries to work. It is needed to add Hadoop dependencies in your pom. You need at least below dependencies. Add them in your pom.xml:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.6.0</version>
    </dependency>