Tags: hadoop, apache-spark, hive, metastore, hivecontext

Spark: Not able to read data from Hive tables


I have created a Maven project with the following pom.xml:

<properties>
    <spark.version>1.3.0</spark.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
        <exclusions>
            <exclusion>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala-library</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.6</version>
    </dependency> -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>

</dependencies>

My class that reads data from a Hive table:

import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.DataFrame

class SparkHive {
  def createTable = {
    val conf = new SparkConf().setMaster("local").setAppName("My First spark app")
    val sparkCtxt = new SparkContext(conf)
    val hiveContext = new HiveContext(sparkCtxt)
    hiveContext.setConf("hive.metastore.uris", "thrift://127.0.0.1:9083")
    val table = hiveContext.sql("select * from test")
    table.show()
    val gpData = table.groupBy("col1")
    println(gpData.max("col2").show())
  }
}
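
For completeness, this class is driven from a small entry point along these lines (the object name here is just an illustration):

object Main {
  def main(args: Array[String]): Unit = {
    // Instantiate the class above and run the Hive query
    new SparkHive().createTable
  }
}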

I am using Spark to read data from a table present in the Hive metastore, but I am facing some very strange issues.

I have two questions, as described below:

Question 1. If I use <spark.version>1.3.0</spark.version>, Spark is able to find the Hive table and print its data to the console with the help of these lines:

val table = hiveContext.sql("select * from test")
table.show()

but if I do a filter or group by as shown in the example, Spark cannot find col1 and throws the exception below:

Exception in thread "main" java.util.NoSuchElementException: key not found: col1#0

So the question is: if the DataFrame is able to find that table, why is it not letting me group by its columns, and how can I solve this issue?
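
For what it's worth, one possible workaround (only a sketch, not the accepted fix below) is to push the aggregation into the SQL statement itself, using the same hiveContext as above:

// Hypothetical workaround: let Spark SQL do the grouping instead of the DataFrame API
val maxPerCol1 = hiveContext.sql("select col1, max(col2) from test group by col1")
maxPerCol1.show()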

Question 2. If I use <spark.version>1.6.0</spark.version>, then Spark is not even able to find the table present in the Hive metastore. Why is this the behavior?

ENVIRONMENT: CLOUDERA QUICKSTART VM 5.8.0


Solution

  • The only trick was to put hive-site.xml on the classpath.
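
With hive-site.xml on the classpath (for a Maven build, copying it into src/main/resources is one way to get it there), the HiveContext should pick up hive.metastore.uris on its own, so the explicit setConf call is no longer needed. A minimal sketch of the resulting driver, assuming the same local setup as above:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SparkHiveApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("My First spark app")
    val sc = new SparkContext(conf)
    // hive-site.xml on the classpath supplies the metastore URI,
    // so no hiveContext.setConf("hive.metastore.uris", ...) is needed here
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("select * from test").show()
    sc.stop()
  }
}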