apache-spark, hadoop, pyspark, hdfs

IllegalArgumentException: java.net.UnknownHostException: NNode


I am able to connect to my Hive table, which is on a different VM, using DBeaver.

When I connect to Hive through PySpark, I can see all the tables within a schema, but when I try to query a table I get the following error:

IllegalArgumentException                  Traceback (most recent call last)
Cell In[7], line 1
----> 1 spark.sql("SELECT * FROM u1").show()

File C:\ProgramData\anaconda3\lib\site-packages\pyspark\sql\dataframe.py:959, in DataFrame.show(self, n, truncate, vertical)
    953     raise PySparkTypeError(
    954         error_class="NOT_BOOL",
    955         message_parameters={"arg_name": "vertical", "arg_type": type(vertical).__name__},
    956     )
    958 if isinstance(truncate, bool) and truncate:
--> 959     print(self._jdf.showString(n, 20, vertical))
    960 else:
    961     try:

File C:\ProgramData\anaconda3\lib\site-packages\py4j\java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File C:\ProgramData\anaconda3\lib\site-packages\pyspark\errors\exceptions\captured.py:185, in capture_sql_exception.<locals>.deco(*a, **kw)
    181 converted = convert_exception(e.java_exception)
    182 if not isinstance(converted, UnknownException):
    183     # Hide where the exception came from that shows a non-Pythonic
    184     # JVM exception message.
--> 185     raise converted from None
    186 else:
    187     raise

IllegalArgumentException: java.net.UnknownHostException: NNode

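For context, the session is created along these lines. This is only a minimal sketch; the app name and metastore URI are placeholders, not the exact values from my setup.

from pyspark.sql import SparkSession

# Minimal sketch of the Hive-enabled session used above.
# The metastore URI is a placeholder for the actual Hive VM address.
spark = (
    SparkSession.builder
    .appName("hive-test")
    .config("hive.metastore.uris", "thrift://<hive-vm-host>:9083")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("SHOW TABLES").show()       # listing tables works
spark.sql("SELECT * FROM u1").show()  # querying fails with UnknownHostException: NNode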

I have included core-site.xml and hdfs-site.xml below for reference.

1. core-site.xml


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://{NAME_NODE}:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
</configuration>

2. hdfs-site.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>  
        <name>dfs.namenode.name.dir </name>             
        <value>file:///{NAME_NODE}/hdfs-data/name</value>           
    </property>     
    <property>              
        <name>dfs.namenode.acls.enabled</name>              
        <value>true</value>         
    </property>
    <property>          
        <name>dfs.namenode.http-address</name>              
        <value>webhdfs://{NAME_NODE}:9868</value>           
    </property>         
    <property>              
        <name>dfs.namenode.secondary.http-address </name>               
        <value>webhdfs://{SECONDARY_NAME_NODE}:9868</value>         
    </property>     
    <property>              
        <name>dfs.replication</name>                
        <value>2</value>        
    </property>     
    <property>              
        <name>dfs.blocksize</name>              
        <value>134217728</value>            
    </property>     
    <property>              
        <name>dfs.datanode.data.dir</name>              
        <value>file:///{NAME_NODE}/hdfs-data/data</value>           
    </property>    
    <property>                
        <name>dfs.permissions.enabled</name>                
        <value>true</value>           
    </property>       
    <property>                
        <name>dfs.permissions</name>                
        <value>true</value>           
    </property>        
    <property>                
        <name>dfs.namenode.inode.attributes.provider.class</name>               
        <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>            
    </property>       
    <property>                
        <name>dfs.permissions.ContentSummary.subAccess</name>                
        <value>true</value>            
    </property>    
</configuration>

These files were placed inside C:\ProgramData\anaconda3\Lib\site-packages\pyspark\conf.
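(For reference, one commonly suggested alternative to copying the files into the PySpark package directory is to point HADOOP_CONF_DIR at the directory that holds them before the session is created; the path below is only an example.)

import os
from pyspark.sql import SparkSession

# Example only: the path is an assumption - use wherever core-site.xml / hdfs-site.xml actually live.
os.environ["HADOOP_CONF_DIR"] = r"C:\hadoop\conf"

spark = SparkSession.builder.enableHiveSupport().getOrCreate()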

Could anyone please suggest a solution?


Solution

  • I was able to sort this issue out on my own. The fix was to map the NAME_NODE IP address to the hostname NNode in the hosts file on the client machine. Listing tables only talks to the Hive metastore, but actually reading a table requires the client to resolve the HDFS namenode hostname (NNode) stored in the table location, which failed until the hosts entry was added. A sketch of the entry is below.
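On Windows the entry goes into C:\Windows\System32\drivers\etc\hosts (on Linux, /etc/hosts). The IP address below is a placeholder; use the actual address of the namenode VM:

# C:\Windows\System32\drivers\etc\hosts
# Placeholder IP - replace with the real namenode address
192.168.1.10    NNode

Once the name resolves (for example, ping NNode succeeds), the SELECT query runs without the UnknownHostException.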