Tags: hadoop, hdfs, namenode

Unable to start Hadoop (3.1.0) in pseudo-distributed mode on Ubuntu (16.04)


I am trying to follow the Getting Started guide from the Apache Hadoop website, in particular the pseudo-distributed configuration section of the Apache Hadoop 3.1.0 Getting Started guide,

but I am unable to start the Hadoop NameNode and DataNode. Can anyone advise, even if it's just things I can run to debug or investigate further?

At the end of the logs I see an error message (not sure if it's important or a red herring).

    2018-04-18 14:15:40,003 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes

    2018-04-18 14:15:40,006 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks

    2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of blocks            = 0

    2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid blocks          = 0

    2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of under-replicated blocks = 0

    2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of  over-replicated blocks = 0

    2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks being written    = 0

    2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 11 msec

    2018-04-18 14:15:40,028 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting

    2018-04-18 14:15:40,028 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting

    2018-04-18 14:15:40,029 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: NameNode RPC up at: localhost/127.0.0.1:9000

    2018-04-18 14:15:40,031 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state

    2018-04-18 14:15:40,031 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Initializing quota with 4 thread(s)

    2018-04-18 14:15:40,033 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Quota initialization completed in 2 milliseconds name space=1 storage space=0 storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0

    2018-04-18 14:15:40,037 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds

> 2018-04-18 14:15:40,232 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
>
> 2018-04-18 14:15:40,236 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 1: SIGHUP
>
> 2018-04-18 14:15:40,236 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at c0315/127.0.1.1

I have confirmed that I can ssh localhost without a password prompt. I have also run the following steps from the above-mentioned Apache Getting Started guide,

  1. $ bin/hdfs namenode -format
  2. $ sbin/start-dfs.sh
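
(The messages quoted above are from the NameNode log; by default it is written under the logs/ directory of the Hadoop install, so something like the following, run from the hadoop-3.1.0 directory, shows its tail. The file name assumes Hadoop's default hadoop-<user>-namenode-<host>.log naming.)

    $ tail -n 50 logs/hadoop-*-namenode-*.log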

But I can't run step 3 to browse the location at http://localhost:9870/. When I run jps from the terminal prompt I just get back,

    14900 Jps

I was expecting a list of my nodes.
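
For comparison, after a successful sbin/start-dfs.sh I would expect jps to list the HDFS daemons as well, something like the following (the PIDs are only illustrative):

    14900 Jps
    14321 NameNode
    14467 DataNode
    14703 SecondaryNameNode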

I will attach the full logs.

Can anyone help, even just with ways to debug this, please?

Java Version, $ java --version

    java 9.0.4
    Java(TM) SE Runtime Environment (build 9.0.4+11)
    Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)

EDIT 1: I have repeated the steps with Java 8 as well and get the same error message.

EDIT 2: Following the comment suggestions below, I have checked that I am definitely pointing at Java 8 now, and I have also commented out the localhost setting for 127.0.0.0 in the /etc/hosts file.

(screenshots: commented-out-localhosts, java-version, hadoop-env)
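
For reference, the place the Java version gets picked up is etc/hadoop/hadoop-env.sh; a minimal sketch of the line I mean (the JDK path is just an example for an Ubuntu OpenJDK 8 package, adjust it to your install):

    # etc/hadoop/hadoop-env.sh -- example path only, point it at your Java 8 install
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64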

Ubuntu version,

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: neon
    Description: KDE neon User Edition 5.12
    Release: 16.04
    Codename: xenial

I have tried running a few commands. $ bin/hdfs version gives,

    Hadoop 3.1.0
    Source code repository https://github.com/apache/hadoop -r 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d
    Compiled by centos on 2018-03-30T00:00Z
    Compiled with protoc 2.5.0
    From source with checksum 14182d20c972b3e2105580a1ad6990
    This command was run using /home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/hadoop-common-3.1.0.jar

When I try bin/hdfs groups, it doesn't return but gives me,

    2018-04-18 15:33:34,590 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

When I try $ bin/hdfs lsSnapshottableDir,

    lsSnapshottableDir: Call From c0315/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
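
Since both of these point at localhost:9000 being refused, a quick thing to check is whether anything is actually listening on that port (a sketch; netstat -tlnp works too if ss is not installed):

    $ ss -tln | grep 9000   # no output means the NameNode isn't listening, which matches the SIGTERM above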

When I try $ bin/hdfs classpath,

    /home/steelydan.com/roycecoolige/Apps/hadoop-3.1.0/etc/hadoop:/home/steelydan.com/roycecoolige/Apps/hadoop-3.1.0/share/hadoop/common/lib/*:/home/steelydan.com/roycecoolige/Apps/hadoop-3.1.0/share/hadoop/common/*:/home/steelydan.com/roycecoolige/Apps/hadoop-3.1.0/share/hadoop/hdfs:/home/steelydan.com/roycecoolige/Apps/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/home/steelydan.com/roycecoolige/Apps/hadoop-3.1.0/share/hadoop/hdfs/*:/home/steelydan.com/roycecoolige/Apps/hadoop-3.1.0/share/hadoop/mapreduce/*:/home/steelydan.com/roycecoolige/Apps/hadoop-3.1.0/share/hadoop/yarn:/home/steelydan.com/roycecoolige/Apps/hadoop-3.1.0/share/hadoop/yarn/lib/*:/home/steelydan.com/roycecoolige/Apps/hadoop-3.1.0/share/hadoop/yarn/*

core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>

Solution

  • I finally found out the reason why pseudo-distributed mode HDFS fails on KDE neon. It is caused by 40_kde_neon_allyourprocessarebelongtous.conf.

    ❯ cat /usr/lib/systemd/logind.conf.d/40_kde_neon_allyourprocessarebelongtous.conf
    # SPDX-License-Identifier: GPL-3.0-only OR LicenseRef-KDE-Accepted-GPL
    # SPDX-FileCopyrightText: 2016 Harald Sitter <[email protected]>
    
    [Login]
    KillUserProcesses=1
    

    Due to this file, the nohup/disown mechanism doesn't work on KDE neon: processes started in a user session are killed when the session ends, which is why the NameNode receives SIGTERM right after start-dfs.sh's ssh session to localhost closes.

    If you comment it out (or override the setting, as sketched below), pseudo-distributed mode HDFS will work.
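
    A less invasive alternative is a drop-in that overrides just this setting; this is only a sketch, assuming standard systemd drop-in precedence (the file name below is hypothetical, it just has to sort after the KDE one), followed by a reboot or a restart of systemd-logind:

        # /etc/systemd/logind.conf.d/99-keep-user-processes.conf  (hypothetical file name)
        [Login]
        KillUserProcesses=no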