How to handle "etcdserver: unhealthy cluster"


When I try to add a member to the etcd cluster by running this command on the master node:

curl http://127.0.0.1:2379/v3beta/members \
-XPOST -H "Content-Type: application/json" \
-d '{"peerURLs": ["http://172.19.104.230:2380"]}'

It returns {"error":"etcdserver: unhealthy cluster","code":14}.

I then checked the cluster status:

[root@iZuf63refzweg1d9dh94t8Z ~]# etcdctl member list
55a782166ce91d01, started, infra3, https://172.19.150.82:2380, https://172.19.150.82:2379
696a771758a889c4, started, infra1, https://172.19.104.231:2380, https://172.19.104.231:2379

The member list looks fine. What should I do to make this work?


Solution

  • According to the etcd source code, the server returns ErrUnhealthy when the longestConnected function fails to find an active member.

    // longestConnected chooses the member with longest active-since-time.
    // It returns false, if nothing is active.
    func longestConnected(tp rafthttp.Transporter, membs []types.ID) (types.ID, bool) {
        var longest types.ID
        var oldest time.Time
        for _, id := range membs {
            tm := tp.ActiveSince(id)
            if tm.IsZero() { // inactive
                continue
            }
    
            if oldest.IsZero() { // first longest candidate
                oldest = tm
                longest = id
            }
    
            if tm.Before(oldest) {
                oldest = tm
                longest = id
            }
        }
        if uint64(longest) == 0 {
            return longest, false
        }
        return longest, true
    }
    

    So etcd cannot find a suitable member with an active connection.

    The candidates come from the cluster's VotingMemberIDs method, which returns the IDs of the voting members:

    transferee, ok := longestConnected(s.r.transport, s.cluster.VotingMemberIDs())
    if !ok {
        return ErrUnhealthy
    }
    
    // VotingMemberIDs returns the ID of voting members in cluster.
    func (c *RaftCluster) VotingMemberIDs() []types.ID {
        c.Lock()
        defer c.Unlock()
        var ids []types.ID
        for _, m := range c.members {
            if !m.IsLearner {
                ids = append(ids, m.ID)
            }
        }
        sort.Sort(types.IDSlice(ids))
        return ids
    }
    

    As we can see from your report, your cluster does have members:

    $ etcdctl member list
    > 55a782166ce91d01, started, infra3, https://172.19.150.82:2380, https://172.19.150.82:2379
    > 696a771758a889c4, started, infra1, https://172.19.104.231:2380, https://172.19.104.231:2379
    

    So we should check whether these members are voting members rather than learners; see etcd docs | Learner

    Raft learner

    // RaftAttributes represents the raft related attributes of an etcd member.
    type RaftAttributes struct {
        // PeerURLs is the list of peers in the raft cluster.
        // TODO(philips): ensure these are URLs
        PeerURLs []string `json:"peerURLs"`
        // IsLearner indicates if the member is raft learner.
        IsLearner bool `json:"isLearner,omitempty"`
    }
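
    As a rough illustration of how the IsLearner flag separates voters from learners, the sketch below decodes values shaped like the RaftAttributes struct above. The helper isVoting and the sample payloads are hypothetical; they are not taken from etcd or from a real cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RaftAttributes repeats the struct quoted above, JSON tags included.
type RaftAttributes struct {
	PeerURLs  []string `json:"peerURLs"`
	IsLearner bool     `json:"isLearner,omitempty"`
}

// isVoting (hypothetical helper) decodes one serialized RaftAttributes
// value and reports whether it describes a voting member.
func isVoting(raw []byte) (bool, error) {
	var attrs RaftAttributes
	if err := json.Unmarshal(raw, &attrs); err != nil {
		return false, err
	}
	return !attrs.IsLearner, nil
}

func main() {
	// Illustrative payloads, not captured from a real cluster.
	voter := []byte(`{"peerURLs":["https://172.19.104.231:2380"]}`)
	learner := []byte(`{"peerURLs":["http://172.19.104.230:2380"],"isLearner":true}`)

	for _, raw := range [][]byte{voter, learner} {
		voting, err := isVoting(raw)
		fmt.Println("voting:", voting, "err:", err)
	}

	// Because of omitempty, a voting member marshals without the isLearner field.
	out, _ := json.Marshal(RaftAttributes{PeerURLs: []string{"https://172.19.150.82:2380"}})
	fmt.Println(string(out))
}
```

    Note the omitempty tag: a serialized member with no isLearner field is a voting member, so the absence of the field is itself the answer.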
    

    So try increasing the number of members so that the cluster can maintain a quorum; see etcd quorum
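
    For reference, the quorum arithmetic can be sketched as follows. A quorum is a strict majority of the voting members, so a two-member cluster like the one in the question needs both members available and tolerates no failures. The function names are illustrative only.

```go
package main

import "fmt"

// quorum returns how many voting members must agree: a strict majority.
func quorum(n int) int { return n/2 + 1 }

// faultTolerance returns how many voting members may fail
// while the remaining ones still form a quorum.
func faultTolerance(n int) int { return n - quorum(n) }

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

    This is why odd cluster sizes are usually preferred: going from two members to three keeps the quorum at two while allowing one member to fail.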

    To force etcd to create a new cluster, try setting ETCD_FORCE_NEW_CLUSTER="true"

    Quorum

    See also this post: Understanding cluster and pool quorum