go, tcp, connection, client-server, connection-timeout

Unusually High Amount of TCP Connection Timeout Errors


I am using a Go TCP Client to connect to our Go TCP Server.

I can connect to the Server and run commands properly, but every so often my TCP Client reports an unusually high number of consecutive TCP connection errors, either when connecting to our TCP Server or when sending a message once connected:

dial tcp kubernetes_node_ip:exposed_kubernetes_port:
connectex: A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because connected
host has failed to respond.

read tcp unfamiliar_ip:unfamiliar_port->kubernetes_node_ip:exposed_kubernetes_port
wsarecv: A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because connected
host has failed to respond.

I say "unusually high" because I assume these errors should occur only rarely (about 5 or fewer per hour). Note that I am not ruling out connection instability as the cause, as I have also noticed that it is possible to run several commands in rapid succession without any errors.

However, I am still going to post my code in case I am doing something wrong.

Below is the code that my TCP Client uses to connect to our server:

serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
if err != nil {     
    fmt.Println(err)
    return
}

// Never stop asking for commands from the user.
for {
    // Connect to the server.
    serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
    if err != nil {         
        fmt.Println(err)
        continue
    }

    defer serverConnection.Close()

    // Added to prevent connection timeout errors, but doesn't seem to be helping
    // because said errors happen within just 1 or 2 minutes.
    err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
    if err != nil {         
        fmt.Println(err)
        continue
    }

    // Ask for a command from the user and convert to JSON bytes...

    // Send message to server.
    _, err = serverConnection.Write(clientMsgBytes)
    if err != nil {
        err = merry.Wrap(err)
        fmt.Println(merry.Details(err))
        continue
    }

    err = serverConnection.CloseWrite()
    if err != nil {
        err = merry.Wrap(err)
        fmt.Println(merry.Details(err))
        continue
    }

    // Wait for a response from the server and print...
}

Below is the code that our TCP Server uses to accept client requests:

// We only supply the port so the IP can be dynamically assigned:
serverAddress, err := net.ResolveTCPAddr("tcp", ":"+server_port)
if err != nil {     
    return err
}

tcpListener, err := net.ListenTCP("tcp", serverAddress)
if err != nil {     
    return err
}

defer tcpListener.Close()

// Never stop listening for client requests.
for {
    clientConnection, err := tcpListener.AcceptTCP()
    if err != nil {         
        fmt.Println(err)
        continue
    }

    go func() {
        // Add client connection to Job Queue.
        // Note that `clientConnections` is a buffered channel with a size of 1500.
        // Since I am the only user connecting to our server right now, I do not think
        // this is a channel blocking issue.
        clientConnections <- clientConnection
    }()
}

Below is the code that our TCP Server uses to process client requests:

defer clientConnection.Close()

// Added to prevent connection timeout errors, but doesn't seem to be helping
// because said errors happen within just 1 or 2 minutes.
err := clientConnection.SetDeadline(time.Now().Add(10 * time.Minute))
if err != nil {     
    return err
}

// Read full TCP message.
// Does not stop until an EOF is reported by `CloseWrite()`
clientMsgBytes, err := ioutil.ReadAll(clientConnection)
if err != nil {
    err = merry.Wrap(err)
    return nil, err
}

// Process the message bytes...

My questions are:

  1. Am I doing something wrong in the above code, or is the above decent enough for basic TCP Client-Server operations?

  2. Is it okay that both the TCP Client and TCP Server have code that defers closing their one connection?

  3. I seem to recall that calling defer inside a loop does nothing. How do I properly close Client connections before starting new ones?

Some extra information:

  • Said errors are not logged by the TCP Server, so aside from connection instabilities, this might also be a Kubernetes/Docker-related issue.

Solution

  • This piece of code does not behave the way you think it does. A deferred `serverConnection.Close()` runs only when the enclosing function returns, not when a loop iteration ends. Because the loop never exits, every iteration opens a new connection that is never closed, so the client accumulates open connections. As far as I can see, that could be the problem.

    serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
    if err != nil {     
        fmt.Println(err)
        return
    }
    
    // Never stop asking for commands from the user.
    for {
        // Connect to the server.
        serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
        if err != nil {         
            fmt.Println(err)
            continue
        }
    
        defer serverConnection.Close()
    
        // Added to prevent connection timeout errors, but doesn't seem to be helping
        // because said errors happen within just 1 or 2 minutes.
        err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
        if err != nil {         
            fmt.Println(err)
            continue
        }
    
        // Ask for a command from the user and send to the server...
    
        // Wait for a response from the server and print...
    }
    

    I suggest writing it this way:

    func start() {
        serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
        if err != nil {
            fmt.Println(err)
            return
        }
        // Reconnect whenever the current connection fails.
        for {
            if err := listen(serverAddress); err != nil {
                fmt.Println(err)
            }
        }
    }

    // listen opens one connection and reuses it for every command.
    // The deferred Close runs when listen returns, i.e. once per connection.
    func listen(serverAddress *net.TCPAddr) error {
        // Connect to the server.
        serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
        if err != nil {
            return err
        }

        defer serverConnection.Close()

        // Never stop asking for commands from the user.
        for {
            // Added to prevent connection timeout errors, but doesn't seem to be helping
            // because said errors happen within just 1 or 2 minutes.
            err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
            if err != nil {
                return err
            }

            // Ask for a command from the user and send to the server...

            // Wait for a response from the server and print...
        }
    }
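
To make the timing of `defer` concrete, here is a minimal, self-contained sketch. It is not the asker's code: the `resource` type and the functions `deferInLoop` and `deferPerIteration` are hypothetical stand-ins for a connection and for the two loop shapes. It records when `Close` actually runs. Wrapping the loop body in a function literal is one more way to get a close per iteration, alongside moving the body into a named function as suggested above.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a network connection.
type resource struct {
	id  int
	log *[]string
}

// Close records that this resource was closed.
func (r resource) Close() {
	*r.log = append(*r.log, fmt.Sprintf("close %d", r.id))
}

// deferInLoop defers Close directly in the loop body, so no resource is
// closed until the whole function returns.
func deferInLoop(n int, log *[]string) {
	for i := 0; i < n; i++ {
		r := resource{id: i, log: log}
		defer r.Close()
		*log = append(*log, fmt.Sprintf("use %d", i))
	}
}

// deferPerIteration wraps the loop body in a function literal, so each
// deferred Close runs at the end of its own iteration.
func deferPerIteration(n int, log *[]string) {
	for i := 0; i < n; i++ {
		func() {
			r := resource{id: i, log: log}
			defer r.Close()
			*log = append(*log, fmt.Sprintf("use %d", i))
		}()
	}
}

func main() {
	var a, b []string
	deferInLoop(2, &a)
	deferPerIteration(2, &b)
	fmt.Println(a) // [use 0 use 1 close 1 close 0]
	fmt.Println(b) // [use 0 close 0 use 1 close 1]
}
```

In the first variant both resources stay open until the function exits; in a loop that never exits, as in the question, they would never be closed at all.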
    

    Also, you should keep a single connection open, or a pool of connections, instead of opening and closing a connection for every message. To send a message, take a connection from the pool (or use the single connection), write the message, wait for the response, and then release the connection back to the pool.

    Something like this:

    res, err := c.Send([]byte(`my message`))
    if err != nil {
        // handle err
    }

    // the implementation of Send
    func (c *Client) Send(msg []byte) ([]byte, error) {
        conn, err := c.pool.Get() // returns a connection from the pool or starts a new one
        if err != nil {
            return nil, err
        }
        // release the connection when done (Put is a hypothetical pool method)
        defer c.pool.Put(conn)

        // send your message and wait for response
        // ...
        return response, nil
    }
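
One possible shape for such a pool, as a rough sketch rather than a definitive implementation: `Pool`, `NewPool`, `Get`, `Put`, and `startEchoServer` are hypothetical names, not part of any library, and the local echo server exists only so the example can run. Note one assumption that differs from the question's code: a connection can only be reused if messages are delimited without closing it, so this sketch frames each message with a trailing newline instead of using `CloseWrite()` and reading until EOF.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"time"
)

// Pool is a hypothetical, minimal pool of idle TCP connections.
type Pool struct {
	addr string
	idle chan net.Conn
}

func NewPool(addr string, size int) *Pool {
	return &Pool{addr: addr, idle: make(chan net.Conn, size)}
}

// Get reuses an idle connection if one is available, otherwise dials a new one.
func (p *Pool) Get() (net.Conn, error) {
	select {
	case conn := <-p.idle:
		return conn, nil
	default:
		return net.DialTimeout("tcp", p.addr, 5*time.Second)
	}
}

// Put keeps the connection for reuse, or closes it if the pool is full.
func (p *Pool) Put(conn net.Conn) {
	select {
	case p.idle <- conn:
	default:
		conn.Close()
	}
}

type Client struct{ pool *Pool }

// Send writes one newline-terminated message and reads one newline-terminated
// reply. A connection that hit an error is closed instead of being reused.
func (c *Client) Send(msg []byte) ([]byte, error) {
	conn, err := c.pool.Get()
	if err != nil {
		return nil, err
	}
	// The deadline is an absolute time, so it is refreshed on every request.
	if err := conn.SetDeadline(time.Now().Add(10 * time.Second)); err != nil {
		conn.Close()
		return nil, err
	}
	frame := append(append([]byte{}, msg...), '\n')
	if _, err := conn.Write(frame); err != nil {
		conn.Close()
		return nil, err
	}
	reply, err := bufio.NewReader(conn).ReadBytes('\n')
	if err != nil {
		conn.Close()
		return nil, err
	}
	c.pool.Put(conn)
	return reply[:len(reply)-1], nil
}

// startEchoServer runs a local echo server so the sketch can be executed.
func startEchoServer() string {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return
			}
			go func() {
				defer conn.Close()
				io.Copy(conn, conn) // echo every byte back to the sender
			}()
		}
	}()
	return ln.Addr().String()
}

func main() {
	client := &Client{pool: NewPool(startEchoServer(), 2)}
	for _, m := range []string{"first", "second"} {
		res, err := client.Send([]byte(m))
		fmt.Println(string(res), err)
	}
}
```

The second `Send` reuses the connection opened by the first one instead of dialing again. A production pool would also need to detect connections that went stale while idle; this sketch only discards a connection after a request on it has failed.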