Search code examples
gonetwork-programmingdnsudp

Wrapping net.Conn results in hanging read


I am a Go novice trying to debug some networking issues in a production environment, on an app that is experiencing some issues which I suspect are related to DNS resolution. Long story short, after successfully connecting to and interacting with a redis cluster, the connection fails and the app starts to log a lot of dial: tcp :0: connection refused errors. The :0: in this message is why I am suspecting dns, because from what I can gather, this indicates that it's trying to connect to a zero-valued remote address and I really don't know why that would happen unless something went wrong with the hostname lookup.

Anyways, if anyone has some insight into that particular problem I'd love to hear it but the primary question here is related to a problem I've stumbled into while trying to debug this:

In an effort to see what is actually going on with the hostname resolution, I passed a custom resolver into the dialer which returns a wrapped connection from its own dialer. The connection wrapper essentially just logs bytes read and written on the native connection and nothing else. Simple reproducible example here:

package main

import (
    "context"
    "fmt"
    "net"
    "time"
)

type ConnWrapper struct {
    net.Conn
}

func (c *ConnWrapper) Write(bytes []byte) (int, error) {
    n, err := c.Conn.Write(bytes)
    if err != nil {
        fmt.Printf("Failed to write bytes %s: %v\n", string(bytes), err)
    } else {
        fmt.Printf("Successfully wrote %d of bytes %x\n", n, bytes[:n])
    }
    return n, err
}

func (c *ConnWrapper) Read(bytes []byte) (int, error) {
    n, err := c.Conn.Read(bytes)
    if err != nil {
        fmt.Printf("Failed to read bytes: %v\n", err)
    } else {
        fmt.Printf("Successfully read %d of bytes %x\n", n, bytes[:n])
    }
    return n, err
}

func (c *ConnWrapper) SetDeadline(t time.Time) error {
    fmt.Printf("Setting deadline %v %s %s\n", t, c.LocalAddr(), c.RemoteAddr())
    return c.Conn.SetDeadline(t)
}

func (c *ConnWrapper) SetReadDeadline(t time.Time) error {
    fmt.Printf("Setting read deadline %v %s %s\n", t, c.LocalAddr(), c.RemoteAddr())
    return c.Conn.SetReadDeadline(t)
}

func (c *ConnWrapper) SetWriteDeadline(t time.Time) error {
    fmt.Printf("Setting write deadline %v %s %s\n", t, c.LocalAddr(), c.RemoteAddr())
    return c.Conn.SetWriteDeadline(t)
}

func main() {
    var d *net.Dialer
    d = &net.Dialer{
        Timeout: time.Duration(30) * time.Minute,
        Resolver: &net.Resolver{
            PreferGo: true,
            Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
                fmt.Printf("Redis dialing (from resolver) %s %s\n", network, address)
                conn, err := d.DialContext(ctx, network, address)

                if err != nil {
                    fmt.Printf("Redis resolver failed %v\n", err)
                }

                // When I return c2, the reads time out
                c2 := &ConnWrapper{conn}
                return c2, err

                // When I return conn, everything is fine
                //return conn, err
            },
        },
    }
    conn, err := d.DialContext(context.Background(), "tcp", "redis-node-0:6379")
    if err != nil {
        fmt.Printf("Redis dial failed %v\n", err)
    } else {
        fmt.Printf("Successfully dialed %s\n", conn.RemoteAddr())
    }
}

What I don't understand is that when I return the native connection, everything works as expected. When I return the wrapped connection, the connection hangs/timeouts on resolving the DNS name (specifically in the conn.Read call). I'm at a loss trying to understand why this happens, but I also have very little practical experience with Go so I'm wondering if maybe there's a simple explanation I'm just not seeing. I've tried asking Google and ChatGPT to no avail, so hoping someone in the community has some ideas I can run with.

Executing the program above yields the following output on my system (using golang 1.21)

Redis dialing (from resolver) udp 1.1.1.1:53
Redis dialing (from resolver) udp 1.1.1.1:53
Setting deadline 2023-10-15 10:43:44.031475 -0400 AST m=+5.001741085 192.168.68.61:62862 1.1.1.1:53
Setting deadline 2023-10-15 10:43:44.031494 -0400 AST m=+5.001759751 192.168.68.61:57358 1.1.1.1:53
Successfully wrote 49 of bytes 002f53c8010000010000000000010c72656469732d6e6f64652d30056c6f63616c00001c000100002904d0000000000000
Successfully wrote 49 of bytes 002f29c8010000010000000000010c72656469732d6e6f64652d30056c6f63616c000001000100002904d0000000000000
Successfully read 2 of bytes 002f
Successfully read 2 of bytes 002f
Failed to read bytes: read udp 192.168.68.61:57358->1.1.1.1:53: i/o timeout
Redis dialing (from resolver) udp 1.0.0.1:53
Failed to read bytes: read udp 192.168.68.61:62862->1.1.1.1:53: i/o timeout
Redis dialing (from resolver) udp 1.0.0.1:53
Setting deadline 2023-10-15 10:43:49.033465 -0400 AST m=+10.003676876 192.168.68.61:64641 1.0.0.1:53
Setting deadline 2023-10-15 10:43:49.033817 -0400 AST m=+10.004027543 192.168.68.61:59264 1.0.0.1:53
Successfully wrote 49 of bytes 002f6612010000010000000000010c72656469732d6e6f64652d30056c6f63616c00001c000100002904d0000000000000
Successfully wrote 49 of bytes 002fd26c010000010000000000010c72656469732d6e6f64652d30056c6f63616c000001000100002904d0000000000000
Successfully read 2 of bytes 002f
Failed to read bytes: read udp 192.168.68.61:59264->1.0.0.1:53: i/o timeout
Redis dialing (from resolver) udp 1.1.1.1:53
Setting deadline 2023-10-15 10:43:54.036073 -0400 AST m=+15.006228210 192.168.68.61:56757 1.1.1.1:53
Failed to read bytes: read udp 192.168.68.61:64641->1.0.0.1:53: i/o timeout
Redis dialing (from resolver) udp 1.1.1.1:53
Successfully wrote 49 of bytes 002f4f6d010000010000000000010c72656469732d6e6f64652d30056c6f63616c00001c000100002904d0000000000000
Setting deadline 2023-10-15 10:43:54.036487 -0400 AST m=+15.006642126 192.168.68.61:58296 1.1.1.1:53
Successfully wrote 49 of bytes 002f404c010000010000000000010c72656469732d6e6f64652d30056c6f63616c000001000100002904d0000000000000
Successfully read 2 of bytes 002f
Successfully read 2 of bytes 002f
Failed to read bytes: read udp 192.168.68.61:58296->1.1.1.1:53: i/o timeout
Failed to read bytes: read udp 192.168.68.61:56757->1.1.1.1:53: i/o timeout

When I update the program to simply return the built-in Conn instance, I get the following (Note that I'm running this in an environment where the redis host is not available, so 'no such host' is actually expected behavior here. The point is that the dns lookup actually finishes.)

Redis dialing (from resolver) udp 1.1.1.1:53
Redis dialing (from resolver) udp 1.1.1.1:53
Redis dialing (from resolver) udp 1.1.1.1:53
Redis dialing (from resolver) udp 1.1.1.1:53
Redis dial failed dial tcp: lookup redis-node-0 on 1.1.1.1:53: no such host

Solution

  • The resolver distinguishes between net.PacketConn and net.Conn (documentation). If dial returns a packet connection, then return a packet connection wrapper.

    Here's the wrapper:

    type PacketConnWrapper struct {
        ConnWrapper
        pc net.PacketConn
    }
    
    func (c PacketConnWrapper) ReadFrom(bytes []byte) (int, net.Addr, error) {
        n, addr, err := c.pc.ReadFrom(bytes)
        if err != nil {
            fmt.Printf("Failed to read from bytes: %v\n", err)
        } else {
            fmt.Printf("Successfully read %d bytes %x from %s\n", n, bytes[:n], addr)
        }
        return n, addr, err
    }
    
    func (c PacketConnWrapper) WriteTo(bytes []byte, addr net.Addr) (int, error) {
        n, err := c.pc.WriteTo(bytes, addr)
        if err != nil {
            fmt.Printf("Failed to write bytes %x to %v: %s\n", bytes, addr, err)
        } else {
            fmt.Printf("Successfully write bytes %x to %s\n", bytes, addr)
        }
        return n, err
    }
    

    Use the packet wrapper in the dial function like this:

    d = &net.Dialer{
        Timeout: time.Duration(30) * time.Minute,
        Resolver: &net.Resolver{
            PreferGo: true,
            Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
                fmt.Printf("Redis dialing (from resolver) %s %s\n", network, address)
                conn, err := d.DialContext(ctx, network, address)
    
                if err != nil {
                    fmt.Printf("Redis resolver failed %v\n", err)
                    return conn, err
                }
    
                if pc, ok := conn.(net.PacketConn); ok {
                    return PacketConnWrapper{ConnWrapper{conn}, pc}, nil
                }
    
                return ConnWrapper{conn}, nil
            },
        },
     }
    

    https://go.dev/play/p/1KOr95FbDKD