
Broken pipe error on query from aerospike


I have a namespace "test" and a set "demo". When I run "select * from test.demo" in the aql terminal, I get this error. What exactly causes a broken pipe?

[screenshot: aql error output]

I also got a warning message in the server log, shown below.

[screenshot: server log]

My aerospike.conf is:

service {
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    proto-fd-max 15000
}

logging {
    file /var/log/aerospike/aerospike.log {
            context any info
    }
}

network {
    service {
            address any
            port 3000
    }

    heartbeat {
            mode multicast
            multicast-group 239.1.99.222
            port 9918

            # To use unicast-mesh heartbeats, remove the 3 lines above, and see
            # aerospike_mesh.conf for alternative.

            interval 150
            timeout 10
    }

    fabric {
            port 3001
    }

    info {
            port 3003
    }
}

namespace test {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.

    storage-engine memory
}
namespace bar {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.

    storage-engine memory

    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    #       storage-engine device {
    #               file /opt/aerospike/data/bar.dat
    #               filesize 16G
    #               data-in-memory true # Store data in memory in addition to file.
    #       }
}

Can somebody figure out the reason?


Solution

  • I think you are getting a socket error because the server is trying to send the scan result to a socket that has already timed out on the client side.

    Error: (-10) Socket read error: 11, [::1]:3000, 36006
    

    By default, the aql timeout is set to 1000 ms.

    It can be raised to 100000 ms with the -T command-line option (or with set timeout in aql interactive mode):

    aql -T 100000
    

    -T, --timeout <ms> Set the timeout (ms) for commands. Default: 1000. This option is equivalent to setting TotalTimeout on other clients.

    Setting the timeout higher should help, but it doesn't explain why a basic scan would take so long.

    Here is an example with different client timeouts. It shows the client timing out before the scan result is received. In the server log you would see the TCP send error for the scan:

    WARNING (proto): (proto.c:693) send error - fd 32 Broken pipe
    

    Details from the aql console:

    aql> set timeout 10
    TIMEOUT = 10
    aql> select * from test.demo
    Error: (-10) Socket read error: 11, 127.0.0.1:3000, 58496
    
    aql> select * from test.demo
    Error: (-10) Socket read error: 115, 127.0.0.1:3000, 58498
    
    
    aql> set timeout 100
    TIMEOUT = 100
    aql> select * from test.demo
    Error: (-10) Socket read error: 115, 127.0.0.1:3000, 58492
    
    aql> set timeout 1000
    TIMEOUT = 1000
    aql> select * from test.demo
    +-----+-------+
    | foo | bar   |
    +-----+-------+
    | 123 | "abc" |
    +-----+-------+
    1 row in set (0.341 secs)
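
As a side note, the numbers in the "Socket read error" messages look like plain OS errno values (that is my assumption; the aql output does not say so explicitly). If that holds, a small Python sketch like the one below can translate them into names on the machine where aql runs. The helper name describe_errno is made up for illustration.

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME: message' for an errno value on the current platform.

    Errno numbers are platform specific, so run this on the same OS
    as the client that printed the error.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 11 and 115 are the codes from the aql session above;
    # EPIPE is the "Broken pipe" error seen in the server log.
    for code in (11, 115, errno.EPIPE):
        print(code, describe_errno(code))
```

On Linux, 11 is EAGAIN and 115 is EINPROGRESS. Both are what you would expect from a non-blocking socket that had no data ready when the client's deadline expired, which fits the timeout explanation.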
    

    It's still a mystery why your aql client would time out when returning 1 record if the default timeout was kept at 1000 ms. Did you by any chance modify the timeout? Or is there a huge number of records in the test namespace with a null set?
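
To make the "what exactly causes broken pipe" part concrete, here is a minimal sketch that reproduces the same sequence with plain TCP sockets and no Aerospike involved: the client gives up at its timeout and closes the socket, and the slow server then fails when it finally tries to send. Everything here (the function name, the delays, the fake query payload) is made up for illustration; it is not how aql or the Aerospike server is implemented.

```python
import socket
import threading
import time


def simulate_client_timeout(client_timeout_s=0.05, server_delay_s=0.3):
    """Run one slow request and return what each side observed."""
    results = {"client": None, "server": None}

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def server():
        conn, _ = listener.accept()
        with conn:
            conn.recv(64)               # read the "query"
            time.sleep(server_delay_s)  # pretend the scan is slow
            try:
                # The client is already gone. The first send is usually
                # still accepted by the kernel; a later one fails once
                # the peer's reset has been received.
                for _ in range(50):
                    conn.sendall(b"x" * 4096)
                    time.sleep(0.01)
            except (BrokenPipeError, ConnectionResetError) as exc:
                results["server"] = type(exc).__name__

    worker = threading.Thread(target=server)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(client_timeout_s)
    client.sendall(b"select * from test.demo")
    try:
        client.recv(4096)               # waits at most client_timeout_s
    except socket.timeout:
        results["client"] = "timeout"
    finally:
        client.close()                  # client gives up, like aql does

    worker.join()
    listener.close()
    return results


if __name__ == "__main__":
    print(simulate_client_timeout())
```

With the client timeout shorter than the server delay, the client side reports a timeout and the server side reports a send failure (typically BrokenPipeError on Linux). Raising client_timeout_s above server_delay_s is the equivalent of raising the aql timeout with -T.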