
Terminating spawn sessions in expect


I'm trying to address an issue with an Expect script that logs into a very large number of devices (thousands). The script is about 1500 lines and fairly involved; its job is to audit managed equipment on a network with many thousands of nodes. As a result, it logs into the devices via telnet, runs commands to check on the health of the equipment, logs this information to a file, and then logs out to proceed to the next device.

This is where I'm running into my problem; every expect in my script includes a timeout and an eof like so:

timeout {
    lappend logmsg "$rtrname timed out while <description of expect statement>"
    logmessage
    close
    wait
    set session 0
    continue
}
eof {
    lappend logmsg "$rtrname disconnected while <description of expect statement>"
    logmessage
    set session 0
    continue
}

My final expect closes each spawn session manually:

-re "OK.*#" {
    close
    send_user "Closing session... "
    wait
    set session 0
    send_user "closed.\n\n"
    continue
}

The continues bring the script back to the while loop that initiates the next spawn session, assuming session = 0.

The set session 0 line tracks when a spawn session closes (manually, by the timeout, or via EOF) before a new spawn session is opened. Everything seems to indicate that the spawn sessions are being closed, yet after a thousand or so spawned sessions, I get the following error:

spawn telnet <IP removed>
too many programs spawned?  could not create pipe: too many open files

Now, I'm a network engineer, not a UNIX admin or professional programmer, so can someone help steer me towards my mistake? Am I closing telnet spawn sessions but not properly closing a channel? I wrote a second test script that literally just connects to devices one by one and disconnects immediately after a connection is formed. It doesn't log in or run any commands as my main script does, and it works flawlessly through thousands of connections. That script is below:

#!/usr/bin/expect -f

#SPAWN TELNET LIMIT TEST

set ifile [open iad.list]
set rtrname ""
set sessions 0

while {[gets $ifile rtrname] != -1} {
    set timeout 2
    spawn telnet $rtrname
    incr sessions
    send_user "Session# $sessions\n"
    expect {
        "Connected" {
            close
            wait
            continue
        }
        timeout {
            close
            wait
            continue
        }
        eof {
            continue
        }
    }
}
close $ifile

In my main script I'm logging every single connection and why they may EOF or timeout (via the logmessage process which writes a specific reason to a file), and even when I see nothing but successful spawned connections and closed connections, I get the same problem with my main script but not the test script.

I've been doing some reading on killing process IDs, but as I understand it, close should be killing the process ID of the current spawn session, and wait should be halting the script until the process is dead. I've also tried using a simple "exit" command from the devices to close the telnet connection, but this doesn't produce any better results.

I may simply need a suggestion on how to better track the opening and closing of my sessions and ensure that, between devices, no spawn sessions remain open. Any help that can be offered will be much appreciated.

Thank you!


Solution

  • The Error?

    spawn telnet too many programs spawned? could not create pipe: too many open files

    This error most likely means your script's process has run out of file handles (or has at least exhausted the number the system allows it).

    I suspect the cause is abandoned telnet sessions that are left open.
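    To confirm whether descriptors really are piling up, and at which point in the loop, it can help to log how many the script holds before each spawn. Below is a minimal sketch. open_fd_count and fd_limit are hypothetical helpers, not Expect or Tcl commands. The sketch assumes a Linux-style /proc filesystem and a POSIX sh for ulimit. It uses only plain Tcl, so it should run under expect or tclsh.

```tcl
# Hypothetical helpers, not part of Expect or Tcl.
# Assumes Linux: /proc/<pid>/fd holds one entry per open descriptor.
proc open_fd_count {} {
    # Reading the directory may itself hold one descriptor, so treat the
    # value as approximate; the difference between two calls is what matters.
    return [llength [glob -nocomplain -directory /proc/[pid]/fd *]]
}

# The per-process limit behind "too many open files" (assumes sh and ulimit).
proc fd_limit {} {
    return [string trim [exec sh -c {ulimit -n}]]
}

# Example: call this just before each "spawn telnet $rtrname" in the loop.
puts "open descriptors: [open_fd_count] (limit [fd_limit])"
```

    If the count climbs with every device, something in the loop is holding on to a descriptor. Leftover spawn sessions are one candidate. Anything else that opens a file on every pass (a logging proc, for instance) also counts against the same limit.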

    Now let's talk about why they may still be hanging around.


    Not Even, Close?

    close may not actually end the telnet process, especially if telnet doesn't recognize that the session has been closed. close only ends expect's connection to telnet (See: The close Command). In this case, telnet is most likely being kept alive because it is still waiting for more input from the network side, with the idle TCP connection held up by keepalives.

    Not every application reacts to close, which the spawned program sees as an EOF on its input. Such programs may keep running even after their input has been closed.
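    If the spawned program does not exit on that EOF, a wait after close has nothing to reap yet and simply blocks. One defensive option is to signal the process directly before reaping it. The sketch below makes a few assumptions:

    - end_session is a hypothetical helper, not an Expect command.
    - It uses Expect's exp_pid to find the process behind the current spawn_id.
    - An external kill command is on the PATH.

    SIGKILL skips telnet's own cleanup. It is therefore best kept as a last resort (for example in a timeout branch) rather than used in place of a graceful quit.

```tcl
package require Expect

# Hypothetical helper (not an Expect command). Intended for a session that is
# still open, e.g. the timeout branch; after eof a plain "wait" is enough.
proc end_session {} {
    global spawn_id
    # Look up the process behind the current spawn_id before closing it.
    set pid [exp_pid]
    # Drop our end of the connection; the child sees EOF on its input.
    catch {close}
    # Until wait reaps it, this pid still belongs to our own child (running or
    # already a zombie), so the signal cannot reach an unrelated process.
    # SIGKILL cannot be caught or ignored. Assumes an external "kill" on PATH.
    catch {exec kill -9 $pid}
    # Reap the child; the first element of wait's result is its pid.
    return [wait]
}
```

    In the question's timeout branch, the close and wait pair could then become a single end_session call.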

    Tell "Telnet", It's Over.

    In that case, you will need to interrupt telnet. If your intent is to complete some work and then exit, that is exactly what you need to do.

    For telnet, you can exit cleanly by sending its escape character with send "\035" (octal 035 is the character you would type as "ctrl+]" on the keyboard). Wait for the "telnet>" prompt, then send "quit" followed by a carriage return. This tells telnet to exit gracefully.

    Excerpt from "Expect script: start telnet, run commands, close telnet":

    #!/usr/bin/expect
    set timeout 1
    set ip [lindex $argv 0]
    set port [lindex $argv 1]
    set username [lindex $argv 2]
    set password [lindex $argv 3]
    spawn telnet $ip $port
    # telnet prints "Escape character is '^]'." once the connection is up
    expect "Escape character is"
    send -- "\r"
    expect "username:" {
        send -- "$username\r"
        expect "password:"
        send -- "$password\r"
    }
    # Wait for a "$ " shell prompt; -ex matches the text literally
    expect -ex {$ }
    send -- "ls\r"
    expect -ex {$ }
    sleep 2
    # Send ^] (octal 035), telnet's escape character, to reach the telnet> prompt.
    send "\035"
    expect "telnet>"
    # Tell telnet to quit.
    send -- "quit\r"
    expect eof
    # You should also either call "wait" (blocks until the process exits) or
    # "wait -nowait" (returns at once; expect reaps the process when it exits).
    wait
    

    Wait, For The Finish.

    (See: Expect - The wait Command)

    Without wait, expect never reaps the process after it exits, so it lingers as a zombie. The process may also simply keep running: the application may not have received our signal earlier (the EOF from close), or it may not treat EOF as a reason to exit. Either way, your script would be none the wiser. With wait, we make sure we don't forget about the process until it cleans up and exits.

    Otherwise, none of these processes may be cleaned up until expect itself exits. In a long-running expect script (or one that connects to a lot of servers), that can exhaust the available file handles. Once that happens, spawn fails with the error above. If nothing catches it, expect exits and everything it started just dies with it. The file handles are then released, so you won't see them exhausted any longer.
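    The Expect manual's entry for close says to call wait whether the connection was closed explicitly or implicitly (after eof), because close does not reap the process. The eof branch in the question's main script does not call wait. The sketch below is a hypothetical harness, not the asker's script. Its assumptions and behavior:

    - It uses local cat and echo processes in place of telnet, so it runs without network devices.
    - It reaps every session with wait, on both the close path and the eof path.
    - It checks via /proc (Linux assumed) that the descriptor count stays flat across hundreds of sessions.

```tcl
package require Expect

# Hypothetical helper; assumes Linux /proc/<pid>/fd lists open descriptors.
proc open_fd_count {} {
    return [llength [glob -nocomplain -directory /proc/[pid]/fd *]]
}

log_user 0      ;# keep the spawned programs' output off the screen
set timeout 5

# One simulated device visit. Local programs stand in for "spawn telnet".
proc visit {use_eof} {
    global spawn_id
    if {$use_eof} {
        # Program that exits by itself: expect sees eof and closes implicitly,
        # like the question's eof branch. wait still has to reap it.
        spawn -noecho echo done
        expect eof
        wait
    } else {
        # Program that runs until we close our end, like the final "OK.*#"
        # branch: close sends EOF (cat then exits), and wait reaps it.
        spawn -noecho cat
        send "hello\r"
        expect "hello"
        close
        wait
    }
}

# Warm-up on both paths, so any descriptor expect opens only once is counted.
visit 0
visit 1
set first [open_fd_count]
for {set i 0} {$i < 400} {incr i} {
    visit [expr {$i % 2}]
}
set last [open_fd_count]
puts "descriptors after warm-up: $first, after 400 more sessions: $last"
```

    Logging a similar count inside the real loop and comparing it with this baseline may help narrow down which step, if any, leaves a descriptor behind.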

    Timeout? Catch-all? Why?

    You may also want to consider using a timeout in case the server doesn't respond when expected, so the script can exit early. This is ideal for severely lagged servers, which should get some admin attention instead.

    A catch-all pattern can help your script deal with unexpected responses that don't necessarily prevent it from continuing. You can choose to just continue processing, or to exit early.

    Excerpt from "Expect Examples":

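    expect {
        "password:" {
            send "password\r"
        } "yes/no)?" {
            send "yes\r"
            set timeout -1
        } timeout {
            exit
        } -re {[^\n]*\n} {
            # Catch-all: consume complete unexpected lines and keep waiting.
            # Matching whole lines (rather than -re .) leaves a prompt that
            # arrives in pieces intact for the patterns above.
            exp_continue
        } eof {
            exit
        }
    }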