
Cisco - quickest way to achieve STP convergence after link failure


I am configuring a pair of switches, one in each of our two datacentres. We have a pair of links between the sites: one a dedicated private fibre, the other a backup 100Mbps connection. For reasons not worth going into, I need to push a number of VLANs across the links, and need to use STP (or equivalent) to manage path redundancy and avoid a switching loop and the associated meltdown.

Currently I have set a path cost of 4096 on the backup link on both the root primary and secondary. This works fine: the switches select the fibre and block the backup link until the fibre goes down. I have also set a network diameter of 2 for the VLANs concerned, which has reduced the convergence time to 14s (2 x forward time, i.e. 2 x 7s).

I have read that with RSTP it's possible to get convergence in around a second. If this is true, I would be interested to know how.

Here's what I have so far (this config is more or less mirrored on both switches):

spanning-tree mode rapid-pvst
spanning-tree extend system-id
spanning-tree vlan 102,104-109 priority 24576
spanning-tree vlan 102,104-109 forward-time 7
spanning-tree vlan 102,104-109 max-age 10
!
<snip>
!
interface GigabitEthernet4/0/47
 description Pseudo wire to DC2
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 102-108
 switchport mode trunk
 speed 1000
 duplex full
 spanning-tree vlan 102-108 cost 4096
!         
<snip>
!
interface GigabitEthernet4/0/49
 description 1Gbps to DC2
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 102-107,109
 switchport mode trunk

Solution

    Even a well-tuned RSTP topology may still need a couple of seconds, but a good place to start is adjusting the three RSTP timers:

    Hello - defaults to 2 seconds
    Forward Delay - defaults to 15 seconds
    Max Age - defaults to 20 seconds (less important here)
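    A minimal sketch of lowering the hello timer, assuming the VLAN list and the existing timer values from the question's config (forward-time 7, max-age 10) and that the same lines are mirrored on both switches. 1 second is the lowest hello-time IOS accepts, and with max-age 10 it still satisfies the 802.1D timer relationship max-age >= 2 x (hello-time + 1):

    ```
    ! Lower the BPDU hello interval from the 2-second default to 1 second
    spanning-tree vlan 102,104-109 hello-time 1
    ! Existing timers from the question, left unchanged
    spanning-tree vlan 102,104-109 forward-time 7
    spanning-tree vlan 102,104-109 max-age 10
    ```

    Afterwards, show spanning-tree vlan 102 should report the new hello time alongside the other bridge timers. The trade-off is twice as many BPDUs per VLAN on each trunk.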
    

    UplinkFast doesn't need to be configured, as it's effectively built into RSTP; just check that your backup link is listed as an 'Alternate' port and is ready to fail over fast.
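    To confirm that, something like the following can be run on the switch that is not root for the VLAN (on the root bridge both inter-site trunks are designated ports). In the question's config the backup link is GigabitEthernet4/0/47, the trunk with cost 4096; VLAN 102 is just one of the question's VLANs, so repeat for the others as needed:

    ```
    ! Per-port role and state for one VLAN; while the fibre is up the
    ! backup trunk should show Role "Altn" and state "BLK"
    show spanning-tree vlan 102
    ! Summary view, including which spanning-tree mode is running
    show spanning-tree summary
    ```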

    For general RSTP housekeeping, consider setting all your edge ports to PortFast, and consider the impact of any half-duplex links on the topology. If possible, change all half-duplex ports to full duplex: RSTP treats a full-duplex port as a point-to-point link, which makes it eligible for 802.1w rapid transitions, whereas a half-duplex port is treated as a shared link and falls back to the slower, timer-based transitions of legacy STP.
    Half duplex is just very bad in RSTP, but there are workarounds.
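    A sketch of both suggestions, assuming a hypothetical server-facing access port GigabitEthernet4/0/1 (not from the question). On newer IOS releases the interface command may be spanning-tree portfast edge rather than spanning-tree portfast. Forcing the link type to point-to-point is one common workaround for a half-duplex link that really connects just two switches; it should not be applied to a genuinely shared segment. On the question's full-duplex trunks it simply makes the intended link type explicit:

    ```
    ! Hypothetical edge port facing a server, never another switch
    interface GigabitEthernet4/0/1
     switchport mode access
     spanning-tree portfast
    !
    ! Inter-site trunks from the question: treat as point-to-point
    ! regardless of negotiated duplex, so RSTP can use rapid transitions
    interface GigabitEthernet4/0/47
     spanning-tree link-type point-to-point
    !
    interface GigabitEthernet4/0/49
     spanning-tree link-type point-to-point
    ```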


    Also consider what kind of failure occurs when a link goes down:

    Direct Failure - The switch knows immediately that the link is down (the port goes down).
    Failover to an alternate port is very fast because the switch is aware of the problem straight away; here you could possibly achieve <1 second, depending on how long the switch takes to detect that the port is down.

    Indirect Failure - Traffic is being blackholed, but the switch thinks the link is up.
    This takes much longer to detect, because the switch only ages out the link after it misses 3 consecutive hellos. If this is what's happening, consider setting your hello timer to 1 second.
    (Missing 3 hellos takes 6 seconds with the 2-second default vs. 3 seconds with a 1-second hello.)
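    One way to find out which kind of failure you are dealing with is to time a controlled failover in a maintenance window. The sketch below assumes the question's port numbering (GigabitEthernet4/0/49 as the fibre), uses "DC1" as an assumed label for the switch whose config is shown, and assumes an SSH session, which needs terminal monitor to display debug output. Shutting the port is a direct failure for DC1; whether the DC2 port also goes down depends on how the circuit is delivered, which is exactly the direct versus indirect distinction above:

    ```
    ! On the DC2 switch: show STP events live in this session
    terminal monitor
    debug spanning-tree events
    !
    ! On the DC1 switch: simulate loss of the fibre
    configure terminal
     interface GigabitEthernet4/0/49
      shutdown
     end
    !
    ! On DC1: restore the fibre; then on DC2: stop all debugging
    configure terminal
     interface GigabitEthernet4/0/49
      no shutdown
     end
    undebug all
    ```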