CCIE – STP 802.1D – Fundamentals

INTRODUCTION


For resiliency purposes, Layer2 topologies are commonly designed and deployed with redundant links, which form physical loops.

Unlike the IP protocol, which uses the TTL field as a built-in mechanism to prevent packets from looping endlessly through the network, the Ethernet protocol provides no such facility. Therefore, unless we can prevent a layer2 loop, the following problems would occur:

  1. Broadcast storms – endless broadcast of frames through the entire L2 domain
  2. Hosts end up receiving duplicate frames from different switches
  3. Unstable mac-address tables due to continuous host mac address learning on different ports
Symptoms of a broadcast storm ...

A broadcast storm can severely impact network connectivity. However, given the evolution in hardware and compute power, broadcast storms can also be very difficult to detect and troubleshoot. The following could be indicators of a broadcast storm:

  1. High CPU
  2. Continuous MAC address change notifications at the switch console
  3. Group of users complaining of slow connectivity, application problems, across one or multiple VLANs
  4. Provided you are monitoring the links, look for saturated links and duplicate frames

The STP (Spanning Tree Protocol) was developed to avoid the problems above by dynamically determining a logical, loop-free layer2 topology. It does so by temporarily blocking ports and therefore eliminating the loop; should a link failure occur, relevant ports are automatically enabled and a new loop-free topology is determined.

In the real world ...
Whether or not a physical loop exists, STP should still be enabled as a fail-safe mechanism against Layer2 loops which could be introduced as a result of human error.

In time, STP evolved into several standards as below:

  • 802.1D – the original Spanning Tree Protocol standard; Cisco later enhanced the standard into PVST+ (per-VLAN STP), allowing for different loop-free L2 topologies on a per-VLAN basis. Cisco also introduced several other features to improve convergence times and protect the topology: PortFast, BPDUGuard, LoopGuard, BPDUFilter, UplinkFast and BackboneFast
  • 802.1w – Rapid Spanning Tree Protocol (RSTP), an evolution of the original standard providing much faster reconvergence by incorporating enhancements similar to those Cisco had previously introduced with PVST+
  • PV-RSTP (Rapid PVST+) – Cisco’s proprietary variant of IEEE 802.1w, adding per-VLAN support
  • 802.1s – Multiple STP (MSTP) was later developed, inspired by Cisco’s proprietary MISTP (Multiple Instances STP), providing a vendor-agnostic approach to VLAN support within STP; Cisco’s MSTP implementation is known as MST

Note that despite there being several STP versions, all of them function on the same fundamental principles introduced by the original STP, which makes them backward compatible.

I will therefore dedicate this first part to the fundamentals of STP 802.1D; part II will follow with the enhancements implemented within the protocol to speed up convergence.

 


DEFINITIONS 


  • A SEGMENT is a link between two switches; since switches have multiple ports, there could be multiple segments between the same two switches
  • BPDU – stands for Bridge Protocol Data Unit and carries the data required to run the STP protocol. Each BPDU is encapsulated within an Ethernet frame:
    • The Ethernet source MAC matches the MAC of the interface the frame is sent from
    • The Ethernet destination MAC is the multicast address 01:80:c2:00:00:00; all switches participating in STP will listen for frames destined to this multicast MAC address
  • SYSTEM ID (SID) – every bridge/switch running STP has a unique identifier; originally, this was a six-byte (48-bit) value matching the base MAC address of the switch. To carry VLAN information in each BPDU, and so allow per-VLAN spanning tree topologies, the System ID was extended by borrowing 12 bits from the Bridge Priority (originally a 16-bit value). This extended System ID is now enabled by default on most switches.
  • BRIDGE PRIORITY (BP) – this is yet another important concept within STP, with a key role in the Root Bridge election process. Originally a 2-byte (16-bit) value, it is now only 4 bits long, having given up 12 bits to extend the System ID. Below is an illustration of how the System ID and the Bridge Priority are encoded in the BPDU:

Confusion ...

The illustration above is widely accepted and supported by Cisco’s CLI output:

Notice that the Extended System ID and the Bridge priority are displayed as one!

Though if we look at the actual BPDU format in Wireshark …

Notice that the Priority is 32768 and not 32769; also the VLAN ID is identified as the System ID extension.

    • the lower the Bridge Priority value, the higher the priority
    • the Bridge Priority can take values within the 0 – 61440 range (binary 0000 0000 0000 0000 – 1111 0000 0000 0000), in increments of 4096 only

  • BRIDGE ID (BID) – the unique System ID and the Bridge Priority together form the Bridge ID
  • With STP, every link (interface) is assigned a LINK COST, which is later used in calculating the total cost to the root bridge. The switch automatically applies default preset values (platform and STP version dependent), but these can also be manually configured.
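
To make the Bridge ID layout concrete, here is a minimal Python sketch that packs and unpacks the 8-byte value described above (4-bit priority, 12-bit extended System ID, 48-bit MAC). The function names and the MAC address are made up for illustration, and VLAN 1 is assumed as the extended System ID.

```python
def encode_bridge_id(priority: int, vlan_id: int, mac: str) -> bytes:
    """Pack priority (4 bits), extended System ID (12 bits) and MAC (48 bits)."""
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 in the range 0..61440")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("extended System ID must fit in 12 bits")
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes long")
    # A valid priority only uses the top 4 bits of the 16-bit field,
    # so adding the VLAN ID simply fills the lower 12 bits.
    return (priority + vlan_id).to_bytes(2, "big") + mac_bytes


def decode_bridge_id(bid: bytes) -> tuple[int, int, str]:
    """Split an 8-byte Bridge ID back into (priority, extended System ID, MAC)."""
    first16 = int.from_bytes(bid[:2], "big")
    priority = first16 & 0xF000      # top 4 bits, kept in their 16-bit position
    ext_system_id = first16 & 0x0FFF  # lower 12 bits
    mac = ":".join(f"{b:02x}" for b in bid[2:8])
    return priority, ext_system_id, mac


# Hypothetical switch: default priority 32768, VLAN 1.
bid = encode_bridge_id(32768, 1, "aa:bb:cc:00:01:00")
print(int.from_bytes(bid[:2], "big"))  # 32769, priority and VLAN ID shown as one value
print(decode_bridge_id(bid))           # (32768, 1, 'aa:bb:cc:00:01:00')
```

This is also why the same 16 bits can be read as 32769 when displayed as a single number, or as priority 32768 plus System ID extension 1 when the two sub-fields are decoded separately.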

 


UNDERSTANDING THE BPDU


The Spanning Tree Protocol relies on BPDUs which are sent throughout the entire L2 network. Each BPDU contains the following information:

  • Protocol Identifier & Protocol version identifier identify the protocol and its version
  • BPDU type – this varies across the different STP variants/standards and determines which BPDU flags are set. With standard STP (802.1D), the BPDU is either a Configuration BPDU or a Topology Change Notification (TCN) BPDU, and only flag bits 0 and 7 are used, providing the Topology Change and Topology Change Acknowledgment functionality.
  • Root Identifier identifies the Root Bridge; it will match the bridge with the lowest BID (BP & SysID):
    1. The Lowest Bridge Priority wins; otherwise …
    2. The lowest MAC address wins
By default ...
Since, by default, the BP is the same on every switch, the Root Bridge for each VLAN will always be the bridge with the lowest MAC address, unless the BP is changed.
  • Bridge Identifier matches the BID of the bridge which sent the BPDU
  • Root Path Cost is the best (lowest) cumulative link cost to reach the Root Bridge
  • Port Identifier identifies the interface out of which the BPDU was sent. The Port Identifier encodes the Port Priority as well – 4 bits for the Port Priority; 12 bits for the actual port number. Note that only the Port Priority can be changed, and in increments of 64 only on the platform used here (the increment is platform dependent):

  • STP timers:
    • MAX AGE time before the stored BPDU expires; by default this is 20s (10 missed HELLOs)
    • HELLO TIME is the interval within which BPDUs are sent; by default this is 2s
    • FORWARD DELAY is the time a port spends in each of the Listening and Learning states before moving on to the next state; by default this is 15s
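
As a rough illustration of the fields listed above, the sketch below parses the 35-byte body of an 802.1D Configuration BPDU with Python's `struct` module. The sample BPDU is built by hand with invented Bridge IDs, so treat the values as placeholders; the timer fields are carried in units of 1/256 of a second.

```python
import struct

# 802.1D Configuration BPDU layout (35 bytes, network byte order):
# protocol id, version, type, flags, root id, root path cost,
# bridge id, port id, message age, max age, hello time, forward delay.
BPDU_FORMAT = "!HBBB8sI8sHHHHH"
BPDU_LENGTH = struct.calcsize(BPDU_FORMAT)  # 35

FLAG_TC = 0x01   # bit 0: Topology Change
FLAG_TCA = 0x80  # bit 7: Topology Change Acknowledgment


def parse_config_bpdu(data: bytes) -> dict:
    """Decode the fixed-size body of a Configuration BPDU into a dict."""
    (proto_id, version, bpdu_type, flags, root_id, root_path_cost,
     bridge_id, port_id, msg_age, max_age, hello, fwd_delay) = struct.unpack(
        BPDU_FORMAT, data[:BPDU_LENGTH])
    return {
        "protocol_id": proto_id,
        "version": version,
        "type": bpdu_type,
        "topology_change": bool(flags & FLAG_TC),
        "topology_change_ack": bool(flags & FLAG_TCA),
        "root_id": root_id.hex(),
        "root_path_cost": root_path_cost,
        "bridge_id": bridge_id.hex(),
        "port_priority": (port_id >> 8) & 0xF0,  # top 4 bits of the Port ID
        "port_number": port_id & 0x0FFF,         # lower 12 bits of the Port ID
        "message_age_s": msg_age / 256,          # timers are in 1/256 s units
        "max_age_s": max_age / 256,
        "hello_time_s": hello / 256,
        "forward_delay_s": fwd_delay / 256,
    }


# Hand-built sample with hypothetical Bridge IDs and the default timers.
sample = struct.pack(
    BPDU_FORMAT, 0, 0, 0, FLAG_TC,
    bytes.fromhex("1001aabbcc000300"), 12,
    bytes.fromhex("8001aabbcc000200"), 0x800F,
    1 * 256, 20 * 256, 2 * 256, 15 * 256)

bpdu = parse_config_bpdu(sample)
print(bpdu["port_priority"], bpdu["port_number"])                         # 128 15
print(bpdu["max_age_s"], bpdu["hello_time_s"], bpdu["forward_delay_s"])  # 20.0 2.0 15.0
```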

 


PORT STATES


STP identifies the following port states along with their timers:

    • Disabled
    • Blocking (1 x max age; 20s)
    • Listening (1 x forward delay; 15s)
    • Learning (1 x forward delay; 15s)
    • Forwarding
Transitioning through the port states ...

Below is an example of how ports transition through the different states; in this case, the process starts as soon as the interface on the switch is enabled:

SW-1(config-if)#no shut
SW-1(config-if)#
*Jul 17 21:53:18.887: set portid: VLAN0001 Et3/2: new port id 800F
*Jul 17 21:53:18.887: STP: VLAN0001 Et3/2 -> listening
*Jul 17 21:53:20.884: %LINK-3-UPDOWN: Interface Ethernet3/2, changed state to up
*Jul 17 21:53:21.884: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet3/2, changed state to up

[15 seconds after]

*Jul 17 21:53:33.892: STP: VLAN0001 Et3/2 -> learning

[15 seconds after]

*Jul 17 21:53:48.892: STP[1]: Generating TC trap for port Ethernet3/2
*Jul 17 21:53:48.892: STP: VLAN0001 Et3/2 -> forwarding
SW-1(config-if)#

Furthermore, a port that is in the BLOCKING state waits an additional 20s (MAX AGE, i.e. 10 x HELLO) before transitioning to the LISTENING state. This happens, for example, when a topology change is required due to a network failure, in which case one of the previously blocked links needs unblocking.

We can see that it takes a minimum of 30s – 32s (+2s HELLO) for a link to transition into a data forwarding state (2 x FORWARD DELAY), but it could take up to 50s – 52s in reconvergence scenarios (MAX AGE + 2 x FORWARD DELAY).
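
The arithmetic behind those figures can be sketched in a few lines of Python. This only adds up the default timers quoted above; the function name is made up for illustration, and the extra 2s HELLO allowance is left out.

```python
HELLO_TIME = 2      # seconds between BPDUs
MAX_AGE = 20        # seconds before a stored BPDU expires
FORWARD_DELAY = 15  # seconds spent in each of Listening and Learning


def time_to_forwarding(was_blocking: bool,
                       max_age: int = MAX_AGE,
                       forward_delay: int = FORWARD_DELAY) -> int:
    """Seconds until a port forwards data, using the 802.1D timers."""
    waiting = max_age if was_blocking else 0  # a blocked port first waits for MAX AGE
    listening = forward_delay
    learning = forward_delay
    return waiting + listening + learning


print(time_to_forwarding(was_blocking=False))  # 30, a port that has just come up
print(time_to_forwarding(was_blocking=True))   # 50, a blocked port after a failure
```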


PORT ROLES


STP introduces a few Port Roles, which are determined based on the “quality” of the BPDUs received on that port.

  • Root Port (RP)
    • Will be the port on the path back to the root bridge; the port is selected based on the following criteria:
      1. Lowest Path cost to the root bridge
      2. Lowest Bridge ID (on received BPDU)
      3. Lowest Port Priority (on received BPDU)
      4. Lowest Port ID (on received BPDU)
    • All bridges will have a single RP each, apart from the Root Bridge, where all ports are DP
The Port ID & Port Priority
Note that there is no explicit Port Priority field in the BPDU. However, the port priority is deduced from the Port ID field – this is a composite value comprising 4 bits for Port Priority and 12 bits for the interface ID (similarly to the Bridge ID).
    • port upstream to the root bridge
    • located within the best path (lowest cost) to the root
    • Root Ports will always be in the forwarding state
  • Designated Port (DP) faces downstream from the root bridge; designated ports would, by default, be in forwarding state. There must be a DP on each segment within the topology.
    1. On segments where there is a RP, the remote end will be set as DP; otherwise …
    2. The port on the bridge closest to the root bridge (lowest root path cost) is set as DP (see example further down)
  • Non-Designated Port (ND) will always be in a blocking state, ready to transition into Listening -> Learning -> Forwarding state in failover scenarios
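
The Root Port tie-breakers listed above can be expressed as a simple ordered comparison. The Python sketch below is one way to do it; the `Candidate` class, the port names and all values are hypothetical, and it assumes the root path cost already includes the cost of the local receiving port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    local_port: str
    root_path_cost: int        # cost in the received BPDU + cost of the local port
    sender_bridge_id: tuple    # (priority, mac) of the neighbour that sent the BPDU
    sender_port_priority: int  # taken from the Port ID in the received BPDU
    sender_port_number: int    # taken from the Port ID in the received BPDU


def select_root_port(candidates):
    """Return the local port holding the best BPDU; lower wins at every step."""
    best = min(candidates, key=lambda c: (
        c.root_path_cost,
        c.sender_bridge_id,     # fixed-width lowercase MACs compare in numeric order
        c.sender_port_priority,
        c.sender_port_number,
    ))
    return best.local_port


# Hypothetical switch with two equal-cost links to one neighbour and a costlier third link.
candidates = [
    Candidate("e0/0", 6, (32768, "aa:bb:cc:00:02:00"), 128, 2),
    Candidate("e0/1", 6, (32768, "aa:bb:cc:00:02:00"), 128, 1),
    Candidate("e0/2", 10, (4096, "aa:bb:cc:00:03:00"), 128, 3),
]
print(select_root_port(candidates))  # e0/1
```

Here the cost and sender Bridge ID tie between e0/0 and e0/1, so the decision falls through to the sender's port number.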

 


HOW DOES IT WORK?


One important process in determining an L2 loop-free topology is calculating the link costs. All the documentation I found online refers to the link cost. But we don’t really assign costs to links per se, rather to interfaces. So one question I had was: what is the assumed link cost when different values are assigned at each end of a link?

I then decided to just lab it and experiment and came up with the topology below where I have tried to move away from the defaults:

  • Each interface at the end of every segment, has a different link cost assigned
  • I have changed the BP on SW3

Now let’s start working out the loop free topology.


I. Electing the Root Bridge (RB)

Initially all switches declare themselves as root and start sending BPDUs out, every 2s; the switch with the lowest BID becomes the root bridge (RB). In this case, notice that Bridge Priority on SW3 is the lowest and therefore, SW3 becomes the ROOT BRIDGE.
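
A minimal Python sketch of the election, assuming each Bridge ID is compared as a (priority, MAC) pair where the lowest value wins. The priorities and MAC addresses below are invented for illustration; only the fact that SW3 has the lowest priority comes from the lab.

```python
def elect_root(bridges: dict) -> str:
    """Return the name of the bridge with the lowest (priority, MAC) Bridge ID."""
    # Fixed-width lowercase MAC strings compare in the same order as their numeric value.
    return min(bridges, key=lambda name: bridges[name])


# Hypothetical Bridge IDs: SW3 has a lowered priority, the others keep the default.
lab = {
    "SW1": (32769, "aa:bb:cc:00:01:00"),
    "SW2": (32769, "aa:bb:cc:00:02:00"),
    "SW3": (4097, "aa:bb:cc:00:03:00"),
    "SW4": (32769, "aa:bb:cc:00:04:00"),
}
print(elect_root(lab))  # SW3

# With every priority left at the default, the lowest MAC address decides.
defaults = {name: (32769, mac) for name, (_, mac) in lab.items()}
print(elect_root(defaults))  # SW1
```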


II. Electing Root Ports

As discussed already, BPDUs will carry cost information as they get sent out. When a switch receives a BPDU, it will add the cost value set on the incoming interface.

Which interface cost is the link cost?
Regardless of the different cost values on each end of a link, the assumed link cost will be the cost value set on the upstream interface, i.e. the interface facing the root bridge, on which the BPDU is received.

If we take, for instance, a BPDU originated by SW3 and advertised out of its e0/2 interface, the Root Path Cost is set to zero – it makes sense, as the cost for the root to reach itself is indeed nil. When SW1 receives the BPDU on its e0/2 interface, the cost is incremented to 10 – the local cost of the e0/2 interface. Further on from SW1, as the BPDU reaches SW2 on e0/0, the cost is again incremented, to 12 (+2 – the cost of the e0/0 interface on SW2); and so on…

So to work out the cost to the root bridge, we just need to add the costs of the upstream interfaces towards the root bridge (in reverse order from how the BPDUs get advertised).
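
The accumulation can be sketched in Python as a running sum over the receiving interfaces, using the two hops from the example above (SW1 e0/2 with cost 10, then SW2 e0/0 with cost 2). The function name is made up for illustration.

```python
from itertools import accumulate


def root_path_costs(receiving_port_costs):
    """Running Root Path Cost after each hop away from the root bridge."""
    # The root advertises a cost of 0; each receiving switch adds the
    # cost configured on the interface the BPDU arrived on.
    return list(accumulate(receiving_port_costs))


# (switch, receiving interface, configured cost) in the order the BPDU travels.
hops = [("SW1", "e0/2", 10), ("SW2", "e0/0", 2)]
running = root_path_costs(cost for _, _, cost in hops)

for (switch, port, _), total in zip(hops, running):
    print(f"{switch} receives on {port}: root path cost = {total}")
# SW1 receives on e0/2: root path cost = 10
# SW2 receives on e0/0: root path cost = 12
```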

With that understood, we can now work out the Root Ports.

Paths SW1 to SW3 => Best cost is 6

  • SW1 -> SW3 : Cost = 10
  • SW1 -> SW2 (out e0/2) -> SW3 : Cost = 8
  • SW1 -> SW2 (out e0/1) -> SW3 : Cost = 6
    • SW1 – e0/0 is RP
    • SW2 e0/1 is RP
  • SW1 -> SW2 -> SW4 -> SW3 : Cost = 14
  • SW1 -> SW4 -> SW3 : Cost = 11

Using the same logic, I worked out all other RPs.


III. Electing Designated Ports & Blocking ports

  • All ports on the Root Bridge are Designated Ports
  • On all segments with Root Ports, the remote end will be a Designated Port
  • On the remaining segments, the port on the bridge closest to the root (lowest root path cost) will be elected Designated Port
  • Remaining ports are set to Blocking
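
As a sketch of the per-segment decision, the Python below compares the two ends of a segment and assigns roles. The switch names, costs and Bridge IDs are invented. Breaking a cost tie on the lowest Bridge ID and then the lowest Port ID is standard 802.1D behaviour rather than something shown in this lab.

```python
def segment_roles(end_a: dict, end_b: dict) -> dict:
    """Assign a role to each end of one segment; the best end becomes Designated."""
    winner = min(end_a, end_b, key=lambda end: (
        end["root_path_cost"],  # the bridge closest to the root wins
        end["bridge_id"],       # tie-breaker: lowest Bridge ID
        end["port_id"],         # tie-breaker: lowest Port ID
    ))
    roles = {}
    for end in (end_a, end_b):
        if end is winner:
            role = "designated"
        elif end["is_root_port"]:
            role = "root"
        else:
            role = "blocking"
        roles[(end["bridge"], end["port"])] = role
    return roles


# Hypothetical segment where neither end is a Root Port.
sw_a = {"bridge": "SW-A", "port": "e0/3", "root_path_cost": 6,
        "bridge_id": (32769, "aa:bb:cc:00:0a:00"), "port_id": 0x8004,
        "is_root_port": False}
sw_b = {"bridge": "SW-B", "port": "e0/3", "root_path_cost": 8,
        "bridge_id": (32769, "aa:bb:cc:00:0b:00"), "port_id": 0x8004,
        "is_root_port": False}

print(segment_roles(sw_a, sw_b))
# {('SW-A', 'e0/3'): 'designated', ('SW-B', 'e0/3'): 'blocking'}
```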

Eventually, we should end up with a topology like this:

And to verify …

If I now wish to optimise the STP topology so that SW1 uses its directly attached link to the root, I could change the cost on SW1 e0/2 to a value lower than 6, as this is currently the best cost to the root.

This is done using the interface command: spanning-tree cost 5

I could also remove the existing static cost configuration and change the link bandwidth to 1 Gbps; the link cost would then automatically be set to 4.
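
To show why both options work, here is a small Python sketch using the commonly quoted default 802.1D "short" path costs per link speed. The function names are made up, and the table is an assumption about the defaults, since actual values depend on the platform and path cost method in use.

```python
# Commonly quoted 802.1D short-mode defaults: link speed in Mbps -> port cost.
SHORT_MODE_COST = {10: 100, 100: 19, 1_000: 4, 10_000: 2}


def default_port_cost(speed_mbps: int) -> int:
    """Default STP cost for a link speed, per the table above."""
    try:
        return SHORT_MODE_COST[speed_mbps]
    except KeyError:
        raise ValueError(f"no default cost listed for {speed_mbps} Mbps") from None


def beats_current_best(direct_link_cost: int, current_best_cost: int) -> bool:
    """A strictly lower cost is guaranteed to win; equal costs go to tie-breakers."""
    return direct_link_cost < current_best_cost


CURRENT_BEST = 6  # SW1's best cost to the root in the lab

print(beats_current_best(10, CURRENT_BEST))                        # False, current config
print(beats_current_best(5, CURRENT_BEST))                         # True, spanning-tree cost 5
print(beats_current_best(default_port_cost(1_000), CURRENT_BEST))  # True, 1 Gbps default of 4
```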

When looking at the root path cost, another useful output is provided by the show spanning-tree vlan <x> interface <ifId> detail command. I believe the output below is self-explanatory:

 

 

Thank you,

Rafael A. Couto Cabral • LinkedIn Profile
Cisco | F5 | VMware Certified • PRINCE2 Practitioner
