Cut-Through or Store-and-Forward: Ethernet Switching for Low-Latency

2013-7-10 | Source: Cisco

Abstract: This document focuses on latency requirements in the data center. It discusses the latency characteristics of the two Ethernet switching paradigms that perform packet forwarding at Layer 2: cut-through and store-and-forward.
Examining More Fields


Switches do not necessarily have cut-through and store-and-forward "modes" of operation. As stated earlier, cut-through switches usually receive a predetermined number of bytes, depending on the type of packet coming in, before making a forwarding decision. The switch does not move from one mode to the other as dictated by configuration, speed differential, congestion, or any other condition.
For example, in the case of a configuration that permits or denies packets with certain IPv4 TCP port ranges, the cut-through switch examines 54 bytes before it makes a forwarding decision. Likewise, the switch may need to receive only the first 16 bytes of the frame if the user has configured a QoS policy based on the IP precedence bits in the type-of-service (ToS) byte or on the differentiated services code point (DSCP) bits.
Figure 3 shows a standard IPv4 packet structure in an Ethernet ARPA frame. The cut-through switch takes in the first 54 bytes of the frame (not counting the 8 bytes of the preamble, which serve only to wake up the transceiver and indicate the arrival of a frame) and, depending on the vendor's design, may then run a policy engine against the pertinent fields in the IPv4 and TCP headers to determine whether, for example, the TCP destination port or the source IP address matches an entry in the ACL.

Figure 3. A cut-through forwarding decision is made as soon as the switch has received enough bytes to make the appropriate decision
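As a rough illustration of the arithmetic above, the following sketch tallies how many leading bytes of a frame are needed for each kind of decision; the policy names and the bytes_needed helper are hypothetical, not any vendor's actual parsing logic.

```python
# Illustrative only: leading frame bytes a cut-through switch must receive
# before it can apply a given kind of policy (preamble not counted).
ETHERNET_HEADER = 14   # DMAC (6) + SMAC (6) + EtherType (2)
IPV4_HEADER     = 20   # minimum IPv4 header, no options
TCP_HEADER      = 20   # minimum TCP header, enough to read both port numbers

def bytes_needed(policy: str) -> int:
    """Return how many leading bytes are required for a forwarding decision."""
    if policy == "l2_only":        # plain MAC-based forwarding
        return ETHERNET_HEADER
    if policy == "dscp_qos":       # ToS/DSCP byte sits 2 bytes into the IPv4 header
        return ETHERNET_HEADER + 2                         # 16 bytes
    if policy == "ipv4_tcp_acl":   # needs source/destination IPs and TCP ports
        return ETHERNET_HEADER + IPV4_HEADER + TCP_HEADER  # 54 bytes
    raise ValueError(f"unknown policy: {policy}")

for p in ("l2_only", "dscp_qos", "ipv4_tcp_acl"):
    print(f"{p}: {bytes_needed(p)} bytes")
```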



Multipath Distribution

Some sophisticated Layer 2 switches use fields beyond just the source and destination MAC addresses to determine the physical interface to use for sending packets across a PortChannel.
Depending on the configured load-balancing method, a cut-through switch fetches either just the SMAC and DMAC values or also the IP and transport-layer headers to generate the hash value that determines the physical interface used to forward the frame across a PortChannel.
It is important to understand the level of PortChannel support in a given switch. Well-designed cut-through switches should be able to incorporate IP addresses and transport-layer port numbers to provide more flexibility in distributing packets across a PortChannel.
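A minimal sketch of this idea, assuming a hypothetical portchannel_member helper rather than any vendor's actual hashing algorithm: the configured header fields are hashed, and the result, taken modulo the number of member links, selects the physical interface.

```python
# Hypothetical PortChannel member selection: hash the configured fields and
# take the result modulo the number of member links in the bundle.
import zlib

def portchannel_member(fields: tuple, num_links: int) -> int:
    """Pick a member link for a flow based on a hash of its header fields."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % num_links

# MAC-only hashing: every flow between the same two hosts lands on one link.
print(portchannel_member(("00:11:22:33:44:55", "66:77:88:99:aa:bb"), 4))

# Adding IP addresses and TCP/UDP ports spreads those flows across the links.
print(portchannel_member(("00:11:22:33:44:55", "66:77:88:99:aa:bb",
                          "10.0.0.1", "10.0.0.2", 49152, 443), 4))
```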

IP ACLs

A well-designed cut-through Ethernet switch should support ACLs to permit or deny packets based on source and destination IP addresses and on TCP and UDP source and destination port numbers. Even though the switch is operating at Layer 2, it should be able to filter packets based on Layers 3 and 4 of the Open Systems Interconnection (OSI) protocol stack.
Because modern ASICs can parse packets and execute many instructions in parallel or in a pipeline within a few nanoseconds, applying an input or output ACL on a particular interface should not exact a performance penalty. In fact, with more flexible and simpler ASIC code paths, a predetermined number of bytes from each IPv4 or IPv6 packet is submitted to the policy engine, which evaluates the results of any configured ACLs extremely quickly.
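The sketch below is purely illustrative (real switches do this in ASIC hardware, typically with TCAM lookups); the ACL table layout and the evaluate helper are hypothetical, showing only first-match permit/deny semantics on Layer 3 and Layer 4 fields.

```python
# First-match ACL evaluation on parsed header fields (illustrative only).
from ipaddress import ip_address, ip_network

# Each entry: (action, source prefix, destination prefix, protocol, destination ports)
ACL = [
    ("permit", "10.1.0.0/16", "10.2.0.0/16", "tcp", range(80, 81)),
    ("permit", "10.1.0.0/16", "10.2.0.0/16", "tcp", range(443, 444)),
    ("deny",   "0.0.0.0/0",   "0.0.0.0/0",   "any", range(0, 65536)),   # implicit deny
]

def evaluate(src_ip: str, dst_ip: str, proto: str, dst_port: int) -> str:
    for action, src_net, dst_net, acl_proto, ports in ACL:
        if (ip_address(src_ip) in ip_network(src_net)
                and ip_address(dst_ip) in ip_network(dst_net)
                and acl_proto in (proto, "any")
                and dst_port in ports):
            return action
    return "deny"

print(evaluate("10.1.5.9", "10.2.7.3", "tcp", 443))   # permit
print(evaluate("10.1.5.9", "10.2.7.3", "tcp", 22))    # deny
```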
With or without ACLs, and with or without a PortChannel, cut-through switching retains a latency advantage over store-and-forward switching when frames are several thousand bytes long. Otherwise, the two approaches provide very similar performance characteristics.
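Some back-of-the-envelope numbers make that claim concrete. The sketch below assumes both modes have the same internal processing delay, so the only difference is the store-and-forward switch waiting for the whole frame to arrive before it starts transmitting; the 10-Gbps line rate is an assumption for illustration.

```python
# Extra latency incurred by store-and-forward: the full-frame serialization
# delay at the ingress port (internal processing assumed equal for both modes).
LINE_RATE_BPS = 10e9   # assume 10-Gbps ports

def store_and_forward_penalty_us(frame_bytes: int) -> float:
    """Time spent waiting for the whole frame, in microseconds."""
    return frame_bytes * 8 / LINE_RATE_BPS * 1e6

for size in (64, 1500, 9000):
    print(f"{size:>5}-byte frame: +{store_and_forward_penalty_us(size):.2f} us")
# 64-byte frame:   +0.05 us  -> negligible difference
# 1500-byte frame: +1.20 us
# 9000-byte frame: +7.20 us  -> the cut-through advantage shows up with jumbo frames
```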

Ethernet Speeds

If a switch uses a fabric architecture, ports running at 1 Gbps are considered slow compared with that fabric, which expects to handle a number of higher-speed interfaces typically at wire rate. In addition, well-designed switch fabrics offer a "speedup" function into the fabric to reduce contention and accommodate internal switch headers. For example, if a switch fabric is running at 12 Gbps, the slower 1-Gbps ingress port will typically buffer an incoming frame before scheduling it across the fabric to the proper destination port(s). In this scenario, the cut-through switch functions like a store-and-forward device.
Furthermore, if the rate at which the switch receives a frame is slower than the rate at which it must transmit that frame out of the device, the switch would experience an under-run condition, whereby the transmitting port runs faster than the receiving port can supply bits. A 10-Gbps egress port transmits 1 bit of data in one-tenth the time of the 1-Gbps ingress interface, so the transmit interface would have to wait nine bit-times (0.9 nanoseconds) before it sees the next bit from the 1-Gbps ingress interface. So to help ensure that no bit "gaps" occur on the egress side, a whole frame must be received from the lower-speed Ethernet LAN before the cut-through switch can transmit the frame.
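The bit-time arithmetic behind that example, as a quick sketch:

```python
# Bit times for the under-run example: a 1-Gbps ingress feeding a 10-Gbps egress.
def bit_time_ns(rate_bps: float) -> float:
    """Time to serialize one bit, in nanoseconds."""
    return 1e9 / rate_bps

ingress = bit_time_ns(1e9)    # 1 Gbps  -> 1.0 ns per bit
egress  = bit_time_ns(10e9)   # 10 Gbps -> 0.1 ns per bit
print(ingress - egress)       # 0.9 ns the faster egress port would sit idle per bit
```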
In the reverse situation, in which the ingress interface is faster than the egress port, the switch can still perform cut-through switching by scheduling the frame across the fabric and performing the required buffering on the egress side.
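Condensing the two cases above into a rule of thumb (the forwarding_mode helper is hypothetical, and real switches also weigh fabric speedup and congestion):

```python
# Speed-mismatch rule of thumb: a frame can be cut through only when the egress
# port cannot outrun the ingress port; otherwise the whole frame is stored first.
def forwarding_mode(ingress_bps: float, egress_bps: float) -> str:
    if ingress_bps >= egress_bps:
        return "cut-through (buffer as needed on the egress side)"
    return "store-and-forward (receive the whole frame to avoid under-run)"

print(forwarding_mode(1e9, 10e9))    # slow in, fast out -> store-and-forward
print(forwarding_mode(10e9, 1e9))    # fast in, slow out -> cut-through
print(forwarding_mode(10e9, 10e9))   # equal rates       -> cut-through
```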

Egress Port Congestion

Some congestion conditions also cause a cut-through switch to store an entire frame before acting on it. If the switch has decided to forward a frame out a particular port, but that port is busy transmitting frames coming in from other interfaces, the switch needs to buffer the frame even though the forwarding decision has already been made. Depending on the architecture of the cut-through switch, that buffering can occur in a buffer associated with the input interface or in a fabric buffer. In this case, the frame is not forwarded in a cut-through fashion.
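The toy model below is only meant to make the buffering decision explicit; the queue, the handle_frame helper, and the busy flag are hypothetical and ignore how a real scheduler arbitrates the egress port.

```python
# Toy model of the congestion case: if the chosen egress port is already busy,
# the frame is stored whole (at the ingress or in the fabric) before it is sent.
from collections import deque

egress_queue: deque = deque()          # frames waiting for the egress port

def handle_frame(frame: bytes, egress_busy: bool) -> str:
    if egress_busy or egress_queue:
        egress_queue.append(frame)     # store the whole frame first
        return "buffered (store-and-forward behavior)"
    return "forwarded cut-through"

print(handle_frame(b"\x00" * 64, egress_busy=False))
print(handle_frame(b"\x00" * 64, egress_busy=True))
```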
In a well-designed network, access-layer traffic coming in from a client does not usually exceed the capacity of an egress port or PortChannel going out to a server. The more likely scenario where port contention may occur is at the distribution (aggregation) layer of the network. Typically, an aggregation switch connects a number of lower-speed user interfaces to the core of the network, where an acceptable oversubscription factor should be built into the network's design. In such cases, cut-through switches function the same way as store-and-forward switches.

IEEE 802.1D Bridging Specification

Although cut-through switching may violate the IEEE 802.1D bridging specification by not validating the frame's checksum, the practical effect is far less dramatic: the receiving host simply discards the erroneous frame, and the host's network interface card (NIC) hardware performs the discard without affecting the host's CPU utilization (as it did in the 1980s). Furthermore, with the modern Ethernet cabling and connector infrastructure installed over the past five years or more, hosts should not encounter many invalid frames that they need to drop.
From a network monitoring perspective, Layer 2 cut-through switches keep track of Ethernet checksum errors encountered.
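The sketch below illustrates how such a checksum failure can be detected and counted; it uses Python's CRC-32 (the same polynomial and bit ordering as the Ethernet FCS), and the check_fcs helper and counter are hypothetical. By the time the last 4 bytes of a cut-through frame arrive and the check can run, the frame has already been forwarded, so the error can only be counted, not prevented.

```python
# Validate an Ethernet FCS after the fact and count the error (illustrative only).
import zlib

crc_error_count = 0

def check_fcs(frame: bytes) -> bool:
    """frame = header + payload + 4-byte FCS, FCS transmitted least-significant byte first."""
    global crc_error_count
    computed = zlib.crc32(frame[:-4]) & 0xFFFFFFFF
    received = int.from_bytes(frame[-4:], "little")
    if computed != received:
        crc_error_count += 1    # recorded for monitoring; the frame is already gone
        return False
    return True

payload = b"\x01" * 60
good = payload + zlib.crc32(payload).to_bytes(4, "little")
bad  = b"\x02" + good[1:]       # corrupt one payload byte; the FCS no longer matches
print(check_fcs(good), check_fcs(bad), crc_error_count)   # True False 1
```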
In comparison, Layer 3 IP switching cannot skirt the IP routing requirements specified in RFC 1812, because the router must modify every packet it forwards (for example, decrementing the TTL and updating the IP header checksum). If the router did not make those modifications, every frame it sent would contain IP-level as well as Ethernet-level errors that would cause the end host to drop it.
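A minimal sketch of that per-packet rewrite, assuming a bare 20-byte IPv4 header and the standard one's-complement checksum from RFC 1071; the forward helper is hypothetical and omits the accompanying Layer 2 rewrite (new MAC addresses and a fresh FCS).

```python
# Why a router must rewrite each packet: decrementing the TTL invalidates the
# IPv4 header checksum, which therefore has to be recomputed before forwarding.
def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum over 16-bit words (checksum field must be zeroed)."""
    total = 0
    for i in range(0, len(header), 2):
        total += int.from_bytes(header[i:i + 2], "big")
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF

def forward(header: bytearray) -> bytearray:
    header[8] -= 1                                  # decrement TTL (byte 8)
    header[10:12] = b"\x00\x00"                     # zero the checksum field
    header[10:12] = ipv4_checksum(header).to_bytes(2, "big")
    return header

# 20-byte IPv4 header: version/IHL, ToS, length, ID, flags, TTL=64, proto=TCP,
# checksum placeholder, 10.0.0.1 -> 10.0.0.2
hdr = bytearray(bytes.fromhex("45000034abcd4000" "40060000" "0a000001" "0a000002"))
hdr[10:12] = ipv4_checksum(hdr).to_bytes(2, "big")  # checksum as received
print(forward(hdr).hex())                           # TTL now 63, checksum updated
```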

Re-emergence of Cut-through Ethernet Switching

In the early 1990s, debates ensued as to what the "best" switching paradigm was, with experts highlighting the advantages of one methodology over the other. Over time, the focus has shifted from cut-through switching to store-and-forward switching. Now, Cisco is bringing back an enhanced cut-through switching model.

Cyclic Redundancy Check Error Propagation

In the 1990s, hubs (or repeaters) increased the occurrence of collisions in enterprise Ethernet networks by extending Ethernet segments, which also increased the presence of fragments. In addition, as a result of quality and engineering problems with Ethernet connectors, cabling infrastructures, and NIC hardware, more invalid packets occurred with half-duplex connections. Like hubs, cut-through switches also forwarded those invalid packets, exacerbating the cyclic redundancy check (CRC) problem.
In addition, since any packet destined for a host or a host group was handled by the receivers through a software interrupt that affected the performance of that host processor, packets containing checksum errors increased the host CPU utilization, in some cases affecting the performance of applications on those hosts.

Feature Parity

In the mid to late 1990s, enterprises wanted more than the limited capabilities of first-generation cut-through switches. They were willing to consider either switching paradigm so long as it offered more sophisticated features.
Enterprises needed ACLs, QoS capabilities, and better granularity in Cisco EtherChannel® and, later, PortChannel capabilities in their switches. At the time, ASIC and FPGA limitations presented developers of cut-through switches with significant challenges in incorporating these more sophisticated Layer 2 features. The networking industry moved away from cut-through switching as enterprises' demands for more functions increased the complexity of that forwarding methodology, and the gains that cut-through switching offered in latency and jitter consistency could not offset that added complexity.
Furthermore, ASIC and FPGA improvements made the latency characteristics of store-and-forward switches similar to those of cut-through switches.
For these reasons, cut-through switching faded away, and store-and-forward switches became the norm in the Ethernet world.


