Understanding and Troubleshooting Discards

Version 7

    On Ethernet and Switchport interfaces, the "Discard" stat can increment for many different reasons; some indicate healthy network operation and others indicate a network issue. Understanding the discard stat is important for evaluating the health of your network. This document explains the discard stat thoroughly, offers reasons why a discard is incremented, and describes what action you should take, if any, when you see one.

     

    Sections Included in this Document

     

    Hardware and Software Requirements

    Explanation of the Discard Stat

    Situations Where a Discard is Incremented

    Troubleshooting Discards

    Further Troubleshooting and Information

    Useful Links

     

    Hardware and Software Requirements


    The discard stat appears on every Ethernet, Switchport, and VLAN interface in an AOS unit. For information on whether or not your unit has these interfaces, please see the AOS Feature Matrix - Product Feature Matrix.


    Explanation of the Discard Stat


    One of the biggest misconceptions concerning the discard statistic is that a discard is an error. In fact, if you examine the other Ethernet-based stats (including those on Switchports and VLANs), you will see there is a general set of error counters called "input errors" and "output errors". Whenever one of the other statistics like "CRC errors" or "overruns" increments, so does one of these general error counters. When a discard increments, the input and output errors stay the same. This means the discard stat is its own unique counter and should not be treated as an error.


    In any healthy network, traffic needs to be discarded at certain points. Consider configuring a switchport to trunk mode. For security reasons, the administrator only allows VLANs 1 and 2 on the link with the switchport trunk allowed vlan 1,2 command. If a packet is received with a VLAN tag of 3, it will be dropped. In this case, a discard will be incremented indicating the interface is working as configured.
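    The trunk example above can be sketched as a toy model. The Python below is an illustration only and is not AOS code; the TrunkPort class and its receive method are hypothetical names, and the sketch assumes the allowed-VLAN list is the only check applied to a received frame.

```python
from dataclasses import dataclass


@dataclass
class TrunkPort:
    """Toy model of a trunk port with an allowed-VLAN list (hypothetical)."""

    allowed_vlans: set
    discards: int = 0

    def receive(self, vlan_tag: int) -> bool:
        """Return True if the frame is accepted, False if it is discarded."""
        if vlan_tag in self.allowed_vlans:
            return True
        # The tag is not on the allowed list, so the frame is dropped
        # and the discard counter goes up. No error counter is touched.
        self.discards += 1
        return False


# Equivalent of "switchport trunk allowed vlan 1,2" in this toy model.
port = TrunkPort(allowed_vlans={1, 2})
results = [port.receive(tag) for tag in (1, 2, 3, 3)]
print(results, port.discards)
```

    The two frames tagged with VLAN 3 are the only ones counted, which mirrors the point that the discard here reflects the interface working as configured.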


    Additional legitimate reasons for a unit to increment a discard are explained later in this document. Before continuing, it is very important to remember that a healthy network will absolutely have discards, and their presence does not necessarily indicate a network problem. The cause of discards should be investigated if they are incrementing at a high rate that cannot be explained, or when they can be correlated to a specific network problem.


    Situations Where a Discard is Incremented


    When a packet/frame is received on an interface, and again when it is put in a queue to exit an interface, several checks are run to make sure it is something that should be transmitted. The following sections discuss when a discard may be incremented on the different types of interfaces for one of these reasons. Note that this list only includes the most common reasons that a packet may be discarded; it does not include every unique situation where a discard may occur. However, discards for reasons other than the following should be very rare, and in that case would not warrant troubleshooting anyway.


     

    Layer 2 Discards

    Layer 2 interfaces are considered to be switchport interfaces (all types), interfaces that function as switchports but have the Ethernet moniker (on the NetVanta 6355, 1224, and 7100), and Ethernet subinterfaces (802.1q mode). A discard can be incremented for any of the following reasons:

     

    • Receiving frames tagged in a VLAN the unit does not have configured
      • Commonly, in networks with multiple VLANs, each switch has all VLANs in its VLAN database and the inter-switch links are designated as trunks so that they can carry all VLANs across the link. However, if one of the two switches does not have a VLAN configured that a frame is tagged with, the link on that switch will increment a discard for each such frame because it does not know what to do with the unknown VLAN tag.
    • Frames received that are tagged in the Native VLAN
      • Every trunk port in a network has a "native" VLAN which means this VLAN's traffic will not be tagged across the link. If this link does receive a packet that is actually tagged in that VLAN instead of untagged, the packet will be dropped as it does not conform to port expectations (this is a security measure).
      • For example, consider two switches connected by a trunk link. Switch A has the command switchport trunk native vlan 2 configured and switch B has the command switchport trunk native vlan 3. In this case, when switch A sends a frame tagged with VLAN 3, Switch B will drop it because it expects that VLAN to be untagged. The same will happen in the other direction when Switch B tags VLAN 2.
    • Spanning Tree blocked ports (in a stable topology or in a transitional state)
      • A port that is in the "blocking" state increments input discards for any traffic that is sent across the link. Because this port should not generally be a destination port in the MAC address table, this is mostly broadcast and multicast traffic.
    • Unknown Layer 2 Protocols
      • Most of these are broadcast or multicast at L2. In this case we will forward the frames out and increment a discard, counting that the frame was technically also destined for us, assuming they are using a broadcast MAC address like FF:FF:FF:FF:FF:FF.
      • Most unknown protocols will be counted as "unknown protocol" and as an error, but in certain cases these protocols automatically fail discard checks because of the unknown format and are therefore counted as a discard instead.
    • Port-authentication and Port-security violations
      • If a violation occurs and a port is put into "restrict" or "protect" mode, the violating unit's packets will be discarded.
      • If a MAC address that is bound to a port is plugged into another port with or without port-security, all its frames will be discarded.
      • The NetVanta 123x series does not have Port-security violations, so it will discard everything that is not part of the secure MAC table on a port.
    • Frames exceeding storm control limits
    • Frames exceeding Destination Lookup Failure (DLF) limit
      • DLF applies to units that continue to send to addresses that the unit cannot locate.
      • This can also happen if the MAC table does not have an entry for a unit but we do have a host entry in the route-cache. Fix this by setting the MAC table timeout to 21 minutes and setting all endpoints to edgeport.
    • All-zero MAC addresses for either the source or destination address.
      • This is considered an invalid MAC address.
    • Gratuitous ARPs with all 0's for the IP address
      • This is generally due to an end-user unit misconfiguration.
      • This presents a debug message when using the debug arp command as well.
    • The source and destination MAC Address are the same.
      • This is called a "Land" attack and is actually more prevalent at the IP layer (layer 3), with the source and destination IP addresses equaling each other.
      • This attack was originally developed because early networking equipment did not know what to do with a packet where the source and destination addresses are equal.
    • Destination interface for the frame and the source interface for the frame are the same.
      • This occurs when the sending unit does not know where the destination unit is located, but the receiving unit does. Normally this is seen when a switch floods a frame because it does not have the destination address in its CAM table. A downstream switch may receive it, but have that MAC address learned on the same port where the flooded frame entered. In this case, the switch will drop the frame instead of sending it back, to avoid congestion.
    • Result of a Hardware ACL dropping traffic.
      • This happens if a particular MAC address is denied in a hardware ACL. If the hardware ACL is using Layer 3 addresses to block and allow traffic, drops are not incremented as layer 2 discards.
    • Hardware queue on interface is full (overbooking).
      • Though an interface may be able to transmit at 100Mbps, traffic does not always follow a strict pattern when being sent. Bursts of traffic can come in causing an interface to become overbooked for a short period of time. Instead of just dropping all packets that do not conform to the interface rate, the interface has a hardware output buffer to keep the extra traffic in until the momentary congestion is gone.
      • A hardware output buffer has a non-configurable depth. When this hardware buffer reaches its limit, it triggers a hardware interrupt. This shuts down interface queuing for a short amount of time to let the interface catch up before it begins queuing frames for transmission again. Any frames queued to be sent during this time will increment output discards. Nothing in the queue before the interrupt will be dropped. The number of discards will not exactly match the number of frames lost because the interface stops processing frames during this period of time.
      • This can happen if the other side is using flow control and wants us to slow down, but we have too many frames in the output queue.
      • Another example would be 11+ 100M ports receiving traffic at line rate and the output port is a Gig port. It simply doesn't have the capacity to keep up with the output.
      • Different switches' interfaces have different hardware buffer depths. The same is true between 10/100 ports and 10/100/1000 ports. Generally, the more powerful the switch and the faster the port, the bigger the hardware output queue.
      • This is one reason why it is very important to set up Ethernet Class of Service. Please see the document Configuring Ethernet Switch QoS and CoS in AOS.
      • This has nothing to do with the software queues and the CPU. If one port is overbooked and starts discarding, other interfaces will not necessarily discard.
        • The addendum to this is that you will see discards on other ports if they receive a frame destined out the discarding interface during that time period. For example, if swx 0/1 receives a burst of traffic and starts discarding, and swx 0/3 receives a frame destined to leave swx 0/1, an input discard will increment on swx 0/3.
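    The overbooking example in the list above (eleven 100M ports feeding one Gig port) comes down to simple arithmetic. The Python sketch below works through it. The function names are hypothetical, and the 128 KiB buffer depth is an assumed figure chosen only for illustration, since actual hardware buffer depths vary by switch and port type and are not stated in this document.

```python
def offered_load_mbps(port_count, port_rate_mbps):
    """Total traffic arriving if every ingress port runs at line rate."""
    return port_count * port_rate_mbps


def excess_load_mbps(offered_mbps, egress_rate_mbps):
    """Traffic per second that the egress port cannot transmit."""
    return max(0, offered_mbps - egress_rate_mbps)


def seconds_until_full(buffer_bytes, excess_mbps):
    """Time for the excess to fill an initially empty output buffer.

    Returns None when there is no excess, because the buffer never fills.
    """
    if excess_mbps <= 0:
        return None
    excess_bytes_per_second = excess_mbps * 1_000_000 / 8
    return buffer_bytes / excess_bytes_per_second


offered = offered_load_mbps(11, 100)        # eleven 100M ports at line rate
excess = excess_load_mbps(offered, 1000)    # one Gig egress port
ASSUMED_BUFFER_BYTES = 128 * 1024           # hypothetical depth, not an AOS value
fill_time = seconds_until_full(ASSUMED_BUFFER_BYTES, excess)
print(offered, excess, fill_time)
```

    Under these assumptions the buffer fills in roughly ten milliseconds, which is why even short bursts above the egress rate can produce output discards.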

     


    For VLAN interfaces only

      • If a layer 2 discard increments, due to a reason mentioned above, on a switchport in access mode, a discard will also increment on the associated VLAN interface.
      • If a layer 2 discard increments on a switchport in trunk mode, it will show up on either the sending VLAN or the receiving VLAN interface, depending on the stage of processing at which the frame was discarded.
        • For example, if a frame is received on a trunk port sourced from VLAN 1, and it is discarded upon entry because the source and destination MAC addresses are equal to each other, VLAN 1 will increment a discard. If a new frame is received from VLAN 1 destined for VLAN 2 and the destination port is in the discarding state due to overbooking on the output port, VLAN 2 will increment a discard.

     

    On Ethernet Ports only

      • Routed Ethernet ports (not ports designated "ethernet" on units like the NetVanta 6355 and 7100) are unique in that they possess some layer 2 functions in addition to their layer 3 functions. Though the interface does not switch, it does perform MAC address operations and also performs checks on the Ethernet frames that are input (for example, whether the MAC address is all zeros). When one of the non-switching examples from the Layer 2 Discards section occurs on an Ethernet port, it will increment a discard.

     

    VLANs and Ethernet Interfaces

      • The L3 software buffer is full and the unit cannot process incoming and outgoing packets.
        • This can happen because of high CPU utilization (i.e. the CPU does not have enough resources to process the software queues).
          • This generally happens because of the thread "PacketRouting" which performs the majority of the router functions in any AOS unit.
          • If the PacketRouting process hits a queue depth of 80%, it will trigger an interrupt which will cause the unit to stop transmitting and processing traffic for a short period of time (microseconds), causing the interfaces to increment discards as input and output queues fill up.
          • You can see what the current processor utilization is with the command show process cpu as shown below:

     

    #show process cpu

    System load: 1sec:5.59%  1min:7.03%  5min:5.96%  Min: 0.00%  Max: 100.00%

    Context switch load: 0.14%

                                          Invoked  Exec Time    Runtime    Load %%

    Task Id    Task Name        PRI STA   (count)     (usec)     (usec)     (1sec)

    1          Idle               0 W   608845940        233     942640      94.26

    3          PC Config          7 S   499071177        968      13040       1.30

    4          PacketRouting     38 W   766295348         49      17963       1.80

    5          Timer             39 W   472845169         13       5659       0.57

    6          Thread Pool        4 W       12026        120          0       0.00

    7          Timer-00          10 W   488120549          2       1132       0.11

    8          Nm01               5 W           0       1966          0       0.00

    9          Clock              9 W    14334724         12         30       0.00

    10         FrontPanel        37 W    97105578         64       1301       0.13

    11         con0              39 W       83786          6          0       0.00

    12         CF Manager         9 W    95424567          2         57       0.01

    13         ICP Session        8 W    10810188        909        988       0.10

    14         RouteTableTick     6 W     8061684         48        190       0.02

    15         RouteTableTick     6 W     8104871         66        190       0.02

    16         OSPF               6 W    13095560        264        365       0.04

    17         IGMPTick           6 W     4850112        110        113       0.01

    18         IGMP-Receiver      6 W       97162         31          0       0.00

    19         IP Events         24 W     4965396         24         23       0.00

    20         tcptimer          22 W     2717686         10         92       0.01

    21         tcpinp            22 W      421589        156        502       0.05

    22         tcpout            22 W     1341238         67        478       0.05

    23         Port Manager       9 W    96136488         36        755       0.08

    24         eth01             39 W   317655915          6       1319       0.13

    25         eth02             39 W           1          2          0       0.00

    26         SnmpThread         6 W   239578062         27       1173       0.12

    27         WWW               19 W     1139607         38          0       0.00

    28         DnsClient         16 W     2410161         25          0       0.00

    29         DnsProxy          16 W     3563737         63          0       0.00

    30         DnsTable          16 W      955533          8          0       0.00

    31         sec               39 W   199770358         20        785       0.08

    32         IKE                6 W      864911         61          0       0.00

    33         IPSecKeyGen        4 W           0     463723          0       0.00

    34         SCEP               6 W           0     461818          0       0.00

    35         MediaConnectio~   34 W     5229625        556        551       0.06

    36         FTPServer List~    5 W           0     460915          0       0.00

    37         SMTP Client       16 W        1339         69          0       0.00

    38         SNTP Client       19 W          57         21          0       0.00

    39         Switch Managem~   37 W           0         76          0       0.00

    40         Switch Mainten~    4 W    23873756        989       3938       0.39

    41         Stacking           9 W     4778371         27         30       0.00

    42         UCC3              39 W    64805586          3        642       0.06

    43         RSTP              37 W           0        130          0       0.00

    44         RSTP              37 W   1209743598         15       1859       0.19

    45         CLIInjectQ         6 W           0      59603          0       0.00

    48         RipOut             6 W     7822251         17       2024       0.20

    49         RipIn              6 W           0      37080          0       0.00

    50         UDP Relay         19 W           1         67          0       0.00

    51         PacketCapture      4 W     9702057         39         78       0.01

    52         PING Client       16 W     1736792         47          0       0.00

    53         DHCP Server       29 W       13013         42          0       0.00

    54         UDP In            36 W     3222258        109          0       0.00

    55         Flow Meter Log~   17 W    11133629        108        155       0.02

    56         CFM Maint         38 W     4805227         26         26       0.00

    57         OSPFv3             6 W    10364499          6         26       0.00

    58         DHCP Client       19 W      733295         47          0       0.00

    59         BGP Thread         6 W    10251697          5        130       0.01

    60         TWAMP-Control      6 W     1018526        171          0       0.00

    61         TWAMP-Test        16 W           1         11          0       0.00

    62         TFTP               5 W      986871         28          0       0.00

    63         AUTOLINKQ          4 W           0     173963          0       0.00

    64         HttpClientQ        6 W           0     119685          0       0.00

    65         ntpd              19 W     7264032         83        282       0.03

    66         TFTPThreadPool     4 W           0      48761          0       0.00

    67         TFTPThreadPool     4 W           0      48689          0       0.00

    68         TFTPThreadPool     4 W           0      48568          0       0.00

    69         TFTPThreadPool     4 W           0      48501          0       0.00

    70         TFTPThreadPool     4 W           0      48433          0       0.00

    72         DHCPv6 Server     29 W           1        199          0       0.00

    146        MRouteTick         6 W           0        522          0       0.00

     

        • You can also use the show process queue command to see the maximum depth that each queue has reached, as a percentage. This value does not reset until it is manually cleared or the unit is rebooted. It will not be indicative of spikes in utilization, but rather of consistent utilization levels:

     

    #show process queue

    Queue                          Max Depth (%)

    ------------------------------ -------------

    PC Config                                  7

    PacketRouting                             39

    FrontPanel                                 0

    ICP Session                                1

    RouteTableTick                             2

    RouteTableTick                             2

    OSPF                                       3

    IP Events                                  3

    IKE                                        1

    IPSecKeyGen                                0

    MediaConnectionQueue                       0

    Switch Management                          0

    Switch Maintenance                         7

    Stacking                                   0

    RSTP                                       0

    RSTP                                       5

    CLIInjectQ                                 0

    PacketCapture                              0

    UDP In                                    76

    Flow Meter Logging                         2

    CFM Maint                                  0

    OSPFv3                                     4

    BGP Thread                                 0

    AUTOLINKQ                                  0

    HttpClientQ                                0

    MRouteTick                                 0

     

        • As in the hardware buffer case, the number of discards incremented is not exactly equal to the number of packets actually lost because the CPU stops processing packets.
        • This will not affect anything that is routed or switched in hardware. This would only affect packets that are sent to the processor for pure layer 3 routing, firewall, etc.
          • Addendum: if the hardware route-cache is full, the overage is being routed by the processor. If the processor becomes over-utilized as well then discards would begin incrementing.
      • Configured QoS and traffic-shaping policies discard packets based on prioritization.
        • On an interface you can tell if the shaper dropped the packets as shown below.


     

    #show int eth 0/1
    ...lines omitted...
    Interface Shaper: 50000/312500/312500 (rate/budget/max budget)
    6250 bytes added to budget every 1 ms
    packet stats: 197758/0/0/0 (packets sent/waiting/dropped/delayed)

     


      • Packets discarded because no route exists to the destination.
        • This check happens upon entry to the unit (to save processing down the road if the traffic cannot be routed anyway), so generally these will only be input discards unless routing information changes while the packet is being processed.
      • Packets multicast at layer 3 that cannot be routed (we will also discard a copy of the one "destined for us", for example IPv4 address 224.0.0.1).
      • Packets with invalid IPv4 and/or IPv6 address information
        • This would include source and destination address being equal, invalid field lengths, etc.
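    The Interface Shaper line in the show int eth 0/1 output above can be checked with some arithmetic. The Python sketch below assumes the rate field is in kilobits per second, which is consistent with the "6250 bytes added to budget every 1 ms" line, and models the budget as a simple token bucket. The names shaper_summary and TokenBucket are hypothetical, and the toy bucket only reports whether a packet fits the current budget; it does not reproduce how AOS chooses between delaying and dropping.

```python
def shaper_summary(rate_kbps, max_budget_bytes, tick_ms=1):
    """Return (bytes added per tick, milliseconds of traffic in a full budget)."""
    bytes_per_tick = rate_kbps * 1000 / 8 * tick_ms / 1000
    burst_ms = max_budget_bytes / bytes_per_tick * tick_ms
    return bytes_per_tick, burst_ms


class TokenBucket:
    """Toy byte budget that refills once per tick up to a maximum."""

    def __init__(self, bytes_per_tick, max_budget):
        self.bytes_per_tick = bytes_per_tick
        self.max_budget = max_budget
        self.budget = max_budget  # start with a full budget

    def tick(self):
        """Add one tick of budget without exceeding the maximum."""
        self.budget = min(self.max_budget, self.budget + self.bytes_per_tick)

    def try_send(self, size_bytes):
        """Spend budget if the packet fits; return whether it conformed."""
        if size_bytes <= self.budget:
            self.budget -= size_bytes
            return True
        return False


# Values taken from "Interface Shaper: 50000/312500/312500".
per_tick, burst_ms = shaper_summary(rate_kbps=50000, max_budget_bytes=312500)

bucket = TokenBucket(bytes_per_tick=per_tick, max_budget=312500)
sent = 0
while bucket.try_send(1500):   # back-to-back 1500-byte packets, no refill
    sent += 1
print(per_tick, burst_ms, sent)
```

    With these numbers the per-tick refill matches the 6250 bytes shown in the output, and a full budget corresponds to 50 ms of traffic at the shaped rate.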


    Troubleshooting Discards


    It is important to note a couple of things before continuing with this section. First, as stated earlier in this document, discards are a normal byproduct of network operation. You should only troubleshoot discards on your units if you notice an actual network problem that could correlate with them, or if you notice them incrementing faster than is normal for that interface. Second, you should go through the above sections explaining the types of discards as well. Not only are the troubleshooting steps below based on these examples, but there are many implied troubleshooting steps you can take by going through the above sections. Not all of these will be covered below. For example, from reading above you know that when the source and destination MAC addresses in a frame are equal, the frame is dropped. This implies that if you see these types of frames in your network through some type of packet analyzer, that would explain at least some of the discards. This is not directly discussed below because it was fully covered in the description section.



    • Verify VLAN configurations on ports and switches experiencing the discards
      • It is important to make sure the port is in the correct mode (trunk or access).
      • If a trunk, make sure the unit plugged into it is not tagging traffic in a VLAN that is not configured on the switch. This can be done by verifying that unit's configuration, or by using a port mirror to take a packet capture using the document Configuring Port Mirroring in AOS.
      • All the VLANs that have a path to the particular unit you are using should be added to the unit's VLAN database using the vlan <VLAN ID> command.
        • Similarly, make sure there are no unused VLANs configured. Not only does this create a security concern, but if a unit is accidentally placed in such a VLAN, all its traffic may cause discards to increment on other switches.
      • This requires that you also check VLAN configuration on units connected to this unit to make sure they are correct as well.
    • Check the Spanning-Tree Topology
      • This can be done using the show spanning-tree blockedports command. You can see if the port incrementing discards is in the blocking state.
    • Check the interface bandwidth to see if it is possibly overbooked
      • This can be done using the show interface command:


    #show interface sw 0/1

    swx 0/1 is UP, line protocol is UP

     

      Description: AP

      Hardware address is 00:A0:C8:00:E1:7A

      BW is 10000 Kbit

      100000b/s, negotiated full duplex, configured full-duplex

      input flow control is disabled, 0 pause frames received

      ARP type: ARPA; ARP timeout is 20 minutes

      Last clearing of "show interface" counters: never

      30 second input rate 1253742 bits/sec, 156 packets/sec

      30 second output rate 976856 bits/sec, 123 packets/sec

            Queueing method: fifo

        Output queue: 0/256/0 (size/max total/drops)

        Interface Shaper: NOT ENABLED

        13423523 packets input, 12312412412 bytes

        12536452 unicasts, 1232432 broadcasts, 243524 multicasts input

        0 symbol errors, 122 discards

        0 input errors, 0 runts, 0 giants

        0 alignment errors, 0 crc errors

        13422343 packets output, 13253434232 bytes

        11242342 unicasts, 23424342 broadcasts, 123124 multicasts output

        0 output errors, 0 deferred, 123 discards

        0 single, 0 multiple, 0 late collisions

        0 excessive collisions

     

      • The output above shows the bandwidth currently being used on the interface, which tells you if it is close to being overbooked.
      • To see if bursty traffic coming into the unit could be causing discards, use show interface sw 0/1 realtime and you will see the counters increment live. If there are very large jumps in traffic, this could be a source.
    • Check the MAC address table in the unit to make sure there are not more entries than the unit supports according to the AOS Feature Matrix - Product Feature Matrix.
    • Check the hardware ACLs that are in the unit (if any).
    • Proceed to the Further Troubleshooting and Information section.
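    The bandwidth check in the list above can be automated by comparing the 30 second rates with the BW line. The Python sketch below is one way to do that. The utilization function is a hypothetical helper, and its regular expressions are written against the sample show interface output in this document, so other AOS versions or interface types may need different patterns.

```python
import re

# Patterns based on the sample output in this document (an assumption).
BW_RE = re.compile(r"BW is (\d+) Kbit")
RATE_RE = re.compile(r"30 second (input|output) rate (\d+) bits/sec")

SAMPLE_INTERFACE_OUTPUT = """\
  BW is 10000 Kbit
  30 second input rate 1253742 bits/sec, 156 packets/sec
  30 second output rate 976856 bits/sec, 123 packets/sec
"""


def utilization(show_interface_text):
    """Return {'input': percent, 'output': percent} of the reported BW."""
    bw_match = BW_RE.search(show_interface_text)
    if bw_match is None:
        raise ValueError("no 'BW is ... Kbit' line found")
    bw_bits_per_second = int(bw_match.group(1)) * 1000
    return {
        direction: int(rate) / bw_bits_per_second * 100
        for direction, rate in RATE_RE.findall(show_interface_text)
    }


usage = utilization(SAMPLE_INTERFACE_OUTPUT)
for direction, percent in usage.items():
    print(f"{direction}: {percent:.1f}% of BW")
```

    Keep in mind that these are 30 second averages, so a low percentage here does not rule out short bursts; the realtime view described above is the better tool for spotting those.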


     

    • As with a layer 2 interface, check the interface stats using the show interface <type> <slot/port> to make sure the interface bandwidth is not being exceeded.
    • Check the unit's QoS and shaping policies to see what type of traffic is dropped and at what point it should be dropped.
      • Use the show qos map interface <type> <slot/port> command to see the different QoS queues and which map entries are dropping traffic (the shaper statistics in the show interface output, shown above in the layer 3 discard explanation section, are also useful):

     

    show qos map int eth 0/1

     

    eth 0/1

      qos-policy out: Voip

     

       map entry 10

         match dscp 46

         priority bandwidth: unlimited

           note: since unlimited, other qos bandwidths cannot be assured

         packets matched: 1232435, bytes matched: 1232453645

     

       map entry default

         packets matched: 624, bytes matched: 477490

         packets dropped: 0, bytes dropped: 0

         30 second offered rate 224040 bits/sec, drop rate 0 bits/sec

     

      Input QoS Map not assigned for this interface

     

    • Check the CPU for over-utilization
      • show proc cpu shows information relative to the present as shown in the description section above.
      • show proc queue shows information relative to queue depths since the last clearing of the queues (clear proc queue), or a reboot.
        • Note: Once an individual process queue hits 80%, interrupts will start causing possible discards.

     

    #show process queue

    Queue                          Max Depth (%)

    ------------------------------ -------------

    PC Config                                  7

    PacketRouting                             39

    FrontPanel                                 0

    ICP Session                                1

    RouteTableTick                             2

    RouteTableTick                             2

    OSPF                                       3

    IP Events                                  3

    IKE                                        1

    IPSecKeyGen                                0

    MediaConnectionQueue                       0

    Switch Management                          0

    Switch Maintenance                         7

    Stacking                                   0

    RSTP                                       0

    RSTP                                       5

    CLIInjectQ                                 0

    PacketCapture                              0

    UDP In                                    76

    Flow Meter Logging                         2

    CFM Maint                                  0

    OSPFv3                                     4

    BGP Thread                                 0

    AUTOLINKQ                                  0

    HttpClientQ                                0

    MRouteTick                                 0

     

    • Check show ip route to make sure there is a route to all destinations.
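    The 80% queue-depth check in the list above can be scripted against saved show process queue output. The Python sketch below is a minimal example; queues_at_or_above is a hypothetical helper, and the line pattern assumes the two-column "name, depth" layout shown in this document.

```python
import re

# A queue line is a name followed by whitespace and an integer depth.
# Header and separator lines do not end in digits, so they are skipped.
LINE_RE = re.compile(r"^\s*(?P<name>.+?)\s+(?P<depth>\d+)\s*$")

SAMPLE_QUEUE_OUTPUT = """\
#show process queue
Queue                          Max Depth (%)
------------------------------ -------------
PC Config                                  7
PacketRouting                             39
Switch Maintenance                         7
UDP In                                    76
OSPFv3                                     4
"""


def queues_at_or_above(show_process_queue_text, threshold=80):
    """Return (queue name, max depth) pairs at or above the threshold."""
    flagged = []
    for line in show_process_queue_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        depth = int(match.group("depth"))
        if depth >= threshold:
            flagged.append((match.group("name"), depth))
    return flagged


critical = queues_at_or_above(SAMPLE_QUEUE_OUTPUT)
near_limit = queues_at_or_above(SAMPLE_QUEUE_OUTPUT, threshold=75)
print(critical, near_limit)
```

    In the sample, no queue has reached 80%, but lowering the threshold shows that UDP In has come close, which may be worth watching.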

     

    Further Troubleshooting and Information


    The best way to know what could be causing the discards in a network is to know your network. If you aren't aware of the protocols in your network, how they function, which types of hosts are connected to each interface, and so on, you won't be able to fully understand the root cause of discards. You should make sure you are familiar with the following:


    • Protocols in your network.
      • Do they use multicast or broadcast traffic?
      • Are there proprietary protocols that AOS units may not participate in or understand?
      • How much bandwidth do these applications use?
    • VLAN configuration
      • Which units should be in each VLAN?
      • Are the units in my network from different vendors consistent in their VLAN tagging and treatment of access and trunk ports?
    • Know which sections of your network require what amount of bandwidth
      • If you have sections of the network that serve as bottlenecks for larger bandwidth network portions, this should be resolved as it will cause discarded traffic.
      • Design your network so as to avoid bottlenecks whenever possible.
      • Set up QoS on a bottleneck to make sure that less time sensitive traffic is dropped during periods of over-utilization.
    • Make sure you have purchased the correct equipment, sufficient for handling the amount of load and features you require.
      • This will help prevent over-utilization issues.
    • Make sure your network is secure.
      • An insecure network can experience problems that may cause discards, like a denial of service attack using up available bandwidth, or an attacker using insecure VLANs to transmit traffic.
      • Please see the document Security Best Practices for AOS Products for more information.


      In the end, the best way to troubleshoot discards is to take a packet capture on the interface or interfaces seeing the excess stat. This will tell you what is going on because you can see actual packets and match it with all the potential causes described above.


      Several important notes to remember:

       

      • Running port scanners and monitoring programs commonly cause discards because they send uncontrolled, bursty traffic.
      • Frames/packets discarded because of CRC errors, runts, giants, and other errors are not included in the discard count.
      • Packets dropped by the firewall do not increment discards.
      • Packets dropped by access-groups do not increment discards.


      Useful Links

       

      For more information on configuring Class of Service, please see Configuring Ethernet Switch QoS and CoS in AOS

       

      For more information on network security, please see Security Best Practices for AOS Products

       

      For more information about properly provisioning switches and network design recommendations, please see Switch Provisioning Best Practices

       

      For more information on properly configuring VLANs, please see Configuring InterVLAN Routing in AOS - Quick Configuration Guide
