1/26/11

IOS Control Plane Protection for DoS Mitigation

IOS control plane protection is an extension of control plane policing (CoPP) introduced in 12.4(4)T which allows an administrator to apply a quality of service (QoS) policy to a router's control plane. The control plane handles all traffic which must be processed by the router in software.
A policy can be applied to the control plane as a whole, as with legacy CoPP, or it can be applied to one of the three control plane "subinterfaces":
  • Host - Traffic destined for the router itself (management, routing protocols, etc.)
  • Transit - Software-switched transit traffic
  • CEF exception - Traffic which triggers a CEF exception (ARP, non-IP packets, etc.)
To illustrate the benefits of configuring control plane protection, we can observe what happens when an unprotected router is targeted by a primitive denial of service (DoS) attack. We can launch such an attack against a router at 10.0.0.1 with a simple UDP flood:
Attacker$ udp-flood.pl 10.0.0.1 1234 64 0
UDP packets are flooded at or near line rate with the intention of overwhelming the recipient. Since these packets are destined for the router itself, each gets punted from hardware to software processing, consuming expensive CPU and memory resources. With no countermeasures in place, the router's processing power is quickly consumed:
Router# show processes cpu sorted
CPU utilization for five seconds: 100%/28%; one minute: 76%; five minutes: 25%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
  29       85468         295     289722 71.30% 52.46% 16.90%   0 Net Background
   4        2136         118      18101  0.42%  0.35%  0.28%   0 Check heaps
   2          24          50        480  0.08%  0.02%  0.00%   0 Load Meter
  56          12        4131          2  0.08%  0.01%  0.00%   0 Dot11 driver
  80         848         185       4583  0.08%  0.20%  0.07%   0 IP Input
   1           4          11        363  0.00%  0.00%  0.00%   0 Chunk Manager
...
To mitigate this, we can apply control plane protection. A CoPP policy is configured via the modular QoS CLI (MQC) as with any regular QoS policy, and applied akin to a normal interface service policy. To keep things simple, we'll create a policy which just polices inbound UDP traffic to 16 Kbps:
class-map match-all UDP
 match access-group name UDP
!
policy-map CoPP
 class UDP
  police 16000 conform-action transmit exceed-action drop violate-action drop
!
ip access-list extended UDP
 permit udp any any
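To get a feel for what the three police actions mean under a flood, here is a rough Python sketch of a single-rate, two-bucket policer in the style of RFC 2697. It is a simplified model, not the actual IOS implementation. The bucket sizes (bc and be of 1500 bytes) are taken from the show policy-map output further down, the 106-byte packet size is derived from that output's byte and packet counters, and the 10,000 packets-per-second flood rate is purely an assumption for illustration.

```python
def police(packets, cir_bps=16000, bc=1500, be=1500):
    """Classify (timestamp_s, size_bytes) packets as conform/exceed/violate.

    Simplified single-rate three-color model; real IOS behaviour may differ.
    """
    tc, te = float(bc), float(be)  # both buckets start full
    last = 0.0
    counts = {"conform": 0, "exceed": 0, "violate": 0}
    for ts, size in packets:
        # Refill the conform bucket at CIR; any overflow spills into the exceed bucket.
        tc += (ts - last) * cir_bps / 8
        last = ts
        if tc > bc:
            te = min(be, te + (tc - bc))
            tc = bc
        if size <= tc:
            tc -= size
            counts["conform"] += 1
        elif size <= te:
            te -= size
            counts["exceed"] += 1
        else:
            counts["violate"] += 1
    return counts


# One second of a hypothetical 10,000 pps flood of 106-byte packets.
flood = [(i / 10000, 106) for i in range(10000)]
counts = police(flood)
print(counts)
```

In this model the exceed bucket is only refilled when the conform bucket overflows, which never happens during a sustained flood. The exceed counter therefore stops at 14 packets (14 x 106 = 1484 bytes, the most that fits in a 1500-byte burst), and nearly everything else lands in the violate bucket.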
Finally we apply the service policy to the control plane. In this example, it is applied to the aggregate rather than to a subinterface:
R1(config)# control-plane ?
  cef-exception  Cef-exception traffic control-plane configuration
  host           Host traffic control-plane configuration
  transit        Transit traffic control-plane configuration
  

R1(config)# control-plane
R1(config-cp-host)# service-policy input CoPP
R1(config-cp-host)#
%CP-5-FEATURE: Control-plane Policing feature enabled on Control plane aggregate path
We can relaunch our UDP flood and compare the CPU utilization to what we saw without CoPP:
Router#show processes cpu sorted
CPU utilization for five seconds: 100%/97%; one minute: 53%; five minutes: 32%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
  55       13492        2055       6565  0.32%  0.08%  0.02%   0 COLLECT STAT COU
   1           4          12        333  0.00%  0.00%  0.00%   0 Chunk Manager  
   2         968          83      11662  0.00%  0.00%  0.00%   0 Load Meter     
   4        3932         206      19087  0.00%  0.60%  0.49%   0 Check heaps    
   5           0           1          0  0.00%  0.00%  0.00%   0 Pool Manager   
   6           0           2          0  0.00%  0.00%  0.00%   0 Timers    
...     
That's probably not what you expected: CPU utilization actually appears to have gone up! What happened? Before moving any further, let's verify that our CoPP policy is indeed performing as expected:
Router# show policy-map control-plane
 Control Plane

  Service-policy input: CoPP

    Class-map: UDP (match-all)
      6918133 packets, 733322098 bytes
      5 minute offered rate 16552000 bps, drop rate 16551000 bps
      Match: access-group name UDP
      police:
          cir 16000 bps, bc 1500 bytes, be 1500 bytes
        conformed 1575 packets, 166950 bytes; actions:
          transmit
        exceeded 14 packets, 1484 bytes; actions:
          drop
        violated 6921762 packets, 733706772 bytes; actions:
          drop
        conformed 18000 bps, exceed 0 bps, violate 74591000 bps

    Class-map: class-default (match-any)
      2 packets, 120 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
      Match: any
Yep, our CoPP policy is policing at merely 16 Kbps inbound, and discarding all other malicious traffic. What gives?
The five-second CPU statistic listed at the beginning of the show processes cpu output is composed of two numbers: total utilization and utilization resulting from hardware interrupt requests. In our first report, interrupt utilization accounted for only around 30% of the total CPU utilization, whereas it now accounts for nearly all of it. Conversely, the first report showed the "Net Background" process (responsible for buffer allocation on newer IOS versions) consuming over 70% CPU utilization, while the same process's utilization in the second output is negligible (it's not even listed in the top few).
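If you want to track that split over time, a few lines of Python can pull the two numbers out of the header line and derive the process-level share by subtraction. This is just a sketch: the function name split_cpu is made up for illustration, and the regular expression assumes the header format shown in the two outputs above.

```python
import re


def split_cpu(line):
    """Return (total, interrupt, process) percentages from the five-second figure."""
    m = re.search(r"five seconds: (\d+)%/(\d+)%", line)
    if m is None:
        raise ValueError("no five-second CPU figure found")
    total, interrupt = int(m.group(1)), int(m.group(2))
    # Whatever is not spent at interrupt level is process-level time.
    return total, interrupt, total - interrupt


before = "CPU utilization for five seconds: 100%/28%; one minute: 76%; five minutes: 25%"
after = "CPU utilization for five seconds: 100%/97%; one minute: 53%; five minutes: 32%"

print(split_cpu(before))  # (100, 28, 72)
print(split_cpu(after))   # (100, 97, 3)
```

The process-level share drops from 72% to 3% once the policy is in place, even though the total stays pinned at 100%.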
What we've witnessed here is a shift from process-heavy computation to interrupt-heavy computation. Unfortunately, depending on the platform, this can be just as bad. Testing this on an 1811W I noticed that the terminal felt just as sluggish under 100% load with or without CoPP. Fortunately, once the load has been pushed back to the interrupt level, you can adjust the process scheduler allocation to give software processes a little more breathing room.
Scheduler allocation is defined as a proportion of interrupt run time to process run time; for most platforms, 4000 µsec of interrupt time is allowed for merely 200 µsec of process time (according to the documentation). Older platforms might be limited to the scheduler interval command. Sensible scheduler allocation is a hairy topic in itself, but the permitted ranges offer some idea of the intended ratio:
Router(config)# scheduler allocate ?
  <3000-60000>  Microseconds handling network interrupts
  

Router(config)# scheduler allocate 8000 ?
  <1000-8000>  Microseconds running processes

Router(config)# scheduler allocate 8000 1000
An allocation of 8000/1000 worked well to put the spring back into the console of my 1800 series while it was being beaten to death with UDP. I have not experimented with the impact of this allocation on actual throughput. Your mileage may vary.
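To put the two ratios side by side, the arithmetic below treats each pair of values as a fixed time split under sustained interrupt load. That is a simplification of how the scheduler actually behaves, and the helper name interrupt_share is only for illustration.

```python
def interrupt_share(interrupt_usec, process_usec):
    """Fraction of each scheduling cycle available to interrupt handling."""
    return interrupt_usec / (interrupt_usec + process_usec)


allocations = {
    "default 4000/200": (4000, 200),
    "tuned 8000/1000": (8000, 1000),
}

for label, (intr, proc) in allocations.items():
    share = interrupt_share(intr, proc)
    print(f"{label}: interrupt {share:.1%}, process {1 - share:.1%}")
```

Under that assumption, processes are guaranteed roughly 4.8% of the CPU with the documented default and roughly 11.1% with 8000/1000, which is a little more than double.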