Cake is a comprehensive queue management system, implemented as a queue discipline (qdisc) for the Linux kernel. It is designed to replace and improve upon the complex hierarchy of simple qdiscs presently required to effectively tackle the bufferbloat problem at the network edge.
For installation and configuration instructions, please see the main Cake page.
Cake’s fundamental design goal is perceived simplicity. This is achieved by integrating lots of functionality into one qdisc, resulting in:
Cake is logically arranged as several layers of functionality:
Each of the above layers is based on state-of-the-art techniques, many of which have improvements not previously realised. These are briefly explained below.
Additionally, most calculations are now performed on a time basis, with the remainder on a bytes basis. This is a fundamental improvement over many qdiscs which count only packets, which in practice vary greatly in size.
Cake’s shaper is not a token bucket filter (TBF). It operates in deficit mode, which is essentially the opposite way to how a TBF works.
A major drawback of TBFs is that they require the bucket size to be configured explicitly. This not only runs counter to Cake’s goal of perceived simplicity, but limits latency performance in the typical Cake application, immediately upstream of a bottleneck link’s dumb FIFO. On flow startup, a TBF will dump the contents of its bucket into the FIFO immediately, so the rate of the TBF must be reduced significantly below the link rate in order to let the FIFO drain. Meanwhile, latency suffers proportionally to the size of the bucket.
Cake instead schedules packets based on time deficits. If no deficit exists when a packet is requested, it can be sent immediately. The transmit time of the following packet is then calculated, and until that time the shaper is placed in deficit mode. While in deficit mode, packets are scheduled using a watchdog timer whenever a request arrives too soon, and transmission times are calculated for a continuous packet train. This continues until the queue drains; if a packet is requested, but none are available and the next transmission time has been reached, the shaper returns to the quiescent state in which the next packet can be sent immediately.
Deficit mode makes the burst size dependent only on hardware and kernel latency (including timer resolution), and minimises bursts without requiring manual tuning. Cake’s shaper can therefore be set much closer to the actual link speed without jeopardising latency performance. Modern hardware can achieve sub-millisecond bursts in most cases.
CPU efficiency is also improved by inverting the user’s “bytes per time” setting into a “time per byte” measurement, so that the packet scheduling operation is essentially a multiply-accumulate rather than involving a division. This is implemented using a manual floating-point representation (mantissa and shift) for maximum precision and dynamic range. This contributes to improved throughput on low-end CPUs.
Diffserv is poorly specified, and not widely deployed outside environments where Highly Paid Consultants are available to set it up. Cake’s priority layer aims to improve that situation by making a basic yet robust Diffserv-based priority queue available with no end-user effort. Hopefully, application support will follow.
Cake provides four traffic classes by default, nominally corresponding to the four classes provided by 802.11e and 801.2p. However, the precise mapping between Diffserv codepoints (DSCPs) and traffic classes is different. In increasing order of nominal priority:
Most traffic falls into the Best Effort class. VoIP, NTP and gaming traffic should be directed to the Voice class, BitTorrent should be directed to the Bulk class, and the Video class is available for any bulk traffic that requires elevated priority.
Cake implements soft admission control, and so is robust against starvation attacks relying on strict priority, which would otherwise be easy to trigger by accident. If a traffic class (including all traffic in higher classes than itself) exceeds its bandwidth threshold, it is demoted in priority until it falls below the threshold again. Hence, if there is no competing traffic, any traffic class can use the full link bandwidth, but it is always possible for new traffic in a different class to start up.
The bandwidth threshold is tracked using the same algorithm as in the shaper, at bandwidths which are scaled from the shaper’s overall bandwidth setting. Packets are not actually delayed at this layer; the actual transmission time is merely checked against the deficit schedule. Priority itself is implemented using a Weighted Deficit Round Robin scheme, with the weight of each class depending on whether its threshold is exceeded or not.
The progression of the bandwidth thresholds is roughly inverse to priority. As with DRR++ (below), the assumption is that more latency-sensitive traffic generally requires less bandwidth, and that there is a fundamental tradeoff between priority and bandwidth which is required for network stability. Enforcing that tradeoff in this manner should encourage applications to choose an appropriate DSCP for their traffic, rather than lazily applying the highest available priority for “maximum” performance.
This was the original core of Cake, inheriting the basic design of fq_codel’s flow isolation scheme. It consists of a hash function over the 5-tuple flow identifier to distribute packets to a large number of queues, and a scheme similar to “DRR+“:http://www.researchgate.net/profile/Mike_MacGregor/publication/3867218_Deficits_for_bursty_latency-critical_flows_DRR/links/54a20b700cf256bf8baf7c61.pdf to choose which queue to service. A separate set of queues and DRR+ state is provided for each traffic class.
Cake’s DRR++ scheme uses two lists of active queues. Newly active queues are placed in the “new flow” list, which is served with strict priority over the “old flow” list. After servicing, queues are always placed in the “old flow” list. Queues found to be empty on servicing are removed from the active lists. This relatively simple scheme naturally prioritises “sparse” flows over “bulk” flows, on the assumption that latency-sensitive traffic is likely to be sparse.
A major enhancement in Cake over fq_codel is replacement of the plain hash function with an 8-way set-associative version. Plain hashes are susceptible to the “birthday problem” in which the probability of hash collision reaches 50% when the table occupancy reaches the square root of the table size (32 flows for 1024 queues), assuming a high-quality hash; we have also found that the hash function fq_codel relies on is suboptimal.
In the set-associative hash, the set of queues is divided into sets of 8 ways. Ways are tagged with the flow identifier last assigned to them, allowing hash collisions to be detected and avoided if another way in the same set is available; either already tagged for the correct flow, or empty and inactive. This is similar to the way set-associative caches in CPUs work.
Cake uses a variant of the CoDel AQM, instantiated for each flow-isolation queue. This arrangement is again inherited from fq_codel, but subsequently modified for improved performance. The major changes to date are:
count
variable now saturates rather than wrapping, and thus
handles overload conditions far better. These are most likely seen
with unresponsive and anti-responsive flows with a high packet rate.target
and interval
parameters are auto-tuned based on the
shaper bandwidth. Target is never less than 1.5 MTUs’ link
occupancy, and Interval is never less than 8 times target. This
improves throughput on slow links (under 1Mbps), and has been tested
successfully down to 64Kbps. At higher bandwidths, target
defaults
to 5ms and interval
to 100ms; these are also the defaults for
standard CoDel.count
is
halved rather than decremented. This improves the long-term
stability of the signalling frequency.interval
a 6:1 dynamic range
around its nominal value, rather than a fixed value. Hence response
to TCP’s rapid slow-start growth is sharpened and to slow
congestion-avoidance growth is dulled.In addition to the four major layers, there are a number of miscellaneous functions which are grouped here for convenience:
Sensible defaults are provided for every aspect of Cake’s configuration. There are no mandatory parameters. All parameters can be specified using a single ‘tc’ invocation, and most parameters can be updated without losing packets using tc’s “change” command.
In many cases, it is only necessary to set Cake’s “bandwidth” parameter, which sets up the shaper layer and performs auto-tuning in several other places. If omitted, the shaper is set to infinite bandwidth (zero time per byte) and thus effectively disabled. This can be made explicit by specifying the “unlimited” keyword instead.
The priority-queue layer is configured using the “besteffort” (disables it), “precedence” (legacy mode, 8 classes), “diffserv8” and “diffserv4” keywords. The default, described above, is “diffserv4”.
The flow-isolation layer is configured using “flowblind” (disables it), “hosts” (ignores protocol and port fields), “dsthost” (uses only destination address), “srchost” (analogous), and “flows” (default, uses entire 5-tuple).
The AQM layer has no configuration options. However, it is planned to add simple tuning options for different prevailing RTTs that may be significantly different from the Internet-scale 100ms currently assumed. Satellite links tend to impose longer RTTs, and enclosed LANs tend to have much shorter RTTs.
Encapsulation compensation has a rich set of shortcut keywords, corresponding to typical ADSL and VDSL configurations and straightforward variants thereof. The “overhead” parameter and ATM compensation flag (“atm” vs. “noatm”) can also be specified directly. Overhead compensation is turned off completely with the “raw” keyword, and the “conservative” keyword sets up a quick-fix, with ATM cells and a full cell of overhead, for where the actual overhead is uncertain.
Cake produces a rich set of statistics detailing the performance of the
various algorithms, and this can also be used to infer information about
the traffic passing through it. Simply run tc -s qdisc
.
As of the ‘fishcake’ release, the statistics for each traffic class are:
Additional statistics are planned in the near future, which will require enlarging the stats struct and breaking compatibility with existing versions of userspace code (tc).