Bug #401

major retries in the VI AMPDU queue on ath9k

Added by Dave Täht on Jul 8, 2012. Updated on Apr 11, 2013.
Closed Urgent Dave Täht

Description

I find the AMPDU retry statistics to be massively high in the case of VI and still quite high in the case of BE.

(The radios are in MCS4 most of the time, if that matters.
There are only three radios on the network and no competing traffic at all.)

root@davesroof:/sys/kernel/debug/ieee80211/phy0/ath9k# cat xmit
    Num-Tx-Queues: 10  tx-queues-setup: 0x10f poll-work-seen: 36206
                                BE         BK        VI        VO

    MPDUs Queued:            27436        174     15066     12730
    MPDUs Completed:         27322        174     15062     12708
    MPDUs XRetried:            114          0         4        22
    Aggregates:              94194          5         0         0
    AMPDUs Queued HW:       620975         49     21536         0
    AMPDUs Queued SW:       438841          7         0         0
    AMPDUs Completed:      1045702         34      6476         0
    AMPDUs Retried:         438217        728    413300         0
    AMPDUs XRetried:         14114         22     15059         0
    FIFO Underrun:               0          0         0         0
    TXOP Exceeded:               0          0         0         0

So I don’t see TXOP exceeded, but I do see exorbitant numbers
of retries in the AMPDU VI queue for some reason. This disturbs me.
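
To put a number on "exorbitant", one rough way to compare queues is retries per thousand completed AMPDU frames, taken straight from the counters above. This is only a sketch: the struct and function names are made up for illustration, and the ratio is not something the driver itself reports.

```c
#include <stdint.h>

/* Counters copied by hand from one column of the "xmit" output above.
 * The struct and its field names are hypothetical, not ath9k types. */
struct ampdu_stats {
    uint64_t completed; /* "AMPDUs Completed" */
    uint64_t retried;   /* "AMPDUs Retried"   */
    uint64_t xretried;  /* "AMPDUs XRetried"  */
};

/* Retries per 1000 completed frames, rounded down.
 * Returns 0 when nothing has completed, to avoid dividing by zero. */
uint64_t retries_per_thousand(const struct ampdu_stats *s)
{
    if (s->completed == 0)
        return 0;
    return s->retried * 1000 / s->completed;
}
```

With the numbers above this gives 419 for BE (438217 retried, 1045702 completed) and 63820 for VI (413300 retried, 6476 completed), i.e. roughly 0.4 retries per completed frame on BE against about 64 on VI.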

In xmit.c I see:

static u32 ath_lookup_rate(struct ath_softc *sc, struct ath_buf *bf,
                           struct ath_atx_tid *tid)
{
    ...
    /*
     * Find the lowest frame length among the rate series that will have a
     * 4ms transmit duration.
     * TODO - TXOP limit needs to be considered.
     */
    max_4ms_framelen = ATH_AMPDU_LIMIT_MAX;

History

Updated by Dave Täht on Jul 8, 2012.
Andrew McGregor: You’d be right to call that dubious
12:24 AM Either the limit has to be the minimum amongst all the queues, or it has to be per queue
me: Perhaps I’m still misunderstanding - at MCS4 what is the max aggregate?
1:57 AM Andrew McGregor: I don’t know, it’s expressed in units of time
1:58 AM It is prescribed by the queue’s parameters…
1:59 AM There’s a table here: http://wifi-insider.com/wlan/wmm.htm
2:01 AM So, the way you’d implement that is to query the driver for the transmit time for the AMPDU as you’re building it up
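
The approach Andrew describes (check the transmit time as the aggregate is built up) might look roughly like the sketch below. It is not ath9k code: the function names are hypothetical, airtime is computed from payload bytes only (no preamble, delimiters or padding), and the example figures assume MCS4 at 39 Mbit/s (HT20, long guard interval) with a 3008 usec VI limit, neither of which is stated in this ticket.

```c
#include <stddef.h>
#include <stdint.h>

/* Payload airtime in microseconds at a PHY rate given in kbit/s,
 * rounded up. Ignores preamble and delimiter overhead (assumption). */
uint32_t payload_airtime_us(uint32_t bytes, uint32_t rate_kbps)
{
    uint64_t bits = (uint64_t)bytes * 8;

    if (rate_kbps == 0)
        return UINT32_MAX; /* no usable rate: treat as "never fits" */
    /* bits * 1000 / kbps = microseconds */
    return (uint32_t)((bits * 1000 + rate_kbps - 1) / rate_kbps);
}

/* Walk the queued frame lengths in order and return how many fit
 * before the running airtime would exceed limit_us.
 * limit_us == 0 is taken to mean "no TXOP limit" for this queue. */
size_t frames_within_txop(const uint32_t *frame_len, size_t nframes,
                          uint32_t rate_kbps, uint32_t limit_us)
{
    uint32_t total_bytes = 0;
    size_t n;

    for (n = 0; n < nframes; n++) {
        uint32_t next = total_bytes + frame_len[n];

        if (limit_us != 0 &&
            payload_airtime_us(next, rate_kbps) > limit_us)
            break; /* adding this frame would overrun the TXOP */
        total_bytes = next;
    }
    return n;
}
```

Under those assumptions, a queue of 1500-byte frames at 39000 kbit/s stops at 9 frames for a 3008 usec limit, while a 94 usec limit admits none, since a single frame already needs about 308 usec.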
Updated by Dave Täht on Jul 11, 2012.
hey dtaht [20:20]
i’ve been looking into that VI queue txop mess some more
looks like i found an interesting bug (in addition to the stuff you already uncovered) [20:21]
seems that the configured txop limit in the hw is off by a factor 32 ;)
if i’m reading this stuff correctly
only affects VI and VO of course
[22:00]
it would limit the maximum transmission duration for VI to 94 usec and VO to 47 [22:02]
maybe the hw adds some extra time on top of that
my patches seem to work ;) [22:09]
i didn’t test pushing traffic to the VI queue
but i tested adjusting the BE queue to the same txop limit
committed [22:11]
have fun with that, i’m going to get some sleep now [22:12]
i’ll submit this stuff to linux-wireless@ tomorrow [22:13]
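
For context on the factor of 32: 802.11 EDCA expresses the TXOP limit in units of 32 usec, so values of 94 and 47 correspond to 3008 usec and 1504 usec. That fits the figures quoted in the chat, but the patch itself is not shown here, so the sketch below only illustrates the unit conversion; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* 802.11 EDCA TXOP limits are carried in units of 32 microseconds. */
#define TXOP_UNIT_US 32u

/* Convert a TXOP limit from 32 usec units to microseconds.
 * A value of 0 stays 0. */
uint32_t txop_units_to_us(uint16_t txop_units)
{
    return (uint32_t)txop_units * TXOP_UNIT_US;
}
```

Here txop_units_to_us(94) is 3008 and txop_units_to_us(47) is 1504, compared with the 94 usec and 47 usec that would result from writing the raw values as microseconds.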
Updated by Dave Täht on Jul 12, 2012.
Well, I see better behavior in the VI queue, using

[1] Done netperf -l 60 -Y CS0,CS0 -H 172.20.1.1
[2]- Done netperf -l 60 -Y CS5,CS5 -H 172.20.1.1
[3]+ Done netperf -l 60 -Y EF,EF -H 172.20.1.1

The BK queue does indeed drop packets according to tc; VO and VI weren’t.

Before I got that far, this:

                            BE         BK        VI        VO

MPDUs Queued:             1308        161      1364     39399
MPDUs Completed:          1308        161      1364     39397
MPDUs XRetried:              0          0         0         2
Aggregates:               2975        962      9326         0
AMPDUs Queued HW:        14134       4331    282818         0
AMPDUs Queued SW:         6773       2314     31996         0
AMPDUs Completed:        19780       6470    313444         0
AMPDUs Retried:          34959       4825     84898         0
AMPDUs XRetried:          1126        175      1370         0
FIFO Underrun:               0          0         0         0
TXOP Exceeded:               0          0         0         0
TXTIMER Expiry:              0          0         0         0
DESC CFG Error:              0          0         0         0
DATA Underrun:               0          0         0         0
DELIM Underrun:              0          0         0         0
TX-Pkts-All:             22214       6806    316178     39399
TX-Bytes-All:          2431000     602745  27746600   3522718
hw-put-tx-buf:               1          1         1         1
hw-tx-start:             53190      10240    390746     39399
hw-tx-proc-desc:         53189      10240    390746     39399
TX-Failed:                   0          0         0         0
txq-memory-address:   8280a1b4   8280a230  8280a138  8280a0bc
axq-qnum:                    2          3         1         0
axq-depth:                   1          0         0         0
axq-ampdu_depth:             1          0         0         0
axq-stopped                  0          0         0         0
tx-in-progress               0          0         0         0
pending-frames               1          0         0         0
txq_headidx:                 0          0         0         0
txq_tailidx:                 0          0         0         0
axq_q empty:                 0          0         0         0
axq_acq empty:               1          1         1         1
txq_fifo[0] empty:           1          1         1         1
txq_fifo[1] empty:           1          1         1         1
txq_fifo[2] empty:           1          1         1         1
txq_fifo[3] empty:           1          1         1         1
txq_fifo[4] empty:           1          1         1         1
txq_fifo[5] empty:           1          1         1         1
txq_fifo[6] empty:           1          1         1         1
txq_fifo[7] empty:           1          1         1         1

However, running the test in the reverse direction crashes the router.
This could be an out-of-memory condition or something else in fq_codel or the driver…

Using netperf 2.6 from my laptop, connected to the mesh in ad-hoc mode, at rates around 120 Mbit/s:

netperf -l 60 -Y EF,EF -H 172.20.1.1 -t TCP_MAERTS &
netperf -l 60 -Y CS5,CS5 -H 172.20.1.1 -t TCP_MAERTS &
netperf -l 60 -Y CS0,CS0 -H 172.20.1.1 -t TCP_MAERTS &

This thoroughly exercises the VO, VI and BE queues at the same time (which is not something that happens in real life).

Updated by David Taht on Apr 11, 2013.

This is a static export of the original bufferbloat.net issue database. As such, no further commenting is possible; the information is solely here for archival purposes.