Bug #340

massive packet loss on wireless crypto in 3.3-rc4-3

Added by David Taht on Feb 29, 2012. Updated on Aug 2, 2012.
Closed Normal Dave Täht

Description

I am getting 80% packet loss on my attempt to get a 3.3-rc4-3 working at all
on encrypted wireless. the icmpv6 errors are another problem entirely
I’d expect.

[ 72.040000] ADDRCONF (NETDEV_CHANGE): gw11: link becomes ready
[ 318.320000] icmpv6_send: no reply to icmp error
[ 318.320000] icmpv6_send: no reply to icmp error
[ 650.040000] icmpv6_send: no reply to icmp error
[ 650.040000] icmpv6_send: no reply to icmp error
[ 1291.750000] ath: Failed to stop TX DMA, queues=0x004!
[ 1295.080000] ath: Failed to stop TX DMA, queues=0x00c!
[ 1809.500000] icmpv6_send: no reply to icmp error
[ 1923.970000] icmpv6_send: no reply to icmp error
[ 1923.970000] icmpv6_send: no reply to icmp error
[ 1923.970000] icmpv6_send: no reply to icmp error
root@quintaped:/etc/config#

History

Updated by Hector Ordorica on Feb 29, 2012.
Perhaps related, but I noticed this change recently in openwrt trunk for ICMPv6:

https://dev.openwrt.org/changeset/30727/trunk

In wireless testing in trunk, there was also a patch to ath9k that reduced tx bufs to 32 I think. I can’t seem to find it at the moment though:

https://dev.openwrt.org/changeset/30746/trunk

Updated by Dave Täht on Feb 29, 2012.
yea, I wondered where the txqueuelen change came from…

I’ve gone way beyond that (running sfq by default)… but hmmm…… I’ll try ripping that out

as for the icmp change, I tried stripping that out to no real effect…

  1. Allow essential incoming IPv6 ICMP traffic
    config rule
    option name Allow-ICMPv6-Input
    option src *
    option proto icmp
    option limit 1000/sec
    option family ipv6
    option target ACCEPT
  1. Allow essential forwarded IPv6 ICMP traffic
    config rule
    option name Allow-ICMPv6-Forward
    option src *
  2. option dest *
    option proto icmp
    option limit 1000/sec
    option family ipv6
    option target ACCEPT
Updated by Dave Täht on Feb 29, 2012.
For reference I am way beyond fiddling with txqueuelen

root@quintaped:/etc/config# tc -s qdisc show dev sw10
qdisc mq 1: root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc sfq 10: parent 1:1 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
ewma 3 min 4500b max 18000b probability 0.2 ecn
prob_mark 0 prob_mark_head 0 prob_drop 0
forced_mark 0 forced_mark_head 0 forced_drop 0
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 20: parent 1:2 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
ewma 3 min 4500b max 18000b probability 0.2 ecn
prob_mark 0 prob_mark_head 0 prob_drop 0
forced_mark 0 forced_mark_head 0 forced_drop 0
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 30: parent 1:3 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
ewma 3 min 4500b max 18000b probability 0.2 ecn
prob_mark 0 prob_mark_head 0 prob_drop 0
forced_mark 0 forced_mark_head 0 forced_drop 0
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 40: parent 1:4 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
ewma 3 min 4500b max 18000b probability 0.2 ecn
prob_mark 0 prob_mark_head 0 prob_drop 0
forced_mark 0 forced_mark_head 0 forced_drop 0
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0

Updated by Dave Täht on Mar 1, 2012.
This is actually against the rc4-3 release, not the rc5

I am aware that powersave mode does weird things to ping, but the phone does not access the internet worth beans and pinging it has results like:

root@quintaped:/etc/dibbler# ping 172.29.42.85
PING 172.29.42.85 (172.29.42.85): 56 data bytes
64 bytes from 172.29.42.85: seq=0 ttl=64 time=7198.999 ms
64 bytes from 172.29.42.85: seq=1 ttl=64 time=6506.462 ms
64 bytes from 172.29.42.85: seq=2 ttl=64 time=5700.414 ms
64 bytes from 172.29.42.85: seq=3 ttl=64 time=4838.906 ms
64 bytes from 172.29.42.85: seq=4 ttl=64 time=10448.104 ms
64 bytes from 172.29.42.85: seq=5 ttl=64 time=12287.972 ms
64 bytes from 172.29.42.85: seq=6 ttl=64 time=12414.146 ms
64 bytes from 172.29.42.85: seq=11 ttl=64 time=10277.950 ms
64 bytes from 172.29.42.85: seq=24 ttl=64 time=17193.064 ms
64 bytes from 172.29.42.85: seq=31 ttl=64 time=10291.945 ms
64 bytes from 172.29.42.85: seq=32 ttl=64 time=12313.409 ms
64 bytes from 172.29.42.85: seq=33 ttl=64 time=12334.132 ms
64 bytes from 172.29.42.85: seq=34 ttl=64 time=14613.955 ms
64 bytes from 172.29.42.85: seq=35 ttl=64 time=14074.534 ms
64 bytes from 172.29.42.85: seq=41 ttl=64 time=11444.921 ms
64 bytes from 172.29.42.85: seq=42 ttl=64 time=11989.482 ms
64 bytes from 172.29.42.85: seq=43 ttl=64 time=11527.725 ms
64 bytes from 172.29.42.85: seq=44 ttl=64 time=13893.153 ms
64 bytes from 172.29.42.85: seq=45 ttl=64 time=14567.878 ms
64 bytes from 172.29.42.85: seq=51 ttl=64 time=11443.464 ms
64 bytes from 172.29.42.85: seq=52 ttl=64 time=10613.561 ms
64 bytes from 172.29.42.85: seq=53 ttl=64 time=9648.801 ms
64 bytes from 172.29.42.85: seq=54 ttl=64 time=8652.338 ms
64 bytes from 172.29.42.85: seq=55 ttl=64 time=10004.857 ms
64 bytes from 172.29.42.85: seq=56 ttl=64 time=9575.878 ms
64 bytes from 172.29.42.85: seq=57 ttl=64 time=9810.736 ms
64 bytes from 172.29.42.85: seq=58 ttl=64 time=8888.394 ms
64 bytes from 172.29.42.85: seq=59 ttl=64 time=10075.711 ms
64 bytes from 172.29.42.85: seq=60 ttl=64 time=11003.248 ms
64 bytes from 172.29.42.85: seq=61 ttl=64 time=13171.309 ms
64 bytes from 172.29.42.85: seq=62 ttl=64 time=13581.010 ms
64 bytes from 172.29.42.85: seq=63 ttl=64 time=16758.824 ms
64 bytes from 172.29.42.85: seq=71 ttl=64 time=9913.159 ms
64 bytes from 172.29.42.85: seq=72 ttl=64 time=9179.238 ms
64 bytes from 172.29.42.85: seq=73 ttl=64 time=8977.600 ms
64 bytes from 172.29.42.85: seq=74 ttl=64 time=8550.328 ms
64 bytes from 172.29.42.85: seq=75 ttl=64 time=8268.744 ms
64 bytes from 172.29.42.85: seq=76 ttl=64 time=8868.098 ms
64 bytes from 172.29.42.85: seq=77 ttl=64 time=8484.207 ms
64 bytes from 172.29.42.85: seq=78 ttl=64 time=7750.523 ms
64 bytes from 172.29.42.85: seq=79 ttl=64 time=6929.604 ms
64 bytes from 172.29.42.85: seq=80 ttl=64 time=6032.321 ms
64 bytes from 172.29.42.85: seq=81 ttl=64 time=5195.933 ms
64 bytes from 172.29.42.85: seq=82 ttl=64 time=4568.045 ms
64 bytes from 172.29.42.85: seq=83 ttl=64 time=5821.851 ms

Updated by Dave Täht on Mar 1, 2012.
Similarly, I see this in the logs against this rc4-3

[35035.940000] ath: Failed to stop TX DMA, queues=0x004!
[38649.010000] ath: Failed to stop TX DMA, queues=0x004!
[39778.210000] ath: Failed to stop TX DMA, queues=0x004!
[41006.390000] ath: Failed to stop TX DMA, queues=0x004!
[41288.800000] ath: Failed to stop TX DMA, queues=0x004!
[42141.720000] ath: Failed to stop TX DMA, queues=0x004!
[45895.820000] ath: Failed to stop TX DMA, queues=0x004!
[53289.030000] ath: Failed to stop TX DMA, queues=0x004!
[53858.920000] ath: Failed to stop TX DMA, queues=0x001!

Updated by Dave Täht on Mar 1, 2012.
Mar 1 09:58:57 quintaped daemon.info dnsmasq-dhcp[3888]: DHCPACK (sw10) 192.168.1.102 00:27:13:64:89:67 cruithne
Mar 1 09:58:57 quintaped daemon.info hostapd: sw00: STA 00:13:e8:58:2a:c5 WPA: group key handshake completed (RSN)
Mar 1 09:58:58 quintaped daemon.info hostapd: sw00: STA 90:21:55:67:8d:cb WPA: group key handshake completed (RSN)
Mar 1 09:58:58 quintaped daemon.info hostapd: sw00: STA 90:21:55:67:8d:cb WPA: received EAPOL-Key 22 Group with unexpected replay counter
Updated by Dave Täht on Mar 1, 2012.
Well, the good news is 3.3rc5-1 is not exhibiting the problem when at 2.4ghz,with crypto.
pings are respectable, performance on the phone itself is good…

The bad news is that this is a highly simplified test setup rather than an attempted deployment,
and I’m not going to have time to attempt recreating the deployment.

64 bytes from 172.30.42.85: seq=46 ttl=64 time=137.776 ms
64 bytes from 172.30.42.85: seq=47 ttl=64 time=160.643 ms
64 bytes from 172.30.42.85: seq=48 ttl=64 time=184.652 ms
64 bytes from 172.30.42.85: seq=49 ttl=64 time=3.000 ms
64 bytes from 172.30.42.85: seq=50 ttl=64 time=64.671 ms
64 bytes from 172.30.42.85: seq=51 ttl=64 time=50.848 ms
64 bytes from 172.30.42.85: seq=52 ttl=64 time=4.502 ms
64 bytes from 172.30.42.85: seq=53 ttl=64 time=99.418 ms
64 bytes from 172.30.42.85: seq=54 ttl=64 time=122.181 ms

Updated by Dave Täht on Mar 7, 2012.
as far as I know this was only a bug in 3.3rc4.
Updated by Dave Täht on Apr 21, 2012.

This is a static export of the original bufferbloat.net issue database. As such, no further commenting is possible; the information is solely here for archival purposes.
RSS feed

Recent Updates

Jul 21, 2024 Wiki page
cake-autorate
Jul 21, 2024 Wiki page
What Can I Do About Bufferbloat?
Jul 21, 2024 Wiki page
Tests for Bufferbloat
Jul 1, 2024 Wiki page
RRUL Chart Explanation
Dec 3, 2022 Wiki page
Codel Wiki

Find us elsewhere

Bufferbloat Mailing Lists
#bufferbloat on Twitter
Google+ group
Archived Bufferbloat pages from the Wayback Machine

Sponsors

Comcast Research Innovation Fund
Nlnet Foundation
Shuttleworth Foundation
GoFundMe

Bufferbloat Related Projects

OpenWrt Project
Congestion Control Blog
Flent Network Test Suite
Sqm-Scripts
The Cake shaper
AQMs in BSD
IETF AQM WG
CeroWrt (where it all started)

Network Performance Related Resources


Jim Gettys' Blog - The chairman of the Fjord
Toke's Blog - Karlstad University's work on bloat
Voip Users Conference - Weekly Videoconference mostly about voip
Candelatech - A wifi testing company that "gets it".