This is trapping on unaligned accesses, from somewhere.
I was going to try netperf instead, to see whether this is a kernel problem rather than a userspace problem.
oprofile is enabled in the current build and installable as a package. However, I forgot to enable the unaligned trap perf monitor in the rc1 build.
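One common source of unaligned accesses in network code is the 14-byte Ethernet header, which leaves the IP header off a 4-byte boundary unless the receive buffer is padded by 2 bytes. Whether that is what is trapping here has not been established; the sketch below only illustrates the arithmetic. The function names and pad values are illustrative assumptions, not taken from any driver.

```python
# Illustration only: where the IP header lands relative to a 4-byte boundary.
ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
WORD = 4             # 32-bit loads want 4-byte alignment


def ip_header_offset(buffer_pad: int) -> int:
    """Offset of the IP header from the start of an aligned receive buffer."""
    return buffer_pad + ETH_HEADER_LEN


def is_word_aligned(offset: int, word: int = WORD) -> bool:
    """True if a word-sized load at this offset needs no unaligned handling."""
    return offset % word == 0


for pad in (0, 2):  # hypothetical pad values: none, and a 2-byte pad
    off = ip_header_offset(pad)
    print(f"pad={pad}: IP header at offset {off}, aligned={is_word_aligned(off)}")
```

If the ethernet hardware cannot DMA to a 2-byte-offset buffer, the pad is not available and every 32-bit field in the IP header is read unaligned, which on CPUs without cheap unaligned loads means either a trap or slower code.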
(This is not a blocker for rc1 of cerowrt.) The router runs at wire speed at 100Mbit, even with only 4 buffers in the ethernet driver. However, setting the buffer count that low via ethtool at gigE speeds locks up the port entirely. I’ll file a bug on that shortly, with exact steps to reproduce it; see also the email on the lists.
ALSO:
Lots of iptables rules using multiport matches (code in my Diffserv repo) also slowed the router down far more than expected. The new firewall code doesn’t use multiport matches (although it could, I think, unless I have the new syntax wrong), and I haven’t had time to verify what happens with lots of ports in the latest cerowrt builds. (I am opening more ports than usual, so that testing tools can abuse more stuff, like rsync.)
However, I again failed to include the unaligned trap perf counter in rc4.
packed together
were showing up with high percentage points [08:03]
bit access
[08:04]
build environment, or what? [08:05]
sane, of all this stuff, so I don’t have to replicate the bits flying
in loose formation?
[08:08]
report 216 - of where you are at. Hopefully the next train I get on
will be less laggy.
effect that this stuff can have
devices that have both crappy ethernet chips with alignment limitations
and inefficient unaligned access
times for a couple arches, and it bugs me it’s not in there…
[08:32]
there. Done that. :) Didn’t realize it was also doing that many other
places that were busted….
amazing…. Then the guy that wrote the original mac80211
surfaced… And nbd is back and restored to life, after chasing girls
for weeks in Indonesia… :) [08:35]
fix): [08:41]
be really impressed with this next pass
suppose.
andrew?
I built a version of cerowrt from openwrt head, in andrew’s dir, with netperf installed by default…
Going with the defaults:
netperf -H the_other_router from_one_router
I get 143Mbit
Going router-router over gigE (lan to wan), with tcp_low_latency turned on on both sides, with:
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
 87380  16384  16384  60.00   218.20
so tcp_low_latency looks like a win with westwood+
My laptop doesn’t do gigE, so I can’t test that here.
All these tests are with the default firewall rules for cerowrt, and nat turned off on the one master router… and ethernet buffers of 4, and txqueuelen of 8.
For giggles, I did a test with cubic:
root@OpenWrt:/proc/sys/net/ipv4# fg
netperf -l 60 -H 172.30.42.33
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
 87380  16384  16384  60.00   228.26
And last, with cubic, and firewall rules turned off:
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to gw.home.lan (172.30.42.33) port 0 AF_INET
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
 87380  16384  16384  60.00   257.75
I’ll try diffserv again in the morning.
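To put the four numbers above side by side, here is a rough sketch that pulls the throughput column out of each netperf result row and expresses it relative to the 143Mbit default run. The labels in `runs` are my reading of the text above, and the default run may not have used the same path as the others, so treat the percentages as indicative only.

```python
def parse_throughput(result_row: str) -> float:
    """Return the last column (10^6 bits/sec) of a netperf TCP_STREAM result row."""
    return float(result_row.split()[-1])


def relative_gain(new: float, old: float) -> float:
    """Percentage change of `new` over `old`."""
    return (new / old - 1.0) * 100.0


DEFAULT_MBIT = 143.0  # the "going with the defaults" figure quoted above

# Result rows copied from the tables above; the labels are assumptions.
runs = {
    "tcp_low_latency (westwood+)": "87380 16384 16384 60.00 218.20",
    "cubic": "87380 16384 16384 60.00 228.26",
    "cubic, firewall rules off": "87380 16384 16384 60.00 257.75",
}

for label, row in runs.items():
    mbit = parse_throughput(row)
    print(f"{label}: {mbit:.2f} Mbit/s ({relative_gain(mbit, DEFAULT_MBIT):+.1f}% vs defaults)")
```

The step that stands out is the last one: turning the firewall rules off is worth roughly another 13% on top of the cubic result, which fits the earlier observation that iptables rules cost more than expected.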
the udp_stream test:
64 bytes from 172.31.42.33: icmp_req=525 ttl=64 time=41.0 ms
64 bytes from 172.31.42.33: icmp_req=526 ttl=64 time=37.3 ms
64 bytes from 172.31.42.33: icmp_req=527 ttl=64 time=33.0 ms
and it crashes and burns with -l 120, never ending.
With -l 10:
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.33 (172.30.42.33) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
126976   65507   10.01        1842      0      96.47
112640           10.01           0              0.00
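In netperf's UDP_STREAM output the first row is the sending side and the second row is what the receiver reported. A quick check of the arithmetic, as a sketch using only the figures in the table (the function name is mine):

```python
def udp_throughput_mbit(messages: int, message_bytes: int, seconds: float) -> float:
    """Throughput in 10^6 bits/sec for a count of fixed-size UDP messages."""
    return messages * message_bytes * 8 / seconds / 1e6


# Figures from the table above: 65507-byte messages over a 10.01 s run.
sent = udp_throughput_mbit(1842, 65507, 10.01)
received = udp_throughput_mbit(0, 65507, 10.01)

print(f"sent:     {sent:.2f} Mbit/s")
print(f"received: {received:.2f} Mbit/s")
```

The sent figure comes out within a few hundredths of netperf's 96.47 (netperf uses the unrounded elapsed time), while the receive side saw zero messages: everything that was sent was lost somewhere along the way.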
I pulled from 27917.
I don’t see any commits that could have caused this breakage, except maybe that I’m building against 2.6.39.3, and may have differences in my config file from yours that are important.
Could I encourage you to hop on huchra?
Also, I’m curious if YOUR ethernet build showed this:
root@OpenWrt:~# dmesg | grep eth0
eth0: Atheros AG71xx at 0xb9000000, irq 4
eth0: unable to find MII bus on device 'rtl8366s'
eth0: Atheros AG71xx at 0xba000000, irq 5
eth0: unable to find MII bus on device 'rtl8366s'
I thought I was building from the same .config as you, with support for the 1000hz and no_hz options - the above looks like a faster clock tick problem to me… but you should have seen that too.
So I can revert to 2.6.39.2, revert all that, and/or restart from scratch from my patch set (which only touches one tiny bit of the eth driver and NOTHING of the wireless driver). But with a little data perhaps that won’t be necessary.
I’ll stick around here a while longer, cleaning up, and getting a router from Thursday online…
After I get some sleep.
On Fri, Aug 5, 2011 at 7:00 PM, Andrew McGregor andrewmcgr@gmail.com wrote:
Huh? I generated it based on openwrt head r27912
On 5/08/2011, at 5:01 PM, Dave Taht wrote:
so this patch duplicates some, but not all, of what is now in openwrt head?
It only partially applies.
Ripping it out again, and going off to hopefully find the problem in the
ethernet driver instead….
On Fri, Aug 5, 2011 at 4:00 PM, Dave Taht dave.taht@gmail.com wrote:
>
>
> ---------- Forwarded message ----------
> From: Andrew McGregor andrewmcgr@gmail.com
> Date: Fri, Aug 5, 2011 at 3:57 PM
> Subject: Resend of patch
> To: Dave Taht dave.taht@gmail.com
>
>
> Ok, this patch applies on top of Felix’ last set of changes, and produces
> really spectacular results: about 2x better latency on 11n than before we
> both started on it at basically no performance cost, and fairly decent and
> consistent latency on 11g at a performance cost so small I can’t measure it.
>
> It may be worth fiddling a little more with the tunables, but this is
> pretty good for now.
>
> (btw: drop this file in the packages/mac80211 directory)
>
>
>
>
---------- Forwarded message ----------
From: Andrew McGregor andrewmcgr@gmail.com
Date: Fri, Aug 5, 2011 at 8:59 PM
Subject: Re: Resend of patch
To: Dave Taht dave.taht@gmail.com
I don’t know if ethernet was working, I never checked.
I was using 2.6.39.2 however, so I suspect an upstream change is causing the patch to not apply. It’s not big, you could put it in by hand easily enough. In the last chunk, you could also change the other case in that if statement to something smaller, although that won’t happen on a WNDR3700.
About to get on a plane to NZ, so I’ll be out of circulation for around 36 hours.
http://huchra.bufferbloat.net/~andrewm/cerowrt/
I hope this build finishes up the last of the truly epic adventure that has taken place on bug:
http://www.bufferbloat.net/issues/216
If it works, do play with netperf 2.5.0 on the wireless interface.
I note that the fixes to #195 seem to have (maybe) messed up the ethernet interface - or it was the change to a faster clock, or tickless, or 2.6.39.3 vs .2…. OR it may be the router I have on me!
So let us know on this email (the bug reporting interface is cc’d).
That’s all the news from tent #34. Good night, and good luck.
I shan’t document what I’ll do next, as the bisecting I’d need to do is pretty obvious; nor do I know anymore what the real effect of a 1000HZ vs 100HZ clock is on a tickless system.
100HZ clock + tickless + 2.6.39.3… just works.
I have a backlog of other patches that needed to land and a ton of src to clean up, but we’re back in business.
“With the right eyeballs, all bugs are shallow”
Is there an OpenWRT build containing the patch that I can try?