I am really hard pressed to think of circumstances where seeing duplicate pings can happen, given the various forms of ARQ and block acks built into 802.11n. The fact that it was happening on an idle (2 hop) network implies (to me) that single packet transfers may have an issue in retries that multipacket transfers do not.
Things that tweak me on this topic were the recent sack related analysis and dan’s work on coverfire
http://www.coverfire.com/archives/2011/10/15/linux-flow-classifier-proto-dst-and-tos/
which pointed to oddball packets being mishandled elsewhere in the stack. The experiment was otherwise successful, showing a distinct improvement in latency when a txqueuelen of 37 was used vs a txqueuelen of 1000, on the router, and for all I know there ARE circumstances in 802.11n where you will see duplicate packets… but interestingly, no dups were seen in the txqueuelen 1000 test.
Further experiments are needed.
d@cruithne:~/org/lincstalk/bbloat\$ grep -A 1 -B 1 DUP p1.txt # this was the txqueuelen 37 test:
64 bytes from 172.26.0.3: icmp_seq=67 ttl=63 time=1.237 ms
64 bytes from 172.26.0.3: icmp_seq=67 ttl=63 time=67.926 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=68 ttl=63 time=1.503 ms
–
64 bytes from 172.26.0.3: icmp_seq=104 ttl=63 time=2.624 ms
64 bytes from 172.26.0.3: icmp_seq=104 ttl=63 time=108.333 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=105 ttl=63 time=14.603 ms
–
64 bytes from 172.26.0.3: icmp_seq=115 ttl=63 time=1.543 ms
64 bytes from 172.26.0.3: icmp_seq=115 ttl=63 time=72.641 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=116 ttl=63 time=60.502 ms
–
64 bytes from 172.26.0.3: icmp_seq=194 ttl=63 time=1.948 ms
64 bytes from 172.26.0.3: icmp_seq=194 ttl=63 time=169.254 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=195 ttl=63 time=1.486 ms
–
64 bytes from 172.26.0.3: icmp_seq=229 ttl=63 time=1.585 ms
64 bytes from 172.26.0.3: icmp_seq=229 ttl=63 time=75.449 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=230 ttl=63 time=3.115 ms
–
64 bytes from 172.26.0.3: icmp_seq=259 ttl=63 time=1.574 ms
64 bytes from 172.26.0.3: icmp_seq=259 ttl=63 time=77.107 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=260 ttl=63 time=1.253 ms
–
64 bytes from 172.26.0.3: icmp_seq=376 ttl=63 time=1.761 ms
64 bytes from 172.26.0.3: icmp_seq=376 ttl=63 time=78.557 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=377 ttl=63 time=1.333 ms
–
64 bytes from 172.26.0.3: icmp_seq=497 ttl=63 time=1.681 ms
64 bytes from 172.26.0.3: icmp_seq=497 ttl=63 time=72.170 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=498 ttl=63 time=5.078 ms
txqueuelen 37 REALLY works vs 1000 - and the real sources of the
bottlenecks have moved to the clients, not the AP. On to testing queue
management!
Is 37 the right number? don’t know. Is it a better number than 1000 -
you betcha!
Andrew, are you still thinking 100 is better for google’s iw10 mice attack?
first dup is packet 67
second dup is packet 104.
104-67 = 37, which is my txqueuelen.
Subject: Re: plot
Date: Fri, 21 Oct 2011 16:25:16 +0200
From: Fabian Schneider fabian@ieee.org
To: David Täht dave.taht@gmail.com
CC: Jim Gettys jg@freedesktop.org, Kathleen Nichols
nichols@pollere.com
Hi,
Ok. lets see. I created something that should ease up the plotting. See attached tar archive and sample plot. To summarize there is a Makefile which plots the data and expects:
- p1.txt and p2.txt as input (which should be the logs from the ping command)
- awk and sed to be installed and in the path
- GNU R to be installed and in the path
What it does is:
1.) extracts the RTT timeseries ‘hopefully’ covering the download period, by looking for the ping sequence number (SEQ) with maximum RTT (MAX) and considering all RTTs from SEQ-100 to SEQ+100.
2.) replacing all timeouted pings with an RTT value of MAX+100ms, where the 100ms can be configured in the BEGIN clause of the awk script.
3.) plot the two time series.
What it includes:
- extract.timeseries.awk (performs steps 1+2)
- plotit.R (performs step 3)
- Makefile (wrapper)
- p*.txt example input files
(for the combined plot i determine the maximum RTT and use that as an upper bound)
(i opted for a 200 second = 200 ping samples period instead)
already done. Only one output plot.
you can ‘tune’ the input data and have the input file start at a chosen point in time that is less than 100 seconds before the maximum to achieve this.
(I did not want to get into more sophisticated methods for determining the download period from the data, sofar)
No what we saw is ping reporting “(DUP!)” behind some lines.
sure.
best
Fabian