Bug #288

Duplicate pings bother me

Added by Dave Täht on Oct 21, 2011. Updated on Apr 21, 2012.
Status: Closed. Priority: Normal. Assignee: David Taht.

Description

In a recent experiment with cerowrt, some duplicate pings were seen, largely when the network was otherwise idle.

I am really hard pressed to think of circumstances in which duplicate pings can happen, given the various forms of ARQ and block acks built into 802.11n. The fact that it was happening on an idle (2 hop) network implies (to me) that single packet transfers may have an issue in retries that multipacket transfers do not.

Things that tweak me on this topic were the recent SACK-related analysis and Dan's work on coverfire:

http://www.coverfire.com/archives/2011/10/15/linux-flow-classifier-proto-dst-and-tos/

which pointed to oddball packets being mishandled elsewhere in the stack. The experiment was otherwise successful, showing a distinct improvement in latency with a txqueuelen of 37 on the router versus a txqueuelen of 1000. For all I know there ARE circumstances in 802.11n where you will see duplicate packets… but interestingly, no dups were seen in the txqueuelen 1000 test.

Further experiments are needed.
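For reference, a rough sketch of how one run of this comparison could be repeated; the interface name (wlan0) and the ping count are assumptions, not details from the original experiment:

# on the cerowrt router, set the queue length under test (wlan0 assumed):
ifconfig wlan0 txqueuelen 37        # or 1000 for the comparison run

# from the client, ping across the 2-hop path and keep the log:
ping -c 500 172.26.0.3 > p1.txt

# duplicates, if any, show up as "(DUP!)" lines:
grep -c 'DUP' p1.txt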

d@cruithne:~/org/lincstalk/bbloat$ grep -A 1 -B 1 DUP p1.txt # this was the txqueuelen 37 test:

64 bytes from 172.26.0.3: icmp_seq=67 ttl=63 time=1.237 ms
64 bytes from 172.26.0.3: icmp_seq=67 ttl=63 time=67.926 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=68 ttl=63 time=1.503 ms

64 bytes from 172.26.0.3: icmp_seq=104 ttl=63 time=2.624 ms
64 bytes from 172.26.0.3: icmp_seq=104 ttl=63 time=108.333 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=105 ttl=63 time=14.603 ms

64 bytes from 172.26.0.3: icmp_seq=115 ttl=63 time=1.543 ms
64 bytes from 172.26.0.3: icmp_seq=115 ttl=63 time=72.641 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=116 ttl=63 time=60.502 ms

64 bytes from 172.26.0.3: icmp_seq=194 ttl=63 time=1.948 ms
64 bytes from 172.26.0.3: icmp_seq=194 ttl=63 time=169.254 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=195 ttl=63 time=1.486 ms

64 bytes from 172.26.0.3: icmp_seq=229 ttl=63 time=1.585 ms
64 bytes from 172.26.0.3: icmp_seq=229 ttl=63 time=75.449 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=230 ttl=63 time=3.115 ms

64 bytes from 172.26.0.3: icmp_seq=259 ttl=63 time=1.574 ms
64 bytes from 172.26.0.3: icmp_seq=259 ttl=63 time=77.107 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=260 ttl=63 time=1.253 ms

64 bytes from 172.26.0.3: icmp_seq=376 ttl=63 time=1.761 ms
64 bytes from 172.26.0.3: icmp_seq=376 ttl=63 time=78.557 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=377 ttl=63 time=1.333 ms

64 bytes from 172.26.0.3: icmp_seq=497 ttl=63 time=1.681 ms
64 bytes from 172.26.0.3: icmp_seq=497 ttl=63 time=72.170 ms (DUP!)
64 bytes from 172.26.0.3: icmp_seq=498 ttl=63 time=5.078 ms

Attachments

  • plot.ps (image/ps; 24.5 kiB) David Taht Oct 21, 2011

History

Updated by David Taht on Oct 21, 2011.
-------- Original Message --------
Subject: Re: plot
Date: Fri, 21 Oct 2011 16:25:16 +0200
From: Fabian Schneider fabian@ieee.org
To: David Täht dave.taht@gmail.com
CC: Jim Gettys jg@freedesktop.org, Kathleen Nichols
nichols@pollere.com

Hi,

Obviously I need to learn enough R to be able to do this sort of analysis myself. In the interim, since I’m doing a similar experiment Monday (this time varying either the classification of ping or the queueing algorithm, not the buffer size - I haven’t decided which), we’d love to see your script in the hope I can immediately use it in the class…

OK, let’s see. I created something that should ease up the plotting. See the attached tar archive and sample plot. To summarize, there is a Makefile which plots the data and expects:
- p1.txt and p2.txt as input (which should be the logs from the ping command)
- awk and sed to be installed and in the path
- GNU R to be installed and in the path

What it does is:
1.) extract the RTT time series ‘hopefully’ covering the download period, by looking for the ping sequence number (SEQ) with the maximum RTT (MAX) and considering all RTTs from SEQ-100 to SEQ+100.
2.) replace all timed-out pings with an RTT value of MAX+100ms, where the 100ms can be configured in the BEGIN clause of the awk script.
3.) plot the two time series.
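A minimal sketch of what steps 1 and 2 could look like in awk, assuming standard Linux ping output; the actual extract.timeseries.awk shipped in the archive may well differ:

# sketch.awk -- hypothetical illustration of steps 1 and 2 above.
# Pass over the log once, remembering each RTT by sequence number and
# tracking the sequence number (maxseq) with the largest RTT (max).
BEGIN { pad = 100 }                      # ms added to MAX for timed-out pings
/icmp_seq=/ {
    split($5, s, "=")                    # $5 is "icmp_seq=NNN"
    split($7, t, "=")                    # $7 is "time=R.RRR"
    rtt[s[2] + 0] = t[2]
    if (t[2] + 0 > max) { max = t[2] + 0; maxseq = s[2] + 0 }
}
# keep only SEQ-100 .. SEQ+100 around the maximum, substituting MAX+pad
# for any sequence number that never got a reply
END {
    for (i = maxseq - 100; i <= maxseq + 100; i++) {
        v = (i in rtt) ? rtt[i] : max + pad
        print i, v
    }
}

Run as, for example, awk -f sketch.awk p1.txt > p1.timeseries (the file names here are just placeholders).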

What it includes:
- extract.timeseries.awk (performs steps 1+2)
- plotit.R (performs step 3)
- Makefile (wrapper)
- p*.txt example input files

As for these plots, what I’d like:

the Y axis to stay in the same range of 0 - 1000 ms

(for the combined plot I determine the maximum RTT and use that as an upper bound)

the X axis to be 100 seconds long, showing the idle period, the spike, the idle period.

(I opted for a 200-second = 200-ping-sample period instead)

the two plots to be directly comparable, using a different color for the ping ‘dots’, and roughly the same start time, so I can overlay them…

Already done. There is only one output plot.

1) start time is uncontrolled (but close enough, given the length of transfer)

you can ‘tune’ the input data and have the input file start at a chosen point in time that is less than 100 seconds before the maximum to achieve this.
(I did not want to get into more sophisticated methods for determining the download period from the data, so far)

4) tcpdump on at least several stations would be VERY interesting. Dario says you were seeing some interesting duplicate acks - I still haven’t verified we’ve actually FIXED the stack enough to get reliable results.

No, what we saw was ping reporting “(DUP!)” after some lines.
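For the tcpdump follow-up suggested above, a minimal capture of just the ICMP traffic on each station might look like the following; the interface and file names are assumptions:

# run on the router and on each wireless client during the ping test:
tcpdump -i wlan0 -n -w dup-hunt.pcap icmp
# comparing the per-station captures should show on which hop the
# duplicate echo request or reply is first generated.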

Would you be interested in following up this line of work with a paper w/me/etc?

sure.

best
Fabian

Updated by David Taht on Oct 21, 2011.
I just wanted to express how (other than the dup ping issue) utterly happy I am with the attached txqueuelen 37 vs 1000 plot.

txqueuelen 37 REALLY works vs 1000 - and the real sources of the
bottlenecks have moved to the clients, not the AP. On to testing queue
management!

Is 37 the right number? Don’t know. Is it a better number than 1000? You betcha!

Andrew, are you still thinking 100 is better for Google’s IW10 mice attack?

Updated by Dave Täht on Oct 21, 2011.
Oh, I forgot to mention what REALLY tweaked me (although it’s not shared by the remainder of the data set and is probably totally a matter of mere misfortune) and made me jump at shadows like this one.

first dup is packet 67
second dup is packet 104.

104-67 = 37, which is my txqueuelen.
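A quick way to check that spacing across the whole log is a one-liner like the following (reading the p1.txt shown above); as noted, only the first gap works out to 37:

grep DUP p1.txt | awk '{ split($5, s, "="); seq = s[2] + 0
                         if (prev) print seq, "gap =", seq - prev
                         prev = seq }'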

Updated by Dave Täht on Nov 18, 2011.
Can’t reproduce at the current txqueuelen.
Updated by Jim Gettys on Nov 18, 2011.
Seemed fixed. txqueuelen == 40 seems to have fixed it… Bizarre.
Updated by Dave Täht on Apr 21, 2012.
