Reliability issues with BPF
Darren Reed
2012-07-19 16:27:46 UTC
In doing some testing on NetBSD, I'm discovering
that BPF and tcpdump are not 100% reliable when it
comes to capturing packets. What do I mean by that?
When ^C (or SIGINT) is sent to tcpdump, packets
that it ought to have captured simply aren't.

For example, if I start tcpdump in the background
and then run an ipv6 ping generating 2000 byte
packets with a command like "ping6 -nc3 -s2000 fec0::1",
the ping ends successfully but terminating the
tcpdump may show as few as 8 packets rather than
12. 3 packets going in each direction (echo plus
echo reply) makes 6, doubled for fragments gives
12. I can't for the life of me think why this
should be.
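
For reference, the expected count of 12 can be checked with a short calculation. This is only a sketch: it assumes a link MTU of 1500 bytes, which is not stated in the post, and the function names are made up for illustration. Python is used purely for convenience.

```python
import math

IPV6_HEADER = 40    # fixed IPv6 header, bytes
FRAG_HEADER = 8     # IPv6 fragment extension header, bytes
ICMPV6_HEADER = 8   # ICMPv6 echo request/reply header, bytes


def fragments_per_message(payload, mtu=1500):
    """Number of IPv6 packets needed to carry one ICMPv6 echo message."""
    upper = ICMPV6_HEADER + payload  # the fragmentable part
    if IPV6_HEADER + upper <= mtu:
        return 1  # fits in a single unfragmented packet
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IPV6_HEADER - FRAG_HEADER) // 8 * 8
    return math.ceil(upper / per_fragment)


def expected_packets(count, payload, mtu=1500):
    """Packets a capture should see: echo plus reply, each possibly fragmented."""
    return count * 2 * fragments_per_message(payload, mtu)


print(expected_packets(3, 2000))  # 12 under the assumed 1500 byte MTU
```

With a 2000 byte payload each echo and each reply is split into two fragments, so three pings should produce 3 x 2 x 2 = 12 packets.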

Clues anyone?

Darren


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Greg Troxel
2012-07-19 17:12:22 UTC
Post by Darren Reed
For example, if I start tcpdump in the background
and then run an ipv6 ping generating 2000 byte
packets with a command like "ping6 -nc3 -s2000 fec0::1",
the ping ends successfully but terminating the
tcpdump may show as few as 8 packets rather than
12. 3 packets going in each direction (echo plus
echo reply) makes 6, doubled for fragments gives
12. I can't for the life of me think why this
should be.
Are you waiting long enough? IIRC there are two buffers, and read
returns when they get full or timeout, and I don't know that ^C causes
the final read.
Michael Richardson
2012-07-20 15:44:30 UTC
Post by Darren Reed
For example, if I start tcpdump in the background
and then run an ipv6 ping generating 2000 byte
packets with a command like "ping6 -nc3 -s2000 fec0::1",
the ping ends successfully but terminating the
tcpdump may show as few as 8 packets rather than
12. 3 packets going in each direction (echo plus
echo reply) makes 6, doubled for fragments gives
12. I can't for the life of me think why this
should be.
Greg> Are you waiting long enough? IIRC there are two buffers, and read
Greg> returns when they get full or timeout, and I don't know that ^C causes
Greg> the final read.

This sounds reasonable to me.
ktruss it and see.

Perhaps libpcap needs some changes, or perhaps the kernel needs some.
Send patches to tcpdump.org via github please.
--
] He who is tired of Weird Al is tired of life! | firewalls [
] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[
] ***@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[
Kyoto Plus: watch the video

then sign the petition.

Darren Reed
2012-07-20 19:09:30 UTC
Post by Michael Richardson
Post by Darren Reed
For example, if I start tcpdump in the background
and then run an ipv6 ping generating 2000 byte
packets with a command like "ping6 -nc3 -s2000 fec0::1",
the ping ends successfully but terminating the
tcpdump may show as few as 8 packets rather than
12. 3 packets going in each direction (echo plus
echo reply) makes 6, doubled for fragments gives
12. I can't for the life of me think why this
should be.
Greg> Are you waiting long enough? IIRC there are two buffers, and read
Greg> returns when they get full or timeout, and I don't know that ^C causes
Greg> the final read.
This sounds reasonable to me.
ktruss it and see.
Perhaps libpcap needs some changes, or perhaps the kernel needs some.
Send patches to tcpdump.org via github please.
I've added this on sourceforge:
https://sourceforge.net/tracker/?func=detail&aid=3546396&group_id=53067&atid=469579

It seems like a relatively straightforward change:
- if pcap loop has been broken because of EINTR
and if pcap is working in blocking mode,
set non-blocking mode and attempt to read any
packets that are currently buffered.
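
The idea in the list above can be sketched in miniature. This is not the patch that was filed: it is written in Python against an ordinary file descriptor rather than in C against libpcap and a BPF device, and the names capture and drain_nonblocking are hypothetical. KeyboardInterrupt (raised by Python's default SIGINT handler) stands in for read() failing with EINTR. Note also that a pipe or socket read returns as soon as any data is present, whereas a BPF read may hold buffered packets until its timeout expires, so the sketch shows only the shape of the fix.

```python
import os


def drain_nonblocking(fd, chunk=4096):
    """Read whatever is already buffered on fd without blocking."""
    was_blocking = os.get_blocking(fd)
    os.set_blocking(fd, False)
    chunks = []
    try:
        while True:
            try:
                data = os.read(fd, chunk)
            except BlockingIOError:
                break  # EAGAIN/EWOULDBLOCK: nothing left in the buffer
            if not data:
                break  # end of file: the writer has closed
            chunks.append(data)
    finally:
        os.set_blocking(fd, was_blocking)  # restore the caller's mode
    return b"".join(chunks)


def capture(fd, handle, chunk=4096):
    """Blocking read loop that flushes buffered data when interrupted."""
    try:
        while True:
            data = os.read(fd, chunk)  # blocks until data arrives
            if not data:
                return
            handle(data)
    except KeyboardInterrupt:
        # Plays the role of read() returning EINTR in the C code.
        # Only drain if the descriptor was in blocking mode, as in the
        # change described above.
        if os.get_blocking(fd):
            leftover = drain_nonblocking(fd)
            if leftover:
                handle(leftover)
```

The drain loop stops on the first "would block" result rather than retrying, so it cannot spin waiting for data that never arrives.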

It solved 93% of the issues I saw with tcpdump not
returning the correct number of packets (13 out of
14 tests showed improvement.) The last one seemed
like a more complex timing issue.

A copy of the patch and problem has been filed with GNATS:
bin/46729: tcpdump does not read all buffered packets on Ctrl-C
http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=46729

... Note that patches have been copy-pasted, so there will
be a lack of TABS...

On reflection, it might be tempting to approach the solution
with a different patch: if the current behaviour is blocking
and non-blocking can be successfully set, continue looping
until there is no more data (which would then flush both buffers)
and then return EINTR. However, that behaviour is much more
dependent on the reader not returning 0 (e.g. because of
EWOULDBLOCK) and thus looping around the read(), so I'm
not entirely sure that it is sensible. A similar question
might be asked about whether to do the read_op() twice, as
BPF has two buffers where data might be present, but then
that behaviour is specific to BPF.

Darren


Michael Richardson
2012-07-30 17:28:17 UTC
Darren> It seems like a relatively straightforward change:
Darren> - if pcap loop has been broken because of EINTR
Darren> and if pcap is working in blocking mode,
Darren> set non-blocking mode and attempt to read any
Darren> packets that are currently buffered.

So, in blocking mode, on NetBSD, why doesn't the pcap_loop() start up
each time there is a packet received? Is there a memory mapped shared
ring-buffer involved? (If so, can't the not-yet-signaled packets be
taken from the ring descriptors?)
--
] He who is tired of Weird Al is tired of life! | firewalls [
] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[
] ***@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[
Kyoto Plus: watch the video http://youtu.be/kzx1ycLXQSE
then sign the petition.
Greg Troxel
2012-07-30 19:00:28 UTC
Post by Michael Richardson
Darren> - if pcap loop has been broken because of EINTR
Darren> and if pcap is working in blocking mode,
Darren> set non-blocking mode and attempt to read any
Darren> packets that are currently buffered.
so, in blocking mode, on NetBSD, why doesn't the pcap_loop() start up
each time there is a packet received? Is there a memory mapped shared
ring-buffer involved? (If so, can't the not-yet-signaled packets be
taken from the ring descriptors?)
My understanding from the last time I really dug into this was that
normally, read on bpf would return on the sooner of

- there is any data and a timeout (1s) has passed, or
- the buffer becomes full and bpf flips to the second buffer

But it seems not to work this way any more.
Darren Reed
2012-08-01 12:07:32 UTC
Post by Greg Troxel
Post by Michael Richardson
Darren> - if pcap loop has been broken because of EINTR
Darren> and if pcap is working in blocking mode,
Darren> set non-blocking mode and attempt to read any
Darren> packets that are currently buffered.
so, in blocking mode, on NetBSD, why doesn't the pcap_loop() start up
each time there is a packet received? Is there a memory mapped shared
ring-buffer involved? (If so, can't the not-yet-signaled packets be
taken from the ring descriptors?)
My understanding from the last time I really dug into this was that
normally, read on bpf would return on the sooner of
there is any data and a timeout (1s) has passed, or
the buffer becomes full and bpf flips to the second buffer
But it seems not to work this way any more.
It still works that way.

The problem here is one of reliability of tcpdump exiting
with respect to what has been buffered in the kernel and
the expiration of the read timeout.

e.g. if I do this in a script:

tcpdump -w foo.cap icmp and host bar &
tcpdumpjob=$!
ping -c 3 bar
# (assume ping reports 100% of packets received)
kill -INT $tcpdumpjob

How many packets does "foo.cap" have in it?
And more to the point, why should that answer not be 6?
(Let's assume that the only packets eligible to be caught
are the ICMP echo and echo reply.)
The existing implementation has no way to guarantee that
the number of packets in the file "foo.cap" is 6 because
any packets buffered by BPF after the last 1 second timer
for tcpdump went off will be discarded by the SIGINT.
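
The loss described here can be illustrated with a toy timing model. It is a deliberate simplification and not how the kernel is implemented: it assumes the reader is handed buffered packets only when a fixed read timeout expires, that the timeout fires at regular intervals from the moment the capture starts, and that the buffer never fills. The function name and the arrival times are invented for the example.

```python
def captured_before_interrupt(arrivals_ms, interrupt_ms,
                              timeout_ms=1000, start_ms=0):
    """Arrival times of packets already handed to the reader at interrupt time.

    Toy model: buffered packets are delivered only when the read timeout
    expires, at start_ms + k * timeout_ms for k = 1, 2, ...
    """
    ticks = (interrupt_ms - start_ms) // timeout_ms
    last_delivery = start_ms + ticks * timeout_ms
    return [t for t in arrivals_ms if t <= last_delivery]


# Three pings one second apart; each echo is followed by its reply 1 ms later.
arrivals = [200, 201, 1200, 1201, 2200, 2201]

# SIGINT right after the last reply: the final pair is still in the buffer.
print(len(captured_before_interrupt(arrivals, interrupt_ms=2300)))  # 4

# SIGINT one second later: another timeout has expired, so all 6 are seen.
print(len(captured_before_interrupt(arrivals, interrupt_ms=3300)))  # 6
```

In this model the result depends entirely on where the interrupt falls relative to the last timeout, which is the unreliability being described.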

To say "insert sleep 1 before the kill" makes the script
dependent on internal behaviour of tcpdump that appears
(to me) to be undocumented, and thus cannot be relied upon.
tcpdump could easily be modified to wait two seconds and
continue to work as documented whilst breaking a script
that only waits one second.

Darren

