Discussion:
Howto use agr to aggregate VPN tunnels
BERTRAND Joël
2016-12-09 09:56:57 UTC
Hello,

I am opening a new thread because I have run some tests and I'm now
fairly sure that the issue I see comes from NetBSD.

I'm able to use agr with two physical Ethernet controllers, but I'm not
able to obtain a working agr interface with two OpenVPN tunnels.
The problem may come from the NetBSD kernel or from a misconfiguration;
I have no idea how to fix it.

I have created two OpenVPN tap tunnels between a server and a NetBSD
workstation (DEC PWS500au running 7.99.43, but I have seen the same issue
with 7.0.2 on amd64). Both tunnels run as expected.

I have removed the inet/inet6 addresses from both tunnels:

tap0: flags=0x8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
ec_capabilities=5<VLAN_MTU,JUMBO_MTU>
ec_enabled=0
address: f2:0b:a4:b2:cb:28
media: Ethernet autoselect
tap1: flags=0x8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
ec_capabilities=5<VLAN_MTU,JUMBO_MTU>
ec_enabled=0
address: f2:0b:a4:e9:16:fe
media: Ethernet autoselect

and I have created agr0 (round robin):

agr0: flags=0xb843<UP,BROADCAST,RUNNING,SIMPLEX,LINK0,LINK1,MULTICAST>
mtu 1500
agrport: tap0, flags=0x3<COLLECTING,DISTRIBUTING>
agrport: tap1, flags=0x3<COLLECTING,DISTRIBUTING>
address: f2:0b:a4:b2:cb:28
inet 192.168.100.2/24 broadcast 192.168.100.255 flags 0x0
inet6 fe80::f00b:a4ff:feb2:cb28%agr0/64 flags 0x2<TENTATIVE>
scopeid 0x6

I have checked that 192.168.100.0/24 route goes through agr0 :
Internet:
Destination      Gateway       Flags  Refs  Use  Mtu     Interface
default          weierstrass   UG     -     -    -L      epic0
127/8            localhost     UGR    -     -    33112L  lo0
localhost        lo0           UHl    -     -    33112L  lo0
192.168.0/24     link#3        U      -     -    -L      epic0
einstein         link#3        UHl    -     -    -L      lo0
192.168.100/24   link#6        U      -     -    -L      agr0
192.168.100.2    link#6        UHl    -     -    -L      lo0

If I try to ping 192.168.100.1 (the server), the kernel sends packets to agr0:
einstein# tcpdump -i agr0 -p
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on agr0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:34:25.250725 ARP, Request who-has 192.168.100.1 tell 192.168.100.2,
length 28
10:34:26.253355 ARP, Request who-has 192.168.100.1 tell 192.168.100.2,
length 28
10:34:27.252354 ARP, Request who-has 192.168.100.1 tell 192.168.100.2,
length 28
10:34:28.253310 ARP, Request who-has 192.168.100.1 tell 192.168.100.2,
length 28
10:34:29.252338 ARP, Request who-has 192.168.100.1 tell 192.168.100.2,
length 28
10:34:30.252331 ARP, Request who-has 192.168.100.1 tell 192.168.100.2,
length 28
10:34:31.256259 ARP, Request who-has 192.168.100.1 tell 192.168.100.2,
length 28
^C
7 packets captured
7 packets received by filter
0 packets dropped by kernel

but no packets are sent on tap0 or tap1:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
einstein# tcpdump -i tap1 -p
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap1, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

On the receive side, when the server tries to ping the NetBSD client, tap0
and tap1 receive Ethernet frames, but these frames are never passed up to agr0!

einstein# tcpdump -i tap0 -p
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:45:53.866399 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
10:45:55.914946 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
...
einstein# tcpdump -i agr0 -p
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on agr0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

I don't understand why there is no logical connection between tap0/tap1
and agr0. Of course, I have verified that agr0 uses tap0 and tap1 as
slave interfaces.

The same configuration runs fine with two physical Ethernet
controllers. I have created agr1, which aggregates wm1 and wm2 (802.3ad):

agr1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=0
agrport: wm1, flags=0x3<COLLECTING,DISTRIBUTING>
agrport: wm2, flags=0x3<COLLECTING,DISTRIBUTING>
address: 68:05:ca:02:b2:59
inet 192.168.10.128 netmask 0xffffff00 broadcast 192.168.10.255
inet6 fe80::6a05:caff:fe02:b259%agr0 prefixlen 64 scopeid 0x5
inet6 2001:7a8:a8ed:10::128 prefixlen 64

and agr1 runs as expected.

When I compare agr0 and agr1, I note that agr0 doesn't indicate IPv4
and IPv6 capabilities. Why? If I understand correctly, agr0 has to indicate
these capabilities to work as expected.

Best regards,

JKB



--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Paul Goyette
2016-12-09 10:13:01 UTC
This isn't going to help solve your problem, it's just for your info...

On Fri, 9 Dec 2016, BERTRAND Joël wrote:

<snip>
Post by BERTRAND Joël
agr1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=0
agrport: wm1, flags=0x3<COLLECTING,DISTRIBUTING>
agrport: wm2, flags=0x3<COLLECTING,DISTRIBUTING>
address: 68:05:ca:02:b2:59
inet 192.168.10.128 netmask 0xffffff00 broadcast 192.168.10.255
inet6 fe80::6a05:caff:fe02:b259%agr0 prefixlen 64 scopeid 0x5
inet6 2001:7a8:a8ed:10::128 prefixlen 64
and agr1 runs as expected.
When I compare agr0 and agr1, I note that agr0 doesn't indicate IPv4
and IPv6 capabilities. Why ? If I understand, agr0 has to indicate these
capabilities to work as expected.
I assume that by "indicate IPv4 and IPv6 capabilities" you are referring
to the various hardware-offload flags listed above. These are totally
optional, and if the hardware (or, in your case, pseudo-hardware tap!) does
not provide these capabilities, they are provided in software. These
flags are documented in a bit more detail in the ifconfig(8) man page
(but you need to search for the flag names in lower-case!)




+------------------+--------------------------+------------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+
Greg Troxel
2016-12-14 12:40:53 UTC
[moved back to tech-net. tech-kern is for kernel stuff that doesn't
have a sub-list.]
There is no way to configure agr interfaces without attaching physical
interfaces.
Is tap considered as physical interface or not ? tap has MAC
address thus I think that is not a limitation. And agr created with
tap0 and tap1 uses tap0 MAC address.
They are not actually physical of course, but I don't see any reason it
should not work. However, if no one has tried and fixed any bugs that
stop it from working, it might well not. So I suspect digging in with
gdb or printf might help.
Also, I would suggest setting up agr with two normal ethernet interfaces
to be really sure you are doing everything else right.
On the same server, I use agr1 with both wm0 and wm1 without
any trouble. Thus, I'm pretty sure that this issue comes from kernel
itself. This morning, I have had a look in agr directory, but I don't
know what I'm looking for. tap0 and tap1 are registered in agr0, but
no packets are sent over enslaved interfaces...
I'll try to fix this issue but any help will be welcome.
My usual advice is to run tcpdump on every interface that's possibly
relevant with -w to save the packets to a file, and then look over them
later. Specifically I mean this and not live tcpdump and not wireshark.
Then generate test traffic in each direction on each path.
Perhaps you already did this.
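
A minimal sketch of that capture step, assuming root privileges and the interface names used earlier in this thread (agr0, tap0, tap1). The function name start_captures and the output directory are illustrative only:

```shell
#!/bin/sh
# Start one background tcpdump per interface, each writing raw packets
# to its own file so they can be examined after the test is over.
# start_captures is a hypothetical helper name, not part of any tool.
start_captures() {
    outdir=$1
    shift
    for ifc in "$@"; do
        # -p: do not put the interface in promiscuous mode
        # -w: write raw packets to a file instead of decoding them live
        tcpdump -i "$ifc" -p -w "$outdir/$ifc.pcap" &
    done
}
```

After something like `start_captures /tmp agr0 tap0 tap1`, generate the test traffic, stop the tcpdump processes, and read each file back with `tcpdump -n -e -r /tmp/tap0.pcap`.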

My second standard advice is to run "netstat -s" and save the output to
a file before and after each test, and diff them, to find changes in
counters that you do not expect, as well as to look for the expected
changes.
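
A rough sketch of that before/after comparison. The helper name netstat_delta is made up for illustration, and the command passed to it stands for whatever test traffic is being generated:

```shell
#!/bin/sh
# Snapshot the system-wide protocol counters, run a test command,
# snapshot again, and show only the counters that changed.
# netstat_delta is a hypothetical helper name.
netstat_delta() {
    before=$(mktemp)
    after=$(mktemp)
    netstat -s > "$before"
    "$@" || true                      # a failing ping is still a valid test
    netstat -s > "$after"
    diff -u "$before" "$after" || true  # diff exits 1 when the files differ
    rm -f "$before" "$after"
}
```

For example, `netstat_delta ping -c 5 192.168.100.1` would print the counters that moved while the ping ran. Other traffic on the machine also moves these counters, so the quieter the system, the easier the output is to read.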

I would in this case also advise reading the code. I am really not
familiar with agr, but I wouldn't be surprised if there have to be some
agr hooks in a driver, and I further wouldn't be surprised if those
hooks are present in real ethernet interfaces used for agr by others,
but not present in tap. bpf works this way, but usually the person who
makes the driver work the first time at all cares about bpf. You are
probably the first person to care about agr/tap.
BERTRAND Joël
2016-12-14 15:15:33 UTC
Hello,

I have tried bringing tap0 and tap1 up. This seems to fix half of the issue...


tap0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ec_capabilities=5<VLAN_MTU,JUMBO_MTU>
ec_enabled=0
address: f2:0b:a4:b7:7f:59
media: Ethernet autoselect
tap1: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ec_capabilities=5<VLAN_MTU,JUMBO_MTU>
ec_enabled=0
address: f2:0b:a4:98:69:5d
media: Ethernet autoselect
agr0: flags=0xb843<UP,BROADCAST,RUNNING,SIMPLEX,LINK0,LINK1,MULTICAST>
mtu 1500
agrport: tap0, flags=0x3<COLLECTING,DISTRIBUTING>
agrport: tap1, flags=0x3<COLLECTING,DISTRIBUTING>
address: f2:0b:a4:b7:7f:59
inet 192.168.100.2/24 broadcast 192.168.100.255 flags 0x0
inet6 fe80::f00b:a4ff:feb7:7f59%agr0/64 flags 0x0 scopeid 0x6

When I ping 192.168.100.2 from the Linux host, I see this on the interfaces:
einstein# tcpdump -i tap0 -p

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:13:06.652688 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
16:13:08.700795 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
16:13:10.748774 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
^C
einstein# tcpdump -i tap1 -p
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:13:19.964721 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
16:13:22.012887 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
16:13:24.060809 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
^C
einstein# tcpdump -i agr0 -p
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on agr0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:13:35.324820 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
16:13:35.324908 ARP, Reply 192.168.100.2 is-at f2:0b:a4:b7:7f:59 (oui
Unknown), length 28
16:13:36.403632 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
16:13:36.403705 ARP, Reply 192.168.100.2 is-at f2:0b:a4:b7:7f:59 (oui
Unknown), length 28
16:13:38.374085 ARP, Request who-has 192.168.100.2 tell 192.168.100.1,
length 28
16:13:38.374171 ARP, Reply 192.168.100.2 is-at f2:0b:a4:b7:7f:59 (oui
Unknown), length 28
^C

Now incoming packets are received by agr0 and the kernel sends replies,
but the outgoing packets are not sent over tap0 or tap1.

Best regards,

JKB


BERTRAND Joël
2016-12-15 10:21:08 UTC
Post by Greg Troxel
My second standard advice is to run "netstat -s" and save the output to
a file before and after each test, and diff them, to find changes in
counters that you do not expect, as well as to look for the expected
changes.
OK. I have run netstat -s on all interfaces before and after a ping on
agr0 (outgoing packets).

I don't know how to interpret the diff output:

einstein# diff -u tap0.orig tap0.final
einstein# diff -u tap1.orig tap1.final
einstein# diff -u agr0.orig agr0.final
--- agr0.orig 2016-12-15 11:11:11.273990884 +0100
+++ agr0.final 2016-12-15 11:11:30.261239755 +0100
@@ -1,3 +1,3 @@
-agr0 1500 <Link> f2:0b:a4:b7:7f:59 202 0 0
304 0
-agr0 1500 fe80::/64 fe80::f00b:a4ff:f 202 0 0
304 0
-agr0 1500 192.168.100/2 192.168.100.2 202 0 0
304 0
+agr0 1500 <Link> f2:0b:a4:b7:7f:59 202 0 0
313 0
+agr0 1500 fe80::/64 fe80::f00b:a4ff:f 202 0 0
313 0
+agr0 1500 192.168.100/2 192.168.100.2 202 0 0
313 0
einstein#

For incoming packets, the result is:

einstein# diff -u agr0.orig agr0.final
--- agr0.orig 2016-12-15 11:15:17.420389033 +0100
+++ agr0.final 2016-12-15 11:16:53.044680951 +0100
@@ -1,3 +1,3 @@
-agr0 1500 <Link> f2:0b:a4:b7:7f:59 202 0 0
353 0
-agr0 1500 fe80::/64 fe80::f00b:a4ff:f 202 0 0
353 0
-agr0 1500 192.168.100/2 192.168.100.2 202 0 0
353 0
+agr0 1500 <Link> f2:0b:a4:b7:7f:59 232 0 0
383 0
+agr0 1500 fe80::/64 fe80::f00b:a4ff:f 232 0 0
383 0
+agr0 1500 192.168.100/2 192.168.100.2 232 0 0
383 0
einstein# diff -u tap0.orig tap0.final
--- tap0.orig 2016-12-15 11:15:06.726431924 +0100
+++ tap0.final 2016-12-15 11:17:00.154048113 +0100
@@ -1 +1 @@
-tap0 1500 <Link> f2:0b:a4:b7:7f:59 105 0 4
0 0
+tap0 1500 <Link> f2:0b:a4:b7:7f:59 120 0 4
0 0
einstein# diff -u tap1.orig tap1.final
--- tap1.orig 2016-12-15 11:15:11.109520868 +0100
+++ tap1.final 2016-12-15 11:17:04.193562407 +0100
@@ -1 +1 @@
-tap1 1500 <Link> f2:0b:a4:98:69:5d 104 0 4
0 0
+tap1 1500 <Link> f2:0b:a4:98:69:5d 119 0 4
0 0
einstein#

As far as I can tell, incoming packets received by tap0 and tap1 are
passed to agr0, but outgoing packets are not sent.

Best regards,

JKB

Thor Lancelot Simon
2016-12-15 18:05:19 UTC
For me, ingoing packets received by tap0 and tap1 are sent to agr0, but
not outgoing packets.
So packets you send out agr0 (and see in the agr0 tcpdump output) do not appear
in the output for either tap0 or tap1? I just want to be sure.

Did you have the tap0 and tap1 interfaces in the "up" state *BEFORE* attaching
them to the agr? I ask because this sounds like the internal state machine in
agr has not moved to the "distributing" state where it sends packets out each
attached interface -- like there is a LACP or other failure, and I bet attaching
the underlying interfaces while they are not "up" is a good way to get stuck
like that.

I also wonder whether openvpn is transmitting the LACP PDUs across the link.
Do you see them on the far end when you tcpdump the underlying tap interfaces?

All in all this is a pretty awful way to get ECMP. Even if you get it going,
are you sure it's worth the trouble?
--
Thor Lancelot Simon ***@panix.com

Ring the bells that still can ring.

BERTRAND joël
2016-12-15 19:23:12 UTC
Post by Thor Lancelot Simon
For me, ingoing packets received by tap0 and tap1 are sent to agr0, but
not outgoing packets.
So packets you send out agr0 (and see in the agr0 tcpdump output) do not appear
in the output for either tap0 or tap1? I just want to be sure.
Right.
Post by Thor Lancelot Simon
Did you have the tap0 and tap1 interfaces in the "up" state *BEFORE* attaching
them to the agr? I ask because this sounds like the internal state machine in
agr has not moved to the "distributing" state where it sends packets out each
attached interface -- like there is a LACP or other failure, and I bet attaching
the underlying interfaces while they are not "up" is a good way to get stuck
like that.
Both tap0 and tap1 were up before adding these interfaces to agr.
Post by Thor Lancelot Simon
I also wonder whether openvpn is transmitting the LACP PDUs across the link.
Do you see them on the far end when you tcpdump the underlying tap interfaces?
OpenVPN carries Ethernet traffic (in this configuration). I have used
L2 OpenVPN links for a long time without any trouble (even with NetBSD),
so I'm pretty sure that the issue doesn't come from the VPN itself. In my test
configuration, I use a round-robin agr0 (link0+link1) and I've only seen
ARP frames. If I create a bond0 in mode 4 (802.3ad) on the Linux server, NetBSD
receives the LACP PDUs (but doesn't answer).
Post by Thor Lancelot Simon
All in all this is a pretty awful way to get ECMP. Even if you get it going,
are you sure it's worth the trouble?
I have to aggregate two L2 OpenVPN links between a server and another
one. I have done this kind of aggregation without any trouble when both
sides run Linux. In this case, the remote server runs NetBSD. I have no choice.

Best regards,

JKB


BERTRAND Joël
2016-12-16 10:14:40 UTC
Post by Thor Lancelot Simon
Did you have the tap0 and tap1 interfaces in the "up" state *BEFORE* attaching
them to the agr? I ask because this sounds like the internal state machine in
agr has not moved to the "distributing" state where it sends packets out each
attached interface -- like there is a LACP or other failure, and I bet attaching
the underlying interfaces while they are not "up" is a good way to get stuck
like that.
I have tried setting the tap interfaces up both before and after attaching
them to agr0. No change. Incoming packets are received but outgoing packets
are not sent over agr0.

Best regards,

JKB

Greg Troxel
2016-12-16 16:02:59 UTC
That's not netstat -s, it's netstat -i. (Also you have wrapped the
diffs which makes them hard to read.)

The point of netstat -s is that there are a vast number of counters in the
system. On my netbsd-7 system there are 555 lines of output. Many of
them are error counters that stay 0. With problems, it's common (but
not universal) that some of these counters increase; in theory any error
and any normal path should increment some counter. So the idea is to
look for changes you don't understand.

For interface counts, you have to know how this is supposed to work and
to trace the increments across multiple interfaces, removing all other
confusing traffic and injecting test traffic. This is indeed not easy.
BERTRAND joël
2016-12-16 18:46:19 UTC
Post by Greg Troxel
That's not netstat -s, it's netstat -i. (Also you have wrapped the
diffs which makes them hard to read.)
netstat -s -I tap0 only returns one line:

einstein# netstat -I tap0 -s
tap0 1500 <Link> f2:0b:a4:31:da:48 104 0 0
0 0
einstein#

If I use netstat -s without an interface, I obtain an aggregate of the
statistics for all interfaces, and I don't know how to interpret the result.

Best regards,

JKB

Greg Troxel
2016-12-17 00:23:46 UTC
Post by BERTRAND joël
Post by Greg Troxel
That's not netstat -s, it's netstat -i. (Also you have wrapped the
diffs which makes them hard to read.)
einstein# netstat -I tap0 -s
tap0 1500 <Link> f2:0b:a4:31:da:48 104 0 0 0
0
einstein#
That's not -s, that's what -I returns. It never occurred to me to do
them at once.
Post by BERTRAND joël
If I use netstat -s without interface, I will obtain an
aggregation of all interfaces statistics and I don't know how
interpret result.
Yes, that's right. -s does not show you interface stats. It shows you
overall networking stats. Unfortunately you need to understand more to
make sense of it. But seriously: do the exercise I suggest: run netstat
-s before and after, and diff, and think about what you see, and look up
things to understand.
BERTRAND joël
2016-12-19 21:08:29 UTC
Post by Greg Troxel
Yes, that's right. -s does not show you interface stats. It shows you
overall networking stats. Unfortunately you need to understand more to
make sense of it. But seriously: do the exercise I suggest: run netstat
-s before and after, and diff, and think about what you see, and look up
things to understand.
--- orig 2016-12-19 21:29:38.032607635 +0100
+++ fin 2016-12-19 21:29:47.276231689 +0100
@@ -5,7 +5,7 @@
unreach: 12
0 messages with bad code fields
0 messages < minimum length
- 1 bad checksum
+ 5 bad checksums <------------------------------ ?
0 messages with bad length
0 multicast echo requests ignored
0 multicast timestamp requests ignored
@@ -86,18 +86,18 @@
0 packets with ECN CE bit
0 packets ECN ECT(0) bit
udp:
- 80 datagrams received
+ 83 datagrams received
0 with incomplete header
0 with bad data length field
0 with bad checksum
12 dropped due to no socket
0 broadcast/multicast datagrams dropped due to no socket
0 dropped due to full socket buffers
- 68 delivered
- 75 PCB hash misses
- 81 datagrams output
+ 71 delivered
+ 77 PCB hash misses
+ 83 datagrams output
ip:
- 81 total packets received
+ 88 total packets received
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
@@ -112,14 +112,14 @@
0 malformed fragments dropped
0 fragments dropped after timeout
0 packets reassembled ok
- 81 packets for this host
+ 88 packets for this host
0 packets for unknown/unsupported protocol
0 packets forwarded (0 packets fast forwarded)
0 packets not forwardable
0 redirects sent
0 packets no matching gif found
- 94 packets sent from this host
- 2 packets sent with fabricated ip header
+ 100 packets sent from this host
+ 7 packets sent with fabricated ip header
0 output packets dropped due to no bufs, etc.
0 output packets discarded due to no route
0 output datagrams fragmented
@@ -291,9 +291,9 @@
0 delivered
0 datagrams output
arp:
- 14 packets sent
+ 19 packets sent
6 reply packets
- 8 request packets
+ 13 request packets
9 packets received
0 reply packets
9 valid request packets
@@ -310,9 +310,9 @@
0 packets received on wrong interface
0 entrys overwritten
0 changes in hardware address length
- 1 packet deferred pending ARP resolution
+ 6 packets deferred pending ARP resolution
0 sent
- 1 dropped
+ 6 dropped
0 failures to allocate llinfo
ddp:
0 packets with short headers

After a long uptime and a lot of transmitted Ethernet packets:
einstein# netstat -s | grep bad
...
5 bad checksums
...

If I try to send ICMP packets over agr0, the bad checksums counter goes up:

28 packets transmitted, 0 packets received, 100.0% packet loss
einstein# netstat -s | grep bad
...
32 bad checksums
...

I suppose packets are not transferred from agr0 to tap0/1 because the
checksums are wrong.

Best regards,

JKB



Thor Lancelot Simon
2016-12-21 21:00:49 UTC
Post by BERTRAND joël
28 packets transmitted, 0 packets received, 100.0% packet loss
einstein# netstat -s | grep bad
...
32 bad checksums
...
I suppose packets are not transferred from agr0 to tap0/1 as checksum are
false.
That's interesting -- they're probably zero or uninitialized. The question
is, why? There have been a number of bugs like this over the years, but since
agr works with real Ethernet interfaces...

Actually, I wonder if anyone has agr working with an Ethernet interface that
does _not_ have checksum offload support. We did a bunch of work at CP to be
sure that checksum offloading would work properly with vlan stacked on agr
stacked on... and we would definitely have tested with checksum offload
_disabled_ in the physical interface at the bottom, but possibly not with an
interface that didn't announce the capability at all -- maybe there's a call
to in_delayed_cksum() missing somewhere?

Thor

BERTRAND Joël
2017-01-04 21:20:27 UTC
Hello,
Post by Thor Lancelot Simon
Post by BERTRAND joël
28 packets transmitted, 0 packets received, 100.0% packet loss
einstein# netstat -s | grep bad
...
32 bad checksums
...
I suppose packets are not transferred from agr0 to tap0/1 as checksum are
false.
That's interesting -- they're probably zero or uninitialized. The question
is, why? There have been a number of bugs like this over the years, but since
agr works with real Ethernet interfaces...
Actually, I wonder if anyone has agr working with an Ethernet interface that
does _not_ have checksum offload support. We did a bunch of work at CP to be
sure that checksum offloading would work properly with vlan stacked on agr
stacked on... and we would definitely have tested with checksum offload
_disabled_ in the physical interface at the bottom, but possibly not with an
interface that didn't announce the capability at all -- maybe there's a call
to in_delayed_cksum() missing somewhere?
I have tried to understand how the agr driver works and I have added some
printf() calls in agr_xmit_frame() (sys/net/agr/if_agr.c). I never see any
of the messages I added in this function.

After some investigation, I have found that agr_start() doesn't work
as expected:

IFQ_DEQUEUE(&ifp->if_snd, m) takes frames from the queue. Next,
agr_select_tx_port() has to return a port of the agr interface, but this
function always returns NULL.

Here is my modified function:

static void
agr_start(struct ifnet *ifp)
{
	struct agr_softc *sc = ifp->if_softc;
	struct mbuf *m;

	AGR_LOCK(sc);

	while (/* CONSTCOND */ 1) {
		struct agr_port *port;

		printf("agr_start before IFQ_DEQUEUE\n");
		IFQ_DEQUEUE(&ifp->if_snd, m);
		printf("agr_start after IFQ_DEQUEUE\n");
		if (m == NULL) {
			printf("m == NULL\n");
			break;
		}
		bpf_mtap(ifp, m);
		port = agr_select_tx_port(sc, m);
		printf("port=%p\n", port);
		if (port) {
			int error;

			error = agr_xmit_frame(port->port_ifp, m);
			if (error) {
				ifp->if_oerrors++;
			} else {
				ifp->if_opackets++;
			}
		} else {
			m_freem(m);
			ifp->if_oerrors++;
		}
	}

	AGR_UNLOCK(sc);

	ifp->if_flags &= ~IFF_OACTIVE;
}

and I see in dmesg :

agr_start before IFQ_DEQUEUE
agr_start after IFQ_DEQUEUE
port=0x0
agr_start before IFQ_DEQUEUE
agr_start after IFQ_DEQUEUE
m == NULL

agr_select_tx_port() is a redirection to ieee8023ad_select_tx_port()
(ieee8023ad_lacp.c).

I have added some printf() in this new function, and I have seen that
la is always NULL :

	la = lsc->lsc_active_aggregator;
	if (__predict_false(la == NULL)) {
		LACP_DPRINTF((NULL, "%s: no active aggregator\n",
		    __func__));
		printf("end 2\n");
		return NULL;
	}

I don't understand why no aggregator is active:

tap0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ec_capabilities=5<VLAN_MTU,JUMBO_MTU>
ec_enabled=0
address: f2:0b:a4:a5:5c:b2
media: Ethernet autoselect
tap1: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ec_capabilities=5<VLAN_MTU,JUMBO_MTU>
ec_enabled=0
address: f2:0b:a4:93:ad:64
media: Ethernet autoselect
agr0: flags=0xb843<UP,BROADCAST,RUNNING,SIMPLEX,LINK0,LINK1,MULTICAST>
mtu 1500
agrport: tap0, flags=0x3<COLLECTING,DISTRIBUTING>
agrport: tap1, flags=0x3<COLLECTING,DISTRIBUTING>
address: f2:0b:a4:a5:5c:b2
inet 192.168.100.2/24 broadcast 192.168.100.255 flags 0x0
inet6 fe80::f00b:a4ff:fea5:5cb2%agr0/64 flags 0x0 scopeid 0x6

Any ideas?

Best regards,

JKB

BERTRAND joël
2017-01-11 21:43:31 UTC
Hello,

Some news. In my last message, I indicated that in the agr_start()
function, port = agr_select_tx_port(sc, m) always returns a null pointer
in port.

agr_select_tx_port() is an indirection to the function
ieee8023ad_select_tx_port() in src/sys/net/agr/ieee8023ad_lacp.c. In
this function la is always NULL, which means lsc_active_aggregator is
never set, so no packet can be sent over the tap0/tap1 interfaces.

Indeed, when lacp_select_active_aggregator() is called,
lsc->lsc_active_aggregator is never set because the speed returned by
lacp_aggregator_bandwidth() is always zero for a tap interface.
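
To make the reasoning concrete, here is a toy model of that selection step. It is not the kernel code: it assumes (consistent with the behaviour observed above, but not checked here against ieee8023ad_lacp.c) that a candidate only becomes the active aggregator when its bandwidth is strictly greater than the best seen so far, starting from zero. The function name and the name:speed argument format are invented for the illustration:

```shell
#!/bin/sh
# Toy model: pick the aggregator with the highest bandwidth.
# Each argument is "name:speed". ASSUMPTION: the comparison is a strict
# "greater than" starting from a best speed of 0, so a candidate whose
# reported speed is 0 can never be selected.
select_active_aggregator() {
    best=none
    best_speed=0
    for entry in "$@"; do
        name=${entry%%:*}
        speed=${entry##*:}
        if [ "$speed" -gt "$best_speed" ]; then
            best=$name
            best_speed=$speed
        fi
    done
    echo "$best"
}
```

Under that assumption, an aggregator made only of ports that report a speed of zero is never chosen, which matches the "no active aggregator" path seen in the printf output.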

I don't know what the best way to fix this issue is. All ideas are welcome.

Best regards,

JKB

Thor Lancelot Simon
2017-01-11 23:31:42 UTC
Post by BERTRAND Joël
Hello,
Some news. In my last message, I have indicated that in agr_start()
function, port = agr_select_tx_port(sc, m) always return null pointer in
port.
agr_select_tx_port() is an indirection to function
ieee8023ad_select_tx_port() in src/sys/net/agr/ieee8023ad_lacp.c. In this
function la is always nullified. Thus, lsc_active_aggregator is never set
and no packet can be sent over tap0/tap1 interfaces.
This would seem to be true for _all_ interfaces, then. No?

Thor

BERTRAND Joël
2017-01-12 09:19:25 UTC
Post by Thor Lancelot Simon
Post by BERTRAND Joël
Hello,
Some news. In my last message, I have indicated that in agr_start()
function, port = agr_select_tx_port(sc, m) always return null pointer in
port.
agr_select_tx_port() is an indirection to function
ieee8023ad_select_tx_port() in src/sys/net/agr/ieee8023ad_lacp.c. In this
function la is always nullified. Thus, lsc_active_aggregator is never set
and no packet can be sent over tap0/tap1 interfaces.
This would seem to be true for _all_ interfaces, then. No?
I don't understand why this value would be zero for all interfaces...
If I create an agr device with real Ethernet interfaces, those
interfaces should report a non-zero speed.

JKB


Robert Elz
2017-01-12 13:05:14 UTC
Try
ifconfig tap0 media 100base-TX

(and similar for other tap interfaces you use) after the interface is
created, before adding it to the agr interface.

You can use whatever speed you like (though it has to be one of the
standard ethernet rates - tap is pretending to be an ethernet after all)

I suspect that the agr driver will bias data being queued to its component
interfaces based upon their relative speeds (which is why it needs to know
what they are), so if you want tap0 to get more packets than tap1 make
it pretend to be faster... Otherwise set them all the same, and it should
make no difference what value is picked.

kre


BERTRAND Joël
2017-01-12 21:02:55 UTC
Post by Robert Elz
Try
ifconfig tap0 media 100base-TX
(and similar for other tap interfaces you use) after the interface is
created, before adding it to the agr interface.
You can use whatever speed you like (though it has to be one of the
standard ethernet rates - tap is pretending to be an ethernet after all)
I suspect that the agr driver will bias data being queued to its component
interfaces based upon their relative speeds (which is why it needs to know
what they are), so if you want tap0 to get more packets than tap1 make
it pretend to be faster... Otherwise set them all the same, and it should
make no difference what value is picked.
kre
OK.

I have set media 100base-TX on both tap interfaces and now I can
send packets over agr0. Next Saturday, I'll try to configure my real server.
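
For the record, the working order of operations can be sketched as below. This is only an outline under assumptions: the tap interfaces already exist (created by OpenVPN), the commands run as root, and the helper name setup_agr_over_tap is invented here. The agrport keyword is the one documented in agr(4); the link0/link1 flags used in my round-robin test are not shown.

```shell
#!/bin/sh
# Give each tap interface a nominal media speed and bring it up BEFORE
# attaching it to the agr interface, then create agr and add the ports.
# setup_agr_over_tap is a hypothetical helper name.
setup_agr_over_tap() {
    agr=$1
    shift
    for tap in "$@"; do
        ifconfig "$tap" media 100base-TX   # any standard rate, same on all ports
        ifconfig "$tap" up
    done
    ifconfig "$agr" create
    for tap in "$@"; do
        ifconfig "$agr" agrport "$tap"
    done
}
```

After `setup_agr_over_tap agr0 tap0 tap1`, the address goes on agr0 only, for example `ifconfig agr0 inet 192.168.100.2 netmask 255.255.255.0 up`.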

Best regards,

JKB

Mike Pumford
2017-01-13 19:11:30 UTC
I've been doing a lot of LACP work recently as part of my day job so
I've spent a lot of time debugging code (based on the FreeBSD LACP code
which originally came from NetBSD) and reading the LACP standards so
hopefully I'm qualified to answer questions about LACP. ;)
Post by Robert Elz
Try
ifconfig tap0 media 100base-TX
(and similar for other tap interfaces you use) after the interface is
created, before adding it to the agr interface.
You can use whatever speed you like (though it has to be one of the
standard ethernet rates - tap is pretending to be an ethernet after all)
It probably helps, but as far as I can tell from looking at the 7.x LACP
code, as long as the media subtypes are the same the NetBSD LACP code will
treat them as equivalent speeds.
Post by Robert Elz
I suspect that the agr driver will bias data being queued to its component
interfaces based upon their relative speeds (which is why it needs to know
what they are), so if you want tap0 to get more packets than tap1 make
it pretend to be faster... Otherwise set them all the same, and it should
make no difference what value is picked.
Actually if agr is configured for LACP then there are 2 requirements:

1. Interfaces in the aggregate must report as full duplex.
2. For interfaces to actually be combined they must all be operating at
the same speed.

These are both requirements of the LACP protocol standard.

In LACP the frame distribution is determined by a hash algorithm not
link capacity.
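
As a rough illustration of hash-based distribution (not the hash the NetBSD or FreeBSD code actually uses), the sketch below maps a flow key to one of the ports. The function name select_port, the use of cksum as the hash, and the format of the flow key are all assumptions made for the example:

```shell
#!/bin/sh
# Map a flow key (e.g. "src-mac/dst-mac") to one port out of a list.
# ASSUMPTION: cksum stands in for the real hash purely for illustration.
# The same key always yields the same port, so one flow stays on one link.
select_port() {
    key=$1
    shift
    hash=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
    shift $(( hash % $# ))    # skip 0..n-1 ports; the next one is the choice
    echo "$1"
}
```

Because the choice depends only on the key and the number of ports, frames of one conversation are not reordered across links, while different conversations may land on different ports.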


Mike


Robert Elz
2017-01-14 10:12:28 UTC
Date: Fri, 13 Jan 2017 19:11:30 +0000
From: Mike Pumford <***@mudcovered.org.uk>
Message-ID: <b41ef09a-ef11-0828-1363-***@mudcovered.org.uk>

| In LACP the frame distribution is determined by a hash algorithm not
| link capacity.

Yes, that makes sense, avoids re-ordering packets belonging to one TCP
connection by always keeping them on the same link.

Link capacity would be a dreadful metric in any case (I was going to comment
on that, but decided I didn't know nearly enough, so refrained) - if keeping
the links equally busy was a goal (rather than maximizing overall end to end
performance, which avoiding reordering does) then queue lengths (bytes) would
be what to use.

Note: all I know about this is what I saw from a grep of the sources when
I was looking to see just what agr was looking for from its components.
The rest was all just guesswork - but it seemed intuitive to me that if
agr was demanding a non-zero speed from its components, that we just
give it one .. and from what Bertrand said, that appears to have worked.

kre

