Discussion:
How usable is agr(4)?
Hauke Fath
2009-06-09 11:43:00 UTC
All,

while upgrading a busy nfs fileserver, I have changed it to aggregate
two wm(4) GBit interfaces with agr(4); on the other end is a HP
procurve 2848 switch.

After a few days, a 'netstat -i' gives me

Name Mtu  Network Address           Ipkts     Ierrs Opkts     Oerrs Colls
wm0  1500 <Link>  00:30:48:d7:0a:78 349626955 23    338123451 0     0
wm1  1500 <Link>  00:30:48:d7:0a:79 7958      0     7955      0     0
agr0 1500 <Link>  00:30:48:d7:0a:78 349620694 16    338117217 3516  2

which is not really balanced. Is this what I should expect? Or what
am I missing?

hauke
--
The ASCII Ribbon Campaign Hauke Fath
() No HTML/RTF in email Institut für Nachrichtentechnik
/\ No Word docs in email TU Darmstadt
Respect for open standards Ruf +49-6151-16-3281

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Manuel Bouyer
2009-06-09 13:16:38 UTC
Post by Hauke Fath
All,
while upgrading a busy nfs fileserver, I have changed it to aggregate
two wm(4) GBit interfaces with agr(4); on the other end is a HP procurve
2848 switch.
After a few days, a 'netstat -i' gives me
Name Mtu  Network Address           Ipkts     Ierrs Opkts     Oerrs Colls
wm0  1500 <Link>  00:30:48:d7:0a:78 349626955 23    338123451 0     0
wm1  1500 <Link>  00:30:48:d7:0a:79 7958      0     7955      0     0
agr0 1500 <Link>  00:30:48:d7:0a:78 349620694 16    338117217 3516  2
which is not really balanced. Is this what I should expect? Or what am I
missing?
The balance is done based on a hash of the source and destination
MAC addresses. So if your traffic is going through a router, you won't have
load-balancing. If the clients are local (and there's enough of them)
I would expect it to work.
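
A minimal sketch of why a MAC-pair hash cannot spread routed traffic. The XOR-fold hash, the router MAC and the client MACs below are made up for illustration; this is not the hash agr(4) actually uses (a later reply notes that the real one also covers IP addresses).

```python
from collections import Counter
from functools import reduce

def mac_bytes(mac):
    """Turn 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def pick_link(src_mac, dst_mac, n_links=2):
    """Hypothetical MAC-pair hash: XOR-fold all twelve bytes, then reduce."""
    folded = reduce(lambda a, b: a ^ b, mac_bytes(src_mac) + mac_bytes(dst_mac))
    return folded % n_links

SERVER = "00:30:48:d7:0a:78"   # agr0 address from the netstat output
ROUTER = "02:00:00:00:ff:01"   # made-up MAC of the next-hop router

# Routed clients: every outgoing frame is addressed to the router's MAC,
# so the hash input never changes and one link carries everything.
routed = Counter(pick_link(SERVER, ROUTER) for _ in range(1000))

# Local clients: 64 made-up MACs, one per client, so the input varies.
local = Counter(pick_link(SERVER, f"02:00:00:00:00:{i:02x}") for i in range(64))

print("routed:", dict(routed))
print("local: ", dict(local))
```

With a single router as the only layer-2 peer, any deterministic function of the two MAC addresses gives the same answer for every frame, whatever the hash looks like.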
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--

Hauke Fath
2009-06-09 13:34:44 UTC
Post by Manuel Bouyer
The balance is done based on a hash of the source and destination
MAC addresses. So if your traffic is going through a router, you won't have
load-balancing.
There's the rub: The server sits in its own subnet. So while other
(server) machines in that subnet would access it, most of the load
comes from other subnets.
Post by Manuel Bouyer
If the clients are local (and there's enough of them)
I would expect it to work.
Pity, it sounded like a good idea... but will only add overhead, as it is.

Thanks for the explanation,

hauke
--
The ASCII Ribbon Campaign Hauke Fath
() No HTML/RTF in email Institut für Nachrichtentechnik
/\ No Word docs in email TU Darmstadt
Respect for open standards Ruf +49-6151-16-3281

der Mouse
2009-06-09 13:42:58 UTC
Post by Hauke Fath
Post by Manuel Bouyer
The balance is done based on a hash of the source and destination
MAC addresses.
Pity, it sounded like a good idea...
Perhaps agr should have a mode where it really does load-balance, eg by
queueing outbound packets on whichever interface has a shorter queue,
instead of doing crude approximations based on hashing? Sure, there
are plenty of environments where the packet reordering damage is not
worth the bandwidth gain, but there are also plenty of environments
where the tradeoff goes the other way (anything where the load is
(almost) all UDP, for example, such as DNS, or many NFS setups, or
where there are lots of TCP streams, each individually slow enough that
multiple packets from the same flow in the queues simultaneously is
rare enough to be ignored).

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

David Brownlee
2009-06-09 13:54:27 UTC
Post by der Mouse
Perhaps agr should have a mode where it really does load-balance, eg by
queueing outbound packets on whichever interface has a shorter queue,
instead of doing crude approximations based on hashing? Sure, there
are plenty of environments where the packet reordering damage is not
worth the bandwidth gain, but there are also plenty of environments
where the tradeoff goes the other way (anything where the load is
(almost) all UDP, for example, such as DNS, or many NFS setups, or
where there are lots of TCP streams, each individually slow enough that
multiple packets from the same flow in the queues simultaneously is
rare enough to be ignored).
Or even hash based on port as well as IP...
--
David/absolute -- www.NetBSD.org: No hype required --

Manuel Bouyer
2009-06-09 14:26:33 UTC
Post by der Mouse
Post by Hauke Fath
Post by Manuel Bouyer
The balance is done based on a hash of the source and destination
MAC addresses.
Pity, it sounded like a good idea...
Perhaps agr should have a mode where it really does load-balance, eg by
queueing outbound packets on whichever interface has a shorter queue,
instead of doing crude approximations based on hashing? Sure, there
This is what IFF_LINK0 does I think:
link0 Use the round-robin distribution algorithm. Don't use it unless
you're really sure, because it violates the frame ordering rule.

Of course you'll also have to convince the other end to do the same.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--

der Mouse
2009-06-09 15:15:41 UTC
Post by Manuel Bouyer
Post by der Mouse
Perhaps agr should have a mode where it really does load-balance, eg
by queueing outbound packets on whichever interface has a shorter
queue, instead of doing crude approximations based on hashing?
link0 Use the round-robin distribution algorithm.
Ooo, that's close. Close enough to be useful in almost the same set of
circumstances true load balancing would be. (I can easily imagine an
application where, for example, packets alternate between large and
small, in which case round-robin distribution with two interfaces will
send substantially more traffic through one member than the other. But
I suspect that such cases are rare enough to be considered
pathological, ignorable unless you happen to be stuck with one.)
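
A rough sketch of that pathological case, next to the shorter-queue idea from earlier in the thread. The packet sizes are made up, and "queue length" is modelled as cumulative bytes assigned to a link with no draining, which is a simplifying assumption rather than how a real interface queue behaves.

```python
def round_robin_bytes(packet_sizes, n_links=2):
    """Assign packets to links in strict rotation; return bytes per link."""
    totals = [0] * n_links
    for i, size in enumerate(packet_sizes):
        totals[i % n_links] += size
    return totals

def least_loaded_bytes(packet_sizes, n_links=2):
    """Assign each packet to the link with the fewest bytes so far."""
    totals = [0] * n_links
    for size in packet_sizes:
        # min() returns the lowest index on ties
        link = min(range(n_links), key=lambda k: totals[k])
        totals[link] += size
    return totals

# Hypothetical traffic: large and small packets strictly alternating.
sizes = [1500, 64] * 1000

rr = round_robin_bytes(sizes)
ll = least_loaded_bytes(sizes)
print("round-robin: ", rr)
print("least-loaded:", ll)
```

With two links, rotation sends every large packet to the first member and every small one to the second, while the byte-aware variant evens out.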
Post by Manuel Bouyer
Don't use it unless you're really sure, because it violates the
frame ordering rule.
IP has never promised packet order preservation; anything that breaks
in the presence of packet reordering is already broken and deserves to
be rendered _visibly_ broken.

Of course, there are plenty of things for which packet ordering is a
performance issue even if not a correctness issue....

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

YAMAMOTO Takashi
2009-06-09 22:45:48 UTC
hi,
Post by der Mouse
Post by Manuel Bouyer
Don't use it unless you're really sure, because it violates the
frame ordering rule.
IP has never promised packet order preservation; anything that breaks
in the presence of packet reordering is already broken and deserves to
be rendered _visibly_ broken.
iirc, 802.3 or something promised ordering preservation.
agr lives in that layer.

YAMAMOTO Takashi

Thor Lancelot Simon
2009-06-09 15:57:36 UTC
Post by Manuel Bouyer
Post by Hauke Fath
All,
while upgrading a busy nfs fileserver, I have changed it to aggregate
two wm(4) GBit interfaces with agr(4); on the other end is a HP procurve
2848 switch.
The balance is done based on a hash of the source and destination
MAC addresses. So if your traffic is going through a router, you won't have
load-balancing. If the clients are local (and there's enough of them)
I would expect it to work.
MAC *and* IP addresses. But the IP address portion of the hash is weird
and I am not sure it works well in the general case.

What version of NetBSD is this? I have never seen agr successfully
advance LACP to the forwarding (COLLECTING/DISTRIBUTING) state with any
commercial switch under NetBSD 5 or newer. But we recently added static
aggregation configuration which should probably work with switches that
are set to do Cisco "etherchannel".

The hash should really use the TCP/UDP port numbers if available but that
means an ugly and potentially costly peek inside the packet.

Thor

YAMAMOTO Takashi
2009-06-09 22:49:10 UTC
Post by Thor Lancelot Simon
The hash should really use the TCP/UDP port numbers if available but that
means an ugly and potentially costly peek inside the packet.
i didn't implement it because fragmentation is normal for udp.
ie. it requires stateful inspection of packets.
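
A toy illustration of the fragmentation problem. Only the first IP fragment carries the UDP header, so a stateless hash can see the ports in that fragment alone. The function, the XOR hash and all the numbers are hypothetical and only serve to show the effect.

```python
def pick_link(src_host, dst_host, frag_offset, src_port, dst_port, n_links=2):
    """Hypothetical stateless hash over addresses, plus ports when visible."""
    h = src_host ^ dst_host
    if frag_offset == 0:
        # Only the first fragment contains the UDP header, so only here
        # can a stateless classifier read the port numbers.
        h ^= src_port ^ dst_port
    return h % n_links

# One large UDP datagram split into three fragments (made-up values).
SRC_HOST, DST_HOST = 10, 20
SRC_PORT, DST_PORT = 1022, 2049
fragment_offsets = [0, 1480, 2960]

links = [pick_link(SRC_HOST, DST_HOST, off, SRC_PORT, DST_PORT)
         for off in fragment_offsets]
print(links)
```

Here the first fragment lands on a different link from the rest of the same datagram. Avoiding that means remembering, per datagram, which link the first fragment took, which is the stateful inspection mentioned above.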

YAMAMOTO Takashi

Hauke Fath
2009-06-09 17:33:12 UTC
Post by Thor Lancelot Simon
What version of NetBSD is this?
netbsd-5 from June 4th sources.

There are changes to agr(4) in -current, adding vlan support (which I ended
up not needing); unfortunately no man page update. They pull up cleanly to
netbsd-5, but the resulting kernel panicked during 'ifconfig agr0', so I
gave up on that.

# ifconfig agr0 create
# ifconfig agr0 agrport wm0
agr_addport: SIOCINITIFADDR error 25
fatal page fault in supervisor mode
trap type 6 code 0 eip c0517926 cs 8 eflags 10296 cr2 0 ilevel 6
kernel: supervisor trap page fault, code=0
Stopped in pid 0.5 (system) at netbsd:agrtimer_tick+0x16: movl 0(%esi),%eax
db{0}> t
agrtimer_tick(c4929800,0,0,0,0,c0b1e160,c0b1e968,c0b1f168,c0b1f968,c0517910) at netbsd:agrtimer_tick+0x16
callout_softclock(0,0,0,0,0,0,0,3,0,0) at netbsd:callout_softclock+0x14c
softint_dispatch(cda07c80,2,0,0,0,0,cde75d90,cde75d28,cda07500,0) at netbsd:softint_dispatch+0x7c
DDB lost frame for netbsd:Xsoftintr+0x3d, trying 0xcde75d88
Xsoftintr() at netbsd:Xsoftintr+0x3d
--- interrupt ---
fatal page fault in supervisor mode
trap type 6 code 0 eip c0541237 cs 8 eflags 10206 cr2 3a ilevel 8
kernel: supervisor trap page fault, code=0
Faulted in DDB; continuing...
db{0}>

(Ahh, conserver...)
Post by Thor Lancelot Simon
I have never seen agr successfully
advance LACP to the forwarding (COLLECTING/DISTRIBUTING) state with any
commercial switch under NetBSD 5 or newer.
It puzzled me, too:

# ifconfig agr0
agr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
agrport: wm0, flags=0x3<COLLECTING,DISTRIBUTING>
agrport: wm1, flags=0x0
address: 00:30:48:d7:0a:78

While the man page has a lot to say about bugs, it does not discuss the
above ifconfig flags.

hauke

--
"It's never straight up and down" (DEVO)



Manuel Bouyer
2009-06-09 18:02:47 UTC
Post by Thor Lancelot Simon
Post by Manuel Bouyer
Post by Hauke Fath
All,
while upgrading a busy nfs fileserver, I have changed it to aggregate
two wm(4) GBit interfaces with agr(4); on the other end is a HP procurve
2848 switch.
The balance is done based on a hash of the source and destination
MAC addresses. So if your traffic is going through a router, you won't have
load-balancing. If the clients are local (and there's enough of them)
I would expect it to work.
MAC *and* IP addresses.
Indeed, I checked the code.
But I'm not sure if a switch will hash on the IP address itself, so for
the client->server path it may end up always using the same link.
Post by Thor Lancelot Simon
But the IP address portion of the hash is weird
and I am not sure it works well in the general case.
I think for inet6 autoconfigured hosts (or link-local addresses) it's going
to always return either odd or even numbers, if the source and destination
are both on the local network (because we do, in fact, hash the ethernet
addresses twice, and the non-ethernet part is constant for all hosts on the
local network).

In the case we're discussing, all but the last byte of the dst IP address
is constant. As there are 2 interfaces, the link is chosen by the least
significant bit of the IP address. If the clients' IP addresses are all odd or
all even there's no balancing. If they are mixed there should be some
balancing.
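
A simplified model of that last paragraph, assuming the link index comes down to the client address modulo the number of links. This only restates the description above and is not taken from the agr(4) source. The client addresses are made up and use the 192.0.2.0/24 documentation range.

```python
import ipaddress

def link_for_client(ip, n_links=2):
    """Simplified model: the low bits of the client address pick the link."""
    return int(ipaddress.ip_address(ip)) % n_links

# Hypothetical client sets.
all_even = ["192.0.2.10", "192.0.2.12", "192.0.2.14"]
mixed = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"]

even_links = {link_for_client(ip) for ip in all_even}
mixed_links = {link_for_client(ip) for ip in mixed}
print("all-even clients use links:", sorted(even_links))
print("mixed clients use links:   ", sorted(mixed_links))
```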
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--

Mihai Chelaru
2009-07-14 14:31:24 UTC
Post by Thor Lancelot Simon
I have never seen agr successfully
advance LACP to the forwarding (COLLECTING/DISTRIBUTING) state with any
commercial switch under NetBSD 5 or newer. But we recently added static
aggregation configuration which should probably work with switches that
are set to do Cisco "etherchannel".
I'm trying here to configure an etherchannel between netbsd-5 and a
catalyst 3550. No luck. One port on switch cannot leave alarm. Some
reports:

$ ifconfig agr0
agr0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
capabilities=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx>
enabled=0
agrport: bnx1, flags=0x3<COLLECTING,DISTRIBUTING>
agrport: bnx0, flags=0x0
address: 00:1e:4f:1e:d5:a1
inet 193.28.151.2 netmask 0xffffffc0 broadcast 193.28.151.63
inet6 fe80::21e:4fff:fe1e:d5a1%agr0 prefixlen 64 scopeid 0x4
$ ifconfig bnx0
bnx0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
capabilities=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx>
enabled=0
address: 00:1e:4f:1e:d5:a1
media: Ethernet autoselect (100baseTX full-duplex)
status: active
$ ifconfig bnx1
bnx1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
capabilities=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx>
enabled=0
address: 00:1e:4f:1e:d5:9f
media: Ethernet autoselect (100baseTX full-duplex)
status: active



tt#sh lacp 1 internal
Flags: S - Device is requesting Slow LACPDUs
F - Device is requesting Fast LACPDUs
A - Device is in Active mode P - Device is in Passive mode

Channel group 1
                       LACP port  Admin  Oper  Port    Port
Port   Flags  State    Priority   Key    Key   Number  State
Fa0/7  SP     bndl     500        0x1    0x1   0x7     0x3C
Fa0/8  SP     indep    500        0x1    0x1   0x8     0x4

tt#s int f0/8
FastEthernet0/8 is up, line protocol is down (notconnect)
...


Any hints?
--
Mihai

P.S. The good thing is that it at least works as a passive backup: if I
unplug one cable, traffic switches to the other.


6***@6bone.informatik.uni-leipzig.de
2009-06-09 17:24:30 UTC
agr and cisco work with netbsd5. Unfortunately there seems to be a bug in
the netbsd5 implementation. vlan tagged interfaces with an agr interface
as parent do not set the vlan tag for outgoing packets.

Uwe


ifconfig agr0
agr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
agrport: wm2, flags=0x3<COLLECTING,DISTRIBUTING>
agrport: wm3, flags=0x3<COLLECTING,DISTRIBUTING>
address: 00:14:22:1d:8b:41
inet x.x.x.x netmask 0xfffffff8 broadcast x.x.x.x
inet6 fe80::214:22ff:fe1d:8b41%agr0 prefixlen 64 scopeid 0x7
inet6 2001:xxxx prefixlen 64


show lacp neighbor
Flags: S - Device is requesting Slow LACPDUs
F - Device is requesting Fast LACPDUs
A - Device is in Active mode P - Device is in Passive mode

Channel group 12 neighbors

Partner's information:

        Partner  Partner  LACP Partner   Partner    Partner   Partner      Partner
Port    Flags    State    Port Priority  Admin Key  Oper Key  Port Number  Port State
Gi9/16  SA       bndl     32768          0x0        0xF0      0x4          0x3D
Gi9/18  SA       bndl     32768          0x0        0xF0      0x5          0x3D
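
For reading the "Port State" column in the switch outputs in this thread, here is a small decoder sketch. It assumes the standard LACP state octet layout, with bit 0 as Activity up to bit 7 as Expired. The helper name is made up.

```python
# Bit 0 first, following the LACP actor/partner state octet.
LACP_STATE_BITS = [
    "ACTIVITY",         # 1 = active LACP, 0 = passive
    "TIMEOUT",          # 1 = short (fast) timeout, 0 = long (slow)
    "AGGREGATION",
    "SYNCHRONIZATION",
    "COLLECTING",
    "DISTRIBUTING",
    "DEFAULTED",
    "EXPIRED",
]

def decode_lacp_state(value):
    """Return the names of the bits set in an LACP port state octet."""
    return [name for bit, name in enumerate(LACP_STATE_BITS) if value & (1 << bit)]

# Values reported in this thread.
for state in (0x3D, 0x3C, 0x4):
    print(f"{state:#04x}: {', '.join(decode_lacp_state(state))}")
```

Read this way, 0x3D and 0x3C both include COLLECTING and DISTRIBUTING, while 0x4 is AGGREGATION only. That matches the "bndl" and "indep" states shown next to them.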
