sin_zero

Discussion:

sin_zero

(too old to reply)

Mouse

2012-02-18 20:26:58 UTC

sin_zero, or more precisely the need to zero it, has been a
longstanding annoyance to me. Today I think I have tracked down at
least one part of it and am wondering about people's opinions on it.

I wrote a program that tries to install and remove routes (under
certain conditions which aren't relevant here). It "worked", in that
the writes to the routing socket succeeded, but the routes didn't get
used for packets. After spending some time digging, I think I know
why.

The kernel routing table for AF_INET is configured with

rn_inithead, 32, sizeof(struct sockaddr_in),

meaning that it skips 32 bits from the beginning of the sockaddr (these
being sin_family, sin_len, and sin_port), then uses the rest of it up
to the full struct - including sin_zero. This meant that when my code
installed routes, it was actually generating a noncontiguous route to
(for example) network 0a0001000475201108cd7940, mask
ffffff0008cd644874756e30, the trailing 8 bytes in each value being
unitialized sin_zero data. Then, of course, packets being sent to (for
example) 0a0001010000000000000000 would not match.

Just to keep things confusing, when netstat queries and prints this
route, it gets 0a0001000475201108cd7940 mask ffffff0008cd644874756e30
but ignores the trailing 8 bytes, even though they're significant to
the kernel, and prints "10.0.1/24".

I am tempted to simply change the routing table config to

rn_inithead, 32, 8,

(probably spelling 8 using sizeof and offsetof rather than a literal
constant, probably also changing 32 to an offsetof call like the one
IPv6 uses).

What I'm wondering is, would this break anything? It seems to me like
a no-downside change, but there's a lot of code I don't truly
understand running around there. I'm of the opinion that the most
correct thing to do is to change it and then fix anything it breaks,
because that really is the correct configuration. But for the purposes
at hand here, I'd rather work around the bug in userland than get
involved in debugging of long-standing and widely scattered kernel
issues exposed by such a change. I could, of course, just change it
and see, but there may be issues like "that would expose random kernel
memory to nonprivileged userland via $THING" which aren't obviously
broken but are clearly undesirable.

Opinions?

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Lloyd Parkes

2012-02-18 21:39:12 UTC

Permalink

Without looking at the code, I would be a bit cautious about code and/or data structures that are being used for more than just IPv4. Copying sin_zero into the IPv4 routing table can only be a bug though.

Cheers,
Lloyd

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Mouse

2012-02-19 00:03:03 UTC

Permalink

Post by Lloyd Parkes

Without looking at the code, I would be a bit cautious about code
and/or data structures that are being used for more than just IPv4.

Well, the radix tree code used for the routing tables is the same code
for v4, v6, and anything else (is there anything else using it?). All
I was thinking of changing are the parameters set in inetdomain.

Post by Lloyd Parkes
Copying sin_zero into the IPv4 routing table can only be a bug
though.

That was my own reaction too. But I've now tried that change, and,
while the system didn't visibly explode, it also didn't fix the issue;
I still had to pass in zeroed sin_zero in order to get working routes.
I haven't (yet) tracked down why.

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Dennis Ferguson

2012-02-19 05:34:48 UTC

Permalink

Post by Mouse
I am tempted to simply change the routing table config to
rn_inithead, 32, 8,
(probably spelling 8 using sizeof and offsetof rather than a literal
constant, probably also changing 32 to an offsetof call like the one
IPv6 uses).
What I'm wondering is, would this break anything? It seems to me like
a no-downside change, but there's a lot of code I don't truly
understand running around there. I'm of the opinion that the most
correct thing to do is to change it and then fix anything it breaks,
because that really is the correct configuration.

You might take a look at the multicast forwarding code to make sure it
isn't doing anything odd with that table (I won't since it gives me a
pain in my stomach to look at that).

Multicast forwarding routes have prefixes up to 64 bits long; the route
data (usually) includes a full destination address and a prefix of the
source address. IPv4 forwarding paths which integrate multicast forwarding
in the "normal" forwarding path generally do route lookups on 8 byte keys
concatenating the destination and source addresses, and the routes you can
install have prefixes up to that length. Unless something else is attempting
to use the IPv4 route table for flow identification (which is a broken way
to do it), multicast forwarding is the only thing I know of for IPv4 requiring
routes with prefixes exceeding 4 bytes.

I suspect, however, that NetBSD's multicast support code was implemented using
the bag-on-the-side approach, so what it needs may not be relevant to the
particular table you are looking at.

Dennis Ferguson
--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Motoyuki OHMORI

2012-02-19 07:00:46 UTC

Permalink

Hi,

Post by Mouse
What I'm wondering is, would this break anything? It seems to me like
a no-downside change, but there's a lot of code I don't truly
understand running around there. I'm of the opinion that the most
correct thing to do is to change it and then fix anything it breaks,
because that really is the correct configuration. But for the purposes
at hand here, I'd rather work around the bug in userland than get
involved in debugging of long-standing and widely scattered kernel
issues exposed by such a change. I could, of course, just change it
and see, but there may be issues like "that would expose random kernel
memory to nonprivileged userland via $THING" which aren't obviously
broken but are clearly undesirable.

From theoretical point of view, I agree with you. However, From the practital,
I am wondering whether your modification may break existing routing daemons
such as routed, gated, mrt, zebra (quagga), xorp and so on because they listen
on a routing socket to import kernel routes into them. At a glance, routed
seems to be okay but others are unknown.

BTW, even with your modification, you may still need to properly set sin_len to
not sizeof(struct sockaddr_in) but like offsetof(struct sockaddr_in, sin_zero)
in order to make kernel ignore uninitialized sin_zero. I am afraid that this
would make it difficult to write a portable code because all OSes do not have
non-standard sin_zero field. Just initializing struct sockaddr_in with all
zeros is very portabale, IMHO.

Best regards,
--
Motoyuki OHMORI <***@chikushi-u.ac.jp>

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Mouse

2012-02-19 09:05:30 UTC

Permalink

Post by Motoyuki OHMORI

[changing the AF_INET routing table config to ignore sin_zero]
What I'm wondering is, would this break anything? It seems to me
like a no-downside change, but [ICBW]. I'm of the opinion that the
most correct thing to do is to change it and then fix anything it
breaks, because that really is the correct configuration. But [...]

I have tried the change. Nothing exploded, but I wasn't running any
routing daemons except my own (which is not a routing daemon in the
sense of zebra or routed; the details aren't really relevant here). My
code succeeded in its attempts to install routes and suchlike, but the
problem was still present - the new routes did not get used by packets,
nor by `route get'.

Note that I did not change sizeof(struct sockaddr_in) itself to
actually eliminate sin_zero; all I changed was the radix tree
parameters used for AF_INET.

Post by Motoyuki OHMORI
BTW, even with your modification, you may still need to properly set
sin_len to not sizeof(struct sockaddr_in) but like offsetof(struct
sockaddr_in, sin_zero) in order to make kernel ignore uninitialized
sin_zero.

I tried that; attempts to manipulate the routing table that way
produced errors. I have a fuzzy memory that there's code that checks
that sa_len is "correct" for sa_family; I didn't have the leisure at
the time to check this theory out in detail.

I'm hoping to dig more tomrrow. I may, for example, try eliminating
sin_zero from struct sockaddr_in altogether and see what happens. I
also may dig into the radix tree code more to figure out why the routes
weren't getting used when I changed the config but nothing else.

Post by Motoyuki OHMORI
I am afraid that this would make it difficult to write a portable
code because all OSes do not have non-standard sin_zero field.

There is that; if I make my kernel tolerant of non-zero sin_zero, or if
I eliminate sin_zero entirely, then the need to bzero structs
sockaddr_in goes away, leading to code being written that doesn't.

I'm not entirely sure how I feel about that. There are other cases
where I've made changes and have ended up thinking them worth the
nonportability they bring (AF_TIMER sockets and labeled control
structure come to mind immediately); I don't know which way I'd decide
about this, if/when I get it working.

Post by Motoyuki OHMORI
Just initializing struct sockaddr_in with all zeros is very
portabale, IMHO.

Certainly. Anything that breaks code written that way is, IMO, a bug;
it doesn't affect fields set explicitly, and anything depending on bits
other than the four advertised members I consider broken. That's why I
want to eliminate the dependency on sin_zero now; I consider the
current behaviour, where things work or fail depending on sin_zero,
broken. I just propose to render such initialization unnecessary, not
actively harmful, and even that much only in my experimental kernels
for the moment. :-)

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Darren Reed

2012-02-19 16:29:00 UTC

Permalink

Let me translate your email...

You want the kernel to support sin_zero being non-zero
so that you don't have to initialise a field that should
be zero to zero.

Would you also like to rename sin_zero to sin_garbage?

Darren

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Mouse

2012-02-19 17:27:06 UTC

Permalink

Post by Darren Reed
Let me translate your email...
You want the kernel to support sin_zero being non-zero so that you
don't have to initialise a field that should be zero to zero.

Well, I would say, rather, so that unnecessary bits are ignored, or not
present at all, rather than being accepted and used in undocumented,
arcane, and difficult-to-detect ways. (The stock userland tools like
netstat don't print those bits even if they're nonzero, or at least I
haven't found one that does.)

It doesn't help that inet(4), the one manpage I have managed to find
that describes sockaddr_in at all, doesn't describe the semantics of
_any_ of the fields. (That page is known to lag reality, though; the
1.4T version does not mention sin_len, and says sin_family is short,
even though the first two bytes had been split for quite some time by
that point.)

Post by Darren Reed
Would you also like to rename sin_zero to sin_garbage?

If I have to keep it, yes, or perhaps sin_padding. But I'd rather
remove it entirely; the only excuse for it I can see is to make
sockaddr_in no smaller than sockaddr, and I don't see what the value in
that is. ohmori said that not all OSes have it, so it's something
portable code (FSVO "portable") cannot assume the existence of.

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de