Plan for improving IP_PKTINFO socket option handling
Tom Ivar Helbekkmo
2017-12-28 16:15:27 UTC
I'd like to make some changes to the IPv4 socket option handling.
Specifically, I want to change how the IP_PKTINFO options are handled.
Before I attempt to change any code, I'd like input on the plan.

First, a bit of background.

I've been looking at getting the PowerDNS applications (authoritative
name server, recursive name server, and DNS load balancer/firewall) to
compile cleanly on NetBSD, and while I've been able to do so, it took
some ugly workarounds. Digging into the standards, the source code,
and the documentation from Solaris, Linux, and our own NetBSD (FreeBSD
doesn't do IP_PKTINFO, having instead created an IP_SENDSRCADDR option
as a partner to the traditional IP_RECVDSTADDR), I find that there are
a number of differences, some for no good reason at all. In a couple
of cases, our code is just wrong. Also, our documentation of these
options is unclear, and contains errors.

The reason these things exist at all is to enable the owner of a
wildcard-bound socket to find out which interface and address an
incoming packet was actually received on, and, in the case of a
UDP socket, to set the source address of an outgoing packet, typically
so that the sender of a UDP request can recognize the response. For
ease of use, recvmsg() delivers the extra information as a control
message which may then be supplied unchanged to sendmsg() when sending
the response, setting the source address to the original destination.
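A minimal sketch of that round trip in C, assuming Linux semantics, where setting the int-valued IP_PKTINFO option is what requests the control messages (on Solaris, and on NetBSD under the proposal in this message, IP_RECVPKTINFO would be used instead). The function name echo_from_same_address() is illustrative, and the control buffer is sized on the assumption that IP_PKTINFO is the only control message enabled on the socket.

```c
#define _GNU_SOURCE		/* glibc: expose struct in_pktinfo */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: receive one datagram on a wildcard-bound UDP
 * socket and echo it back. The msghdr filled in by recvmsg(), including
 * its IP_PKTINFO control message, is passed unchanged to sendmsg(), so
 * the reply is sent from the address the request was addressed to.
 * Assumes the socket already requests IP_PKTINFO cmsgs and that no
 * other cmsg types are enabled. Returns bytes sent, or -1 on error.
 */
ssize_t
echo_from_same_address(int fd)
{
	char data[1500];
	union {
		char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
		struct cmsghdr align;	/* forces cmsghdr alignment */
	} ctl;
	struct sockaddr_in peer;
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	msg.msg_name = &peer;
	msg.msg_namelen = sizeof(peer);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if ((n = recvmsg(fd, &msg, 0)) < 0)
		return -1;

	/* Same peer, same payload, same control data: only the payload
	 * length and the returned flags need adjusting. */
	iov.iov_len = (size_t)n;
	msg.msg_flags = 0;
	return sendmsg(fd, &msg, 0);
}
```

On a multihomed host, a wildcard-bound server calling something like this in its receive loop answers each client from the address that client actually queried, which is the case the options exist for.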

The IPv4 implementation of the *PKTINFO options is not standardized.
It has been implemented several times, each modeled, with varying
degrees of accuracy, on the IPv6 version standardized by RFC3542.

Here's a summary of the IPv6 functionality:

Option IPV6_RECVPKTINFO on socket:
  recvmsg() will supply IPV6_PKTINFO cmsgs for incoming packets

Option IPV6_PKTINFO on socket:
  sets the default source address to be used when sending packets

Control message IPV6_PKTINFO from recvmsg():
  contains an in6_pktinfo structure with the specific destination address

Control message IPV6_PKTINFO to sendmsg():
  supply an in6_pktinfo structure with the source address to be used

All of these work the same way on BSD, Solaris, and Linux (as per
RFC3542). The in6_pktinfo structure holds the address (in ipi6_addr),
and the interface index (ipi6_ifindex).

Note how the IPV6_RECVPKTINFO option is used to request IPV6_PKTINFO
control messages with incoming packets, while the IPV6_PKTINFO option
sets a default source address for the socket, and the IPV6_PKTINFO
control message on an outgoing packet sets the source address for that
particular packet.
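As a rough illustration of the RFC3542 split described above, here is a sketch of the two per-packet pieces: turning on reception with the IPV6_RECVPKTINFO option, and building an IPV6_PKTINFO control message that sets the source for one outgoing packet. The helper names are hypothetical; on glibc, struct in6_pktinfo is only visible with _GNU_SOURCE defined. The socket-level IPV6_PKTINFO default-source option is not shown.

```c
#define _GNU_SOURCE		/* glibc: expose RFC 3542 struct in6_pktinfo */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Option IPV6_RECVPKTINFO: attach IPV6_PKTINFO cmsgs to received packets. */
int
enable_recv_pktinfo6(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof(on));
}

/*
 * Control message IPV6_PKTINFO to sendmsg(): point msg at buf and store
 * one in6_pktinfo selecting the source address (and, if ifindex is
 * non-zero, the interface) for this packet only. buf must be aligned
 * for struct cmsghdr. Returns 0, or -1 if buf is too small.
 */
int
set_source6(struct msghdr *msg, void *buf, size_t buflen,
    const struct in6_addr *src, unsigned int ifindex)
{
	struct in6_pktinfo pi;
	struct cmsghdr *cm;

	if (buflen < CMSG_SPACE(sizeof(pi)))
		return -1;
	memset(buf, 0, buflen);
	memset(&pi, 0, sizeof(pi));
	pi.ipi6_addr = *src;
	pi.ipi6_ifindex = ifindex;

	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(pi));
	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = IPPROTO_IPV6;
	cm->cmsg_type = IPV6_PKTINFO;
	cm->cmsg_len = CMSG_LEN(sizeof(pi));
	memcpy(CMSG_DATA(cm), &pi, sizeof(pi));
	return 0;
}
```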

Now to the IPv4 implementation. In Solaris, this was done as a direct
translation of the IPv6 option set:

Option IP_RECVPKTINFO on socket:
  recvmsg() will supply IP_PKTINFO cmsgs for incoming packets

Option IP_PKTINFO on socket:
  sets the default source address to be used when sending packets

Control message IP_PKTINFO from recvmsg():
  contains an in_pktinfo structure with the specific destination address

Control message IP_PKTINFO to sendmsg():
  supply an in_pktinfo structure with the source address to be used

Then Linux almost copied this scheme, but they dropped IP_RECVPKTINFO,
instead using the IP_PKTINFO option to control the delivery of
IP_PKTINFO control messages with incoming packets. In doing so, they
lost the ability to set a default outgoing source address. This is
arguably not a great loss, but it does break compatibility with
Solaris, and it gratuitously breaks orthogonality with IPv6.

Next, while Solaris and Linux still have the ipi_ifindex and ipi_addr
fields, they decided to add a new field, ipi_spec_dst. The name is
supposed to refer to the "specific destination" described in RFCs 1122
and 1123. They chose to differentiate between the destination address
as supplied in the incoming IP packet itself, and the local address
the packet was, in fact, delivered to (specifically, ipi_spec_dst is
said to be "the destination address of the routing table entry"). For
outgoing packets, the IP_PKTINFO option's ipi_spec_dst field will be
used as the source address.
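To make the three-field Solaris/Linux layout concrete, here is a sketch of pulling the IP_PKTINFO payload out of the control data returned by recvmsg(). The helper get_pktinfo() is hypothetical. Because it names ipi_spec_dst, it compiles against Linux (and, per the description above, Solaris) headers, but not against NetBSD's current ones, which is exactly the source incompatibility discussed below.

```c
#define _GNU_SOURCE		/* glibc: expose struct in_pktinfo */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Walk the control messages returned by recvmsg() and copy out the
 * first IP_PKTINFO payload. In the Solaris/Linux layout, ipi_addr is
 * the destination from the IP header and ipi_spec_dst is the local
 * ("specific destination") address, the one a reply is sent from.
 * Returns 0 if an IP_PKTINFO cmsg was found, -1 otherwise.
 */
int
get_pktinfo(struct msghdr *msg, struct in_pktinfo *out)
{
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_IP &&
		    cm->cmsg_type == IP_PKTINFO &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(*out))) {
			memcpy(out, CMSG_DATA(cm), sizeof(*out));
			return 0;
		}
	}
	return -1;
}
```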

The only real example I can think of is where you listen on 0/0, and
receive a packet on the loopback interface, addressed not to
127.0.0.1, but, say, 127.1.2.3. By the documentation, this should
give an IP_PKTINFO control message with ipi_addr set to 127.1.2.3, and
ipi_spec_dst 127.0.0.1. That's not how Linux works, though: it will
set both to 127.1.2.3. If you pass that control message unchanged to
sendmsg() when sending a response, you'll be sending from 127.1.2.3
(instead of the documented 127.0.0.1, which wouldn't work), and this
may be a hint as to why Linux puts the packet header destination in both
fields. On NetBSD, sending to 127.1.2.3 doesn't work at all.

(This is a general difference in the handling of the loopback
interface: if you 'ping 127.1.2.3' on Linux, you get responses from
127.1.2.3. On NetBSD, you get a 'network unreachable' instead.)

Now, on to NetBSD.

We've mostly copied the way things work in Solaris and Linux, but with
a couple of little twists that break source compatibility with both.

First, we don't have the ipi_spec_dst field at all. Since a lot of
source code out there is written with Solaris and/or Linux in mind,
this breaks compatibility at the source level. I don't have a Solaris
system handy for testing, but from what I observe on Linux, and how
its loopback handling differs from NetBSD, as described above, we
could just toss in a "#define ipi_spec_dst ipi_addr" and be good.
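A small sketch of what that one-line shim buys. The structure nb_in_pktinfo is a hypothetical stand-in modeled on NetBSD's two-field in_pktinfo (address plus interface index), given its own name so the example also builds on systems whose headers already define ipi_spec_dst; there, the macro would silently alias the two fields, so it only belongs in a header that lacks the field.

```c
#include <netinet/in.h>

/* Hypothetical stand-in for NetBSD's in_pktinfo: no ipi_spec_dst. */
struct nb_in_pktinfo {
	struct in_addr	ipi_addr;	/* src/dst address */
	unsigned int	ipi_ifindex;	/* interface index */
};

/* The proposed compatibility shim. */
#define ipi_spec_dst ipi_addr

/*
 * Linux/Solaris-style code that reads ipi_spec_dst now compiles
 * against the two-field structure and gets ipi_addr, which matches
 * what Linux reports for the loopback example above.
 */
in_addr_t
reply_source(const struct nb_in_pktinfo *pi)
{
	return pi->ipi_spec_dst.s_addr;
}
```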

Next, we do something really silly with the name IP_RECVPKTINFO.
Remember that this is the option to turn on the generation of
IP_PKTINFO control messages for recvmsg(), and that Linux dropped it,
changing the IP_PKTINFO option to do this instead of setting the
default source address for outgoing packets? Well, we've reinstated
the option, but in NetBSD it enables the generation of IP_RECVPKTINFO
control messages containing the *source* addresses of the incoming
packets. This is completely meaningless, as we have that information
in the standard message header from recvmsg() already, so it'll never
be used for this purpose.
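To illustrate why a control message carrying the source address is redundant, here is a sketch showing that recvmsg() already fills msg_name with the sender's address for every datagram, with no socket option at all. The helper name recv_with_source() is hypothetical.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Receive one datagram into buf and store its sender in *src.
 * No socket option or control message is involved: the source
 * address comes back in the standard msg_name field.
 */
ssize_t
recv_with_source(int fd, void *buf, size_t len, struct sockaddr_in *src)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_name = src;
	msg.msg_namelen = sizeof(*src);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	return recvmsg(fd, &msg, 0);
}
```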

What it does do, though, is trick source code that supports the
Solaris IP_RECVPKTINFO option into thinking we work the same way. See
external/bsd/dhcp/dist/common/socket.c for an example of functionality
we're missing. Note how they test for the presence of both symbols
IP_PKTINFO and IP_RECVPKTINFO, and then assume that the functionality
of Solaris is present. Other code I've read checks for IP_PKTINFO
first, and then uses IP_RECVPKTINFO to decide whether to do things the
Solaris or the Linux way. Our use of the latter symbol breaks this.
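A sketch of the compile-time detection pattern described above; it is written for illustration and is not copied from dhcp's socket.c or any other package. With NetBSD's current headers, both symbols are defined, so the first branch is taken, and the socket ends up asking for IP_RECVPKTINFO control messages carrying source addresses instead of the IP_PKTINFO messages the code expects.

```c
#define _GNU_SOURCE		/* glibc: expose IP_PKTINFO-related names */
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: request IP_PKTINFO cmsgs on received datagrams. */
int
enable_recv_pktinfo4(int fd)
{
	int on = 1;

#if defined(IP_PKTINFO) && defined(IP_RECVPKTINFO)
	/* Both symbols present: assume Solaris, where IP_RECVPKTINFO
	 * requests IP_PKTINFO cmsgs on received packets. */
	return setsockopt(fd, IPPROTO_IP, IP_RECVPKTINFO, &on, sizeof(on));
#elif defined(IP_PKTINFO)
	/* Only IP_PKTINFO: assume Linux, where the int-valued IP_PKTINFO
	 * option itself requests the cmsgs. */
	return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
#else
	/* No IP_PKTINFO at all (e.g. FreeBSD, with IP_RECVDSTADDR). */
	(void)fd;
	(void)on;
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```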

Finally, here's what I'd like to change:

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>

2) Change the IP_RECVPKTINFO option to control the generation of
   IP_PKTINFO control messages, the way it's done in Solaris.

3) Remove the superfluous IP_RECVPKTINFO control message.

4) Change the IP_PKTINFO option to do different things depending on
   the size of the value it's supplied with:
   - If it's sizeof(int), assume it's being used as in Linux:
     - If the value is non-zero, turn on the IP_RECVPKTINFO option.
     - If the value is zero, turn off the IP_RECVPKTINFO option.
   - If it's sizeof(struct in_pktinfo), assume it's being used as in
     Solaris, to set a default for the source interface and/or
     source address for outgoing packets on the socket.

5) Fix our documentation. Both ip(4) and ip6(4) contain errors in
   their descriptions of these particular options and control messages.
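Item 4 can be modeled in user space to check that the length-based dispatch is unambiguous. This is only a sketch of the decision logic, not kernel code: the state structure, the function name, the two-field nb_in_pktinfo layout, and the choice of EINVAL for any other length are all assumptions for illustration. It relies on sizeof(int) and sizeof(struct nb_in_pktinfo) being different, which holds on common ABIs (typically 4 and 8 bytes).

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical two-field structure modeled on NetBSD's in_pktinfo. */
struct nb_in_pktinfo {
	struct in_addr	ipi_addr;
	unsigned int	ipi_ifindex;
};

/* Hypothetical per-socket state standing in for the kernel's PCB. */
struct pktinfo_state {
	bool			recvpktinfo;	/* IP_RECVPKTINFO on/off */
	struct nb_in_pktinfo	defsrc;		/* default source, Solaris style */
};

/*
 * Model of an IP_PKTINFO setsockopt(): interpret the value by its
 * length. Returns 0 on success or an errno value.
 */
int
pktinfo_setopt(struct pktinfo_state *st, const void *val, socklen_t len)
{
	if (len == sizeof(int)) {
		int on;

		/* Linux style: toggles delivery of IP_PKTINFO cmsgs. */
		memcpy(&on, val, sizeof(on));
		st->recvpktinfo = (on != 0);
		return 0;
	}
	if (len == sizeof(struct nb_in_pktinfo)) {
		/* Solaris style: default source interface/address. */
		memcpy(&st->defsrc, val, sizeof(st->defsrc));
		return 0;
	}
	return EINVAL;
}
```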

With this, we should have automatic source code compatibility with
pretty much everything, and orthogonality between IPv6 and IPv4.

-tih
--
Most people who graduate with CS degrees don't understand the significance
of Lisp. Lisp is the most important idea in computer science. --Alan Kay

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Christos Zoulas
2017-12-28 16:27:50 UTC
Post by Tom Ivar Helbekkmo
I'd like to make some changes to the IPv4 socket option handling.
[...]
With this, we should have automatic source code compatibility with
pretty much everything, and orthogonality between IPv6 and IPv4.
I like and I support this proposal.

christos


Robert Elz
2017-12-28 19:38:00 UTC
Date: Thu, 28 Dec 2017 16:27:50 +0000 (UTC)
From: ***@astron.com (Christos Zoulas)
Message-ID: <p23625$lrh$***@blaine.gmane.org>

| In article <***@thuvia.hamartun.priv.no>,
| Tom Ivar Helbekkmo <***@hamartun.priv.no> wrote:
| >I'd like to make some changes to the IPv4 socket option handling.
[...]
| >With this, we should have automatic source code compatibility with
| >pretty much everything, and orthogonality between IPv6 and IPv4.
|
| I like and I support this proposal.

Yes, do it.

kre



John Nemeth
2017-12-29 02:29:00 UTC
On Dec 28, 4:27pm, Christos Zoulas wrote:
} Subject: Re: Plan for improving IP_PKTINFO socket option handling
} In article <***@thuvia.hamartun.priv.no>,
} Tom Ivar Helbekkmo <***@hamartun.priv.no> wrote:
} >I'd like to make some changes to the IPv4 socket option handling.
} >[...]
} >With this, we should have automatic source code compatibility with
} >pretty much everything, and orthogonality between IPv6 and IPv4.
}
} I like and I support this proposal.

For what it's worth, me too. :-) The lack of source code
compatibility has really been annoying me when working on some
packages. Also, tftpd not sending packets from the correct source
address has been a problem (this may have been fixed in the meantime).
Also, good work Tom with the research!

}-- End of excerpt from Christos Zoulas

Alistair Crooks
2017-12-29 18:30:37 UTC
Just to add to the general approval - nice analysis, would be great to
have this!

Thanks,
Alistair
Post by John Nemeth
} Subject: Re: Plan for improving IP_PKTINFO socket option handling
} >I'd like to make some changes to the IPv4 socket option handling.
} >Specifically, I want to change how the IP_PKTINFO options are handled.
} >Before I attempt to change any code, I'd like input on the plan.
} >
} >First, a bit of background.
} >
} >I've been looking at getting the PowerDNS applications (authoritative
} >name server, recursive name server, and DNS load balancer/firewall) to
} >compile cleanly on NetBSD, and while I've been able to do so, it took
} >some ugly workarounds. Digging into the standards, the source code,
} >and the documentation from Solaris, Linux, and our own NetBSD (FreeBSD
} >doesn't do IP_PKTINFO, having instead created an IP_SENDSRCADDR option
} >as a partner to the traditional IP_RECVDSTADDR), I find that there are
} >a number of differences, some for no good reason at all. In a couple
} >of cases, our code is just wrong. Also, our documentation of these
} >options is unclear, and contains errors.
} >
} >The reason these things exist at all is to enable the owner of a
} >wildcard bound socket to find out which interface and address an
} >incoming connection was actually received by, and, in the case of a
} >UDP socket, to set the source address of an outgoing packet, typically
} >so that the sender of a UDP request can recognize the response. For
} >ease of use, recvmsg() delivers the extra information as a control
} >message which may then be supplied unchanged to sendmsg() when sending
} >the response, setting the source address to the original destination.
} >
} >The IPv4 implementation of the *PKTINFO options is not standardized.
} >It has been implemented several times, modeled, with varying degrees
} >of accuracy, on the IPv6 version, standardized by RFC3542.
} >
} >
} > recvmsg() will supply IPV6_PKTINFO cmsgs for incoming packets
} >
} > sets the default source address to be used when sending packets
} >
} > contains an in6_pktinfo structure with the specific destination address
} >
} > supply an in6_pktinfo structure with the source address to be used
} >
} >All of these work the same way on BSD, Solaris, and Linux (as per
} >RFC3542). The in6_pktinfo structure holds the address (in ipi6_addr),
} >and the interface index (ipi6_ifindex).
} >
} >Note how the IPV6_RECVPKTINFO option is used to request IPV6_PKTINFO
} >control messages with incoming packets, while the IPV6_PKTINFO option
} >sets a default source address for the socket, and the IPV6_PKTINFO
} >control message on an outgoing packet sets the source address for that
} >particular packet.
} >
} >Now to the IPv4 implementation. In Solaris, this was done as a direct
} >
} > recvmsg() will supply IP_PKTINFO cmsgs for incoming packets
} >
} > sets the default source address to be used when sending packets
} >
} > contains an in_pktinfo structure with the specific destination address
} >
} > supply an in_pktinfo structure with the source address to be used
} >
} >Then Linux almost copied this scheme, but they dropped IP_RECVPKTINFO,
} >instead using the IP_PKTINFO option to control the delivery of
} >IP_PKTINFO control messages with incoming packets. In doing so, they
} >lost the ability to set a default outgoing source address. This is
} >arguably not a great loss, but it does break compatibility with
} >Solaris, and it gratuitously breaks orthogonality with IPv6.
} >
} >Next, while Solaris and Linux still have the ipi_ifindex and ipi_addr
} >fields, they decided to add a new field, ipi_spec_dst. The name is
} >supposed to refer to the "specific destination" described in RFCs 1122
} >and 1123. They chose to differentiate between the destination address
} >as supplied in the incoming IP packet itself, and the local address
} >the packet was, in fact, delivered to (specifically, ipi_spec_dst is
} >said to be "the destination address of the routing table entry"). For
} >outgoing packets, the IP_PKTINFO option's ipi_spec_dst field will be
} >used as the source address.
} >
} >The only real example I can think of is where you listen on 0/0, and
} >receive a packet on the loopback interface, addressed not to
} >127.0.0.1, but, say, 127.1.2.3. By the documentation, this should
} >give an IP_PKTINFO control message with ipi_addr set to 127.1.2.3, and
} >ipi_spec_dst 127.0.0.1. That's not how Linux works, though: it will
} >set both to 127.1.2.3. Sending a response, if you pass that control
} >message unchanged to sendmsg(), you'll be sending from 127.1.2.3
} >(instead of the documented 127.0.0.1, which wouldn't work), and this
} >may be a hint to why Linux puts the packet header destination in both
} >fields. On NetBSD, sending to 127.1.2.3 doesn't work at all.
} >
} >(This is a general difference in the handling of the loopback
} >interface: if you 'ping 127.1.2.3' on Linux, you get responses from
} >127.1.2.3. On NetBSD, you get a 'network unreachable' instead.)
} >
} >Now, on to NetBSD.
} >
} >We've mostly copied the way things work in Solaris and Linux, but with
} >a couple of little twists that break source compatibility with both.
} >
} >First, we don't have the ipi_spec_dst field at all. Since a lot of
} >source code out there is written with Solaris and/or Linux in mind,
} >this breaks compatibility at the source level. I don't have a Solaris
} >system handy for testing, but from what I observe on Linux, and how
} >its loopback handling differs from NetBSD, as described above, we
} >could just toss in a "#define ipi_spec_dst ipi_addr" and be good.
} >
} >Next, we do something really silly with the name IP_RECVPKTINFO.
} >Remember that this is the option to turn on the generation of
} >IP_PKTINFO control messages for recvmsg(), and that Linux dropped it,
} >changing the IP_PKTINFO option to do this instead of setting the
} >default source address for outgoing packets? Well, we've reinstated
} >the option, but in NetBSD it enables the generation of IP_RECVPKTINFO
} >control messages containing the *source* addresses of the incoming
} >packets. This is completely meaningless, as we have that information
} >in the standard message header from recvmsg() already, so it'll never
} >be used for this purpose.
} >
} >What it does do, though, is trick source code that supports the
} >Solaris IP_RECVPKTINFO option into thinking we work the same way. See
} >external/bsd/dhcp/dist/common/socket.c for an example of functionality
} >we're missing. Note how they test for the presence of both symbols
} >IP_PKTINFO and IP_RECVPKTINFO, and then assume that the functionality
} >of Solaris is present. Other code I've read checks for IP_PKTINFO
} >first, and then uses IP_RECVPKTINFO to decide whether to do things the
} >Solaris or the Linux way. Our use of the latter symbol breaks this.
} >
} >
} >1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
} >
} >2) Change the IP_RECVPKTINFO option to control the generation of
} > IP_PKTINFO control messages, the way it's done in Solaris.
} >
} >3) Remove the superfluous IP_RECVPKTINFO control message.
} >
} >4) Change the IP_PKTINFO option to do different things depending on
} > - If it's non-zero, turn on the IP_RECVPKTINFO option.
} > - If it's zero, turn off the IP_RECVPKTINFO option.
} > - If it's sizeof(struct in_pktinfo), assume it's being used as in
} > Solaris, to set a default for the source interface and/or
} > source address for outgoing packets on the socket.
} >
} >5) Fix our documentation. Both ip(4) and ip6(4) contain errors in
} > their descriptions of these particular options and control messages.
} >
} >With this, we should have automatic source code compatibility with
} >pretty much everything, and orthogonality between IPv6 and IPv4.
}
} I like and I support this proposal.
For what it's worth, me too. :-) The lack of source code
compatibility has really been annoying me when working on some
packages. Also, tftpd not sending packets from the correct source
address has been a problem (this may have been fixed in the meantime).
Also, good work with the research, Tom!
}-- End of excerpt from Christos Zoulas
--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Tom Ivar Helbekkmo
2017-12-31 15:05:33 UTC
I've got everything working, now, and ready for others to take a look
at. I've tested all functionality, and am quite happy with it. I'm
attaching the patch for perusal and criticism.

I gave in to a little bit of feature creep: I included source level
compatibility with FreeBSD, as adding the IP_SENDSRCADDR control message
was so quick and easy to do while I was in there anyway.

Binary compatibility with existing applications is maintained: I've
verified that existing compiled binaries continue to work as before.

There are a couple of little things I'd like opinions on, though. More
on that below.
Post by Tom Ivar Helbekkmo
1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
Done. This alone makes lots of software compile that didn't.
Post by Tom Ivar Helbekkmo
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
Done. Using IP_PKTINFO will still work, though -- see below.
Post by Tom Ivar Helbekkmo
3) Remove the superfluous IP_RECVPKTINFO control message.
Done. It won't be missed -- nothing has ever used it.
Post by Tom Ivar Helbekkmo
4) Change the IP_PKTINFO option to do different things depending on
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
Done, but only for setsockopt(2). The relevant kernel code knows the
size of the data supplied, and can differentiate as described, and I do
that. For getsockopt(2), however, it's not so simple. The size is not
known (sopt->sopt_size is 0 during the IP socket option processing in
ip_ctloutput() in sys/netinet/ip_output.c, which I'm tempted to call a
bug), and thus I have no idea whether the caller expects an int or a
struct in_pktinfo returned. For now, I've punted, and just return the
int that Linux source code, and existing NetBSD applications, expect.
Comments and suggestions are welcome.
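The setsockopt(2) half of item 4 can be modelled in user space as below. This is a hypothetical sketch, not the kernel code. `struct pcb_model` stands in for the two PCB fields the patch touches. The `lookup` callback (with the toy `toy_lookup` for illustration) stands in for the `if_byindex()`/`in_get_ia_from_ifp()` step that finds an interface's primary address. As in the patch, any length other than exactly `sizeof(int)` or `sizeof(struct in_pktinfo)` is rejected, and errors are returned kernel-style as errno values.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1 /* glibc exposes struct in_pktinfo only with extensions */
#endif
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the relevant inpcb state. */
struct pcb_model {
	int recv_pktinfo;        /* INP_RECVPKTINFO */
	struct in_addr prefsrc;  /* inp_prefsrcip */
};

/* Hypothetical: primary address of an interface; returns 0 if found. */
typedef int (*primary_addr_fn)(unsigned int ifindex, struct in_addr *out);

/* Toy lookup for illustration: interface 1 has 127.0.0.1, nothing else. */
int toy_lookup(unsigned int ifindex, struct in_addr *out)
{
	if (ifindex != 1)
		return -1;
	out->s_addr = htonl(INADDR_LOOPBACK);
	return 0;
}

/* Returns 0 or an errno value, as the kernel option code does. */
int model_set_ip_pktinfo(struct pcb_model *pcb, const void *val, size_t len,
    primary_addr_fn lookup)
{
	if (len == sizeof(int)) {
		int on;

		/* Linux compatibility: a boolean toggles IP_RECVPKTINFO. */
		memcpy(&on, val, sizeof(on));
		pcb->recv_pktinfo = (on != 0);
		return 0;
	}
	if (len == sizeof(struct in_pktinfo)) {
		struct in_pktinfo pi;

		/* Solaris compatibility: set the default source address. */
		memcpy(&pi, val, sizeof(pi));
		if (pi.ipi_ifindex != 0) {
			struct in_addr a;

			if (lookup(pi.ipi_ifindex, &a) != 0)
				return EADDRNOTAVAIL;
			pcb->prefsrc = a;
			return 0;
		}
		pcb->prefsrc = pi.ipi_addr;
		return 0;
	}
	return EINVAL;
}
```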

Another thing I'm unsure of here: when the application uses IP_PKTINFO
to set a default source address for outgoing packets through a wildcard
bound socket, I need a place to keep that address. I decided to add a
new member to the PCB structure for this purpose. If that's stupid, and
there's something else I should have done, someone please enlighten me.
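The source-address precedence this creates can be summarised in a small hypothetical function (a model, not kernel code). It mirrors the patch's change to `ip_setpktopts()`. When the PCB's preferred address is set, it replaces the bound address as the default. A per-packet `IP_PKTINFO` or `IP_SENDSRCADDR` control message still wins over both. The sketch also treats a per-packet `INADDR_ANY` as "no preference"; that detail is an assumption.

```c
#include <stddef.h>
#include <sys/types.h>
#include <netinet/in.h>

/*
 * Hypothetical model of how the source address of one outgoing datagram
 * is chosen after the patch. All addresses are in network byte order.
 */
struct in_addr pick_source(struct in_addr bound, struct in_addr prefsrc,
    const struct in_addr *per_packet)
{
	/* A per-packet IP_PKTINFO/IP_SENDSRCADDR address wins (assumption:
	 * INADDR_ANY there means "no preference"). */
	if (per_packet != NULL && per_packet->s_addr != htonl(INADDR_ANY))
		return *per_packet;
	/* Otherwise the socket default set with setsockopt(IP_PKTINFO). */
	if (prefsrc.s_addr != htonl(INADDR_ANY))
		return prefsrc;
	/* Otherwise the bound address (INADDR_ANY for a wildcard socket,
	 * leaving the choice to the routing code). */
	return bound;
}
```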
Post by Tom Ivar Helbekkmo
5) Fix our documentation. Both ip(4) and ip6(4) contain errors in
their descriptions of these particular options and control messages.
Done, but only for ip(4). I still think ip6(4) needs to be looked at,
but after perusing sys/netinet6/ip6_output.c, I'm not sure exactly how
to go about it. Someone with more knowledge of that code is needed.

6) Add the IP_SENDSRCADDR control message, for FreeBSD compatibility.

Done, too. :)
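With both control messages available, a sender can use whichever one its platform offers. The following is a minimal sketch built around a hypothetical helper, `build_src_cmsg`. It prefers `IP_PKTINFO` and sets `ipi_spec_dst`, which Linux and Solaris provide and NetBSD would alias to `ipi_addr`. Where `IP_PKTINFO` is missing, it falls back to FreeBSD's `IP_SENDSRCADDR`, whose payload is a bare `struct in_addr`. The caller is assumed to pass a buffer suitably aligned for `struct cmsghdr`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1 /* glibc exposes struct in_pktinfo only with extensions */
#endif
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#if !defined(IP_PKTINFO) && !defined(IP_SENDSRCADDR)
#error "no per-packet IPv4 source address control message available"
#endif

/*
 * Hypothetical helper: write into buf (aligned for struct cmsghdr) a
 * control message asking sendmsg() to use src as the source address.
 * Returns the value for msg_controllen, or 0 if buf is too small.
 */
size_t build_src_cmsg(void *buf, size_t buflen, struct in_addr src)
{
	struct msghdr msg;
	struct cmsghdr *cm;
#if defined(IP_PKTINFO)
	struct in_pktinfo pi;
	const void *data = &pi;
	const size_t datalen = sizeof(pi);
	const int type = IP_PKTINFO;

	memset(&pi, 0, sizeof(pi));
	pi.ipi_spec_dst = src;  /* ipi_addr on NetBSD, via the #define */
#else
	const void *data = &src;
	const size_t datalen = sizeof(src);
	const int type = IP_SENDSRCADDR; /* FreeBSD */
#endif

	if (buflen < CMSG_SPACE(datalen))
		return 0;
	memset(buf, 0, buflen);
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = buf;
	msg.msg_controllen = buflen;
	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = IPPROTO_IP;
	cm->cmsg_type = type;
	cm->cmsg_len = CMSG_LEN(datalen);
	memcpy(CMSG_DATA(cm), data, datalen);
	return CMSG_SPACE(datalen);
}
```

The returned length goes into `msg_controllen` of the `struct msghdr` passed to sendmsg(), with `msg_control` pointing at the same buffer.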
Post by Tom Ivar Helbekkmo
With this, we should have automatic source code compatibility with
pretty much everything, and orthogonality between IPv6 and IPv4.
I'm pretty happy with the result: software written for FreeBSD, Linux,
Solaris, and probably a few others, like AIX, should now compile and run
with no changes to these particular bits.

-tih
Christos Zoulas
2017-12-31 15:36:51 UTC
I've got everything working, now, and ready for others to take a look
at. I've tested all functionality, and am quite happy with it. I'm
attaching the patch for perusal and criticism.
I gave in to a little bit of feature creep: I included source level
compatibility with FreeBSD, as adding the IP_SENDSRCADDR control message
was so quick and easy to do while I was in there anyway.
Binary compatibility with existing applications is maintained: I've
verified that existing compiled binaries continue to work as before.
There are a couple of little things I'd like opinions on, though. More
on that below.
Post by Tom Ivar Helbekkmo
1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
Done. This alone makes lots of software compile that didn't.
Post by Tom Ivar Helbekkmo
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
Done. Using IP_PKTINFO will still work, though -- see below.
Post by Tom Ivar Helbekkmo
3) Remove the superfluous IP_RECVPKTINFO control message.
Done. It won't be missed -- nothing has ever used it.
Post by Tom Ivar Helbekkmo
4) Change the IP_PKTINFO option to do different things depending on
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
Done, but only for setsockopt(2). The relevant kernel code knows the
size of the data supplied, and can differentiate as described, and I do
that. For getsockopt(2), however, it's not so simple. The size is not
known (sopt->sopt_size is 0 during the IP socket option processing in
ip_ctloutput() in sys/netinet/ip_output.c, which I'm tempted to call a
bug), and thus I have no idea whether the caller expects an int or a
struct in_pktinfo returned. For now, I've punted, and just return the
int that Linux source code, and existing NetBSD applications, expect.
Comments and suggestions are welcome.
This came up in a different discussion; we should pass the size around...
Another thing I'm unsure of here: when the application uses IP_PKTINFO
to set a default source address for outgoing packets through a wildcard
bound socket, I need a place to keep that address. I decided to add a
new member to the PCB structure for this purpose. If that's stupid, and
there's something else I should have done, someone please enlighten me.
I guess that's fine for now.


Great, minor nits :-)

+ error = sockopt_get(sopt, &pktinfo, sizeof(struct in_pktinfo));
+ if (!error) {

Early break here:
if (error)
break;

To avoid indent...

+ /* Solaris compatibility */
+ if (pktinfo.ipi_ifindex) {
+ struct ifnet *ifp;
+ struct in_ifaddr *ia;
@@ -1449,11 +1487,25 @@

I'd use sizeof(pktinfo) here:

if (cm->cmsg_len != CMSG_LEN(sizeof(struct in_pktinfo)))
return EINVAL;

- pktinfo = (struct in_pktinfo *)CMSG_DATA(cm);
- error = ip_pktinfo_prepare(pktinfo, pktopts, flags,

And write this:
+ memcpy(&pktinfo, (struct in_pktinfo *)CMSG_DATA(cm),
+ sizeof(struct in_pktinfo));

as:
memcpy(&pktinfo, CMSG_DATA(cm), sizeof(pktinfo));

christos


Tom Ivar Helbekkmo
2017-12-31 17:02:13 UTC
Post by Christos Zoulas
This came up in a different discussion; we should pass the size around...
That would be nice. It would also make a lot of sense: after using the
size to detect client expectations while setting options, I spent a lot
of time trying to find out why my code was failing in the getting part;
I really expected the circumstances to be equivalent.
Post by Christos Zoulas
Great, minor nits :-)
[...]
Thanks, Christos -- updated patch appended.

-tih

Index: sys/netinet/in.h
===================================================================
RCS file: /cvsroot/src/sys/netinet/in.h,v
retrieving revision 1.101
diff -u -u -r1.101 in.h
--- sys/netinet/in.h 10 Aug 2017 04:31:58 -0000 1.101
+++ sys/netinet/in.h 31 Dec 2017 16:54:47 -0000
@@ -289,8 +289,10 @@
#define IP_IPSEC_POLICY 22 /* struct; get/set security policy */
#define IP_RECVTTL 23 /* bool; receive IP TTL w/dgram */
#define IP_MINTTL 24 /* minimum TTL for packet or drop */
-#define IP_PKTINFO 25 /* int; send interface and src addr */
-#define IP_RECVPKTINFO 26 /* int; send interface and dst addr */
+#define IP_PKTINFO 25 /* struct; set default src if/addr */
+#define IP_RECVPKTINFO 26 /* int; receive dst if/addr w/dgram */
+
+#define IP_SENDSRCADDR IP_RECVDSTADDR /* FreeBSD compatibility */

/*
* Information sent in the control message of a datagram socket for
@@ -301,6 +303,8 @@
unsigned int ipi_ifindex; /* interface index */
};

+#define ipi_spec_dst ipi_addr /* Solaris/Linux compatibility */
+
/*
* Defaults and limits for options
*/
Index: sys/netinet/in_pcb.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/in_pcb.c,v
retrieving revision 1.180
diff -u -u -r1.180 in_pcb.c
--- sys/netinet/in_pcb.c 15 Dec 2017 04:03:46 -0000 1.180
+++ sys/netinet/in_pcb.c 31 Dec 2017 16:54:47 -0000
@@ -204,6 +204,7 @@
inp->inp_errormtu = -1;
inp->inp_portalgo = PORTALGO_DEFAULT;
inp->inp_bindportonsend = false;
+ inp->inp_prefsrcip.s_addr = INADDR_ANY;
#if defined(IPSEC)
if (ipsec_enabled) {
int error = ipsec_init_pcbpolicy(so, &inp->inp_sp);
Index: sys/netinet/in_pcb.h
===================================================================
RCS file: /cvsroot/src/sys/netinet/in_pcb.h,v
retrieving revision 1.64
diff -u -u -r1.64 in_pcb.h
--- sys/netinet/in_pcb.h 10 Aug 2017 04:31:58 -0000 1.64
+++ sys/netinet/in_pcb.h 31 Dec 2017 16:54:47 -0000
@@ -95,6 +95,7 @@
int inp_errormtu; /* MTU of last xmit status = EMSGSIZE */
uint8_t inp_ip_minttl;
bool inp_bindportonsend;
+ struct in_addr inp_prefsrcip; /* preferred src IP when wild */
};

#define inp_faddr inp_ip.ip_dst
@@ -121,11 +122,9 @@
* Cancels INP_HDRINCL.
*/
#define INP_RECVTTL 0x0800 /* receive incoming IP TTL */
-#define INP_PKTINFO 0x1000 /* receive dst packet info */
-#define INP_RECVPKTINFO 0x2000 /* receive dst packet info */
+#define INP_RECVPKTINFO 0x1000 /* receive IP dst if/addr */
#define INP_CONTROLOPTS (INP_RECVOPTS|INP_RECVRETOPTS|INP_RECVDSTADDR|\
- INP_RECVIF|INP_RECVTTL|INP_RECVPKTINFO|\
- INP_PKTINFO)
+ INP_RECVIF|INP_RECVTTL|INP_RECVPKTINFO)

#define sotoinpcb(so) ((struct inpcb *)(so)->so_pcb)
#define inp_lock(inp) solock((inp)->inp_socket)
Index: sys/netinet/ip_input.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/ip_input.c,v
retrieving revision 1.363
diff -u -u -r1.363 ip_input.c
--- sys/netinet/ip_input.c 24 Nov 2017 14:03:25 -0000 1.363
+++ sys/netinet/ip_input.c 31 Dec 2017 16:54:47 -0000
@@ -1533,15 +1533,6 @@

if (inpflags & INP_RECVPKTINFO) {
struct in_pktinfo ipi;
- ipi.ipi_addr = ip->ip_src;
- ipi.ipi_ifindex = ifp->if_index;
- *mp = sbcreatecontrol(&ipi,
- sizeof(ipi), IP_RECVPKTINFO, IPPROTO_IP);
- if (*mp)
- mp = &(*mp)->m_next;
- }
- if (inpflags & INP_PKTINFO) {
- struct in_pktinfo ipi;
ipi.ipi_addr = ip->ip_dst;
ipi.ipi_ifindex = ifp->if_index;
*mp = sbcreatecontrol(&ipi,
Index: sys/netinet/ip_output.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/ip_output.c,v
retrieving revision 1.288
diff -u -u -r1.288 ip_output.c
--- sys/netinet/ip_output.c 22 Dec 2017 11:22:37 -0000 1.288
+++ sys/netinet/ip_output.c 31 Dec 2017 16:54:48 -0000
@@ -1081,6 +1081,7 @@
struct ip *ip = &inp->inp_ip;
int inpflags = inp->inp_flags;
int optval = 0, error = 0;
+ struct in_pktinfo pktinfo;

KASSERT(solocked(so));

@@ -1103,7 +1104,6 @@
case IP_TOS:
case IP_TTL:
case IP_MINTTL:
- case IP_PKTINFO:
case IP_RECVOPTS:
case IP_RECVRETOPTS:
case IP_RECVDSTADDR:
@@ -1135,10 +1135,6 @@
else \
inpflags &= ~bit;

- case IP_PKTINFO:
- OPTSET(INP_PKTINFO);
- break;
-
case IP_RECVOPTS:
OPTSET(INP_RECVOPTS);
break;
@@ -1163,6 +1159,43 @@
OPTSET(INP_RECVTTL);
break;
}
+ break;
+ case IP_PKTINFO:
+ error = sockopt_getint(sopt, &optval);
+ if (!error) {
+ /* Linux compatibility */
+ OPTSET(INP_RECVPKTINFO);
+ break;
+ }
+ error = sockopt_get(sopt, &pktinfo, sizeof(struct in_pktinfo));
+ if (error)
+ break;
+ /* Solaris compatibility */
+ if (pktinfo.ipi_ifindex) {
+ struct ifnet *ifp;
+ struct in_ifaddr *ia;
+ int s;
+
+ /* pick up primary address */
+ s = pserialize_read_enter();
+ ifp = if_byindex(pktinfo.ipi_ifindex);
+ if (ifp == NULL) {
+ pserialize_read_exit(s);
+ error = EADDRNOTAVAIL;
+ break;
+ }
+ ia = in_get_ia_from_ifp(ifp);
+ if (ia == NULL) {
+ pserialize_read_exit(s);
+ error = EADDRNOTAVAIL;
+ break;
+ }
+ inp->inp_prefsrcip = IA_SIN(ia)->sin_addr;
+ pserialize_read_exit(s);
+ } else {
+ inp->inp_prefsrcip = pktinfo.ipi_addr;
+ }
+ break;
break;
#undef OPTSET

@@ -1270,7 +1303,8 @@
#define OPTBIT(bit) (inpflags & bit ? 1 : 0)

case IP_PKTINFO:
- optval = OPTBIT(INP_PKTINFO);
+ /* Linux compatibility */
+ optval = OPTBIT(INP_RECVPKTINFO);
break;

case IP_RECVOPTS:
@@ -1416,11 +1450,14 @@
struct inpcb *inp, kauth_cred_t cred)
{
struct cmsghdr *cm;
- struct in_pktinfo *pktinfo;
+ struct in_pktinfo pktinfo;
int error;

pktopts->ippo_imo = inp->inp_moptions;
- sockaddr_in_init(&pktopts->ippo_laddr, &inp->inp_laddr, 0);
+ if (!in_nullhost(inp->inp_prefsrcip))
+ sockaddr_in_init(&pktopts->ippo_laddr, &inp->inp_prefsrcip, 0);
+ else
+ sockaddr_in_init(&pktopts->ippo_laddr, &inp->inp_laddr, 0);

if (control == NULL)
return 0;
@@ -1446,13 +1483,22 @@

switch (cm->cmsg_type) {
case IP_PKTINFO:
- if (cm->cmsg_len != CMSG_LEN(sizeof(struct in_pktinfo)))
+ if (cm->cmsg_len != CMSG_LEN(sizeof(pktinfo)))
return EINVAL;
-
- pktinfo = (struct in_pktinfo *)CMSG_DATA(cm);
- error = ip_pktinfo_prepare(pktinfo, pktopts, flags,
+ memcpy(&pktinfo, CMSG_DATA(cm), sizeof(pktinfo));
+ error = ip_pktinfo_prepare(&pktinfo, pktopts, flags,
cred);
- if (error != 0)
+ if (error)
+ return error;
+ break;
+ case IP_SENDSRCADDR: /* FreeBSD compatibility */
+ if (cm->cmsg_len != CMSG_LEN(sizeof(struct in_addr)))
+ return EINVAL;
+ pktinfo.ipi_ifindex = 0;
+ pktinfo.ipi_addr = ((struct in_pktinfo *)CMSG_DATA(cm))->ipi_addr;
+ error = ip_pktinfo_prepare(&pktinfo, pktopts, flags,
+ cred);
+ if (error)
return error;
break;
default:
Index: share/man/man4/ip.4
===================================================================
RCS file: /cvsroot/src/share/man/man4/ip.4,v
retrieving revision 1.40
diff -u -u -r1.40 ip.4
--- share/man/man4/ip.4 13 Aug 2017 18:19:44 -0000 1.40
+++ share/man/man4/ip.4 31 Dec 2017 16:54:50 -0000
@@ -96,8 +96,8 @@
.Ed
.Pp
The
-.Dv IP_PKTINFO
-option can be used to turn on receiving of information about the source
+.Dv IP_RECVPKTINFO
+option can be used to turn on receiving of information about the destination
address of the packet, and the interface index.
The information is passed in a
.Vt struct in_pktinfo
@@ -117,13 +117,24 @@
.Pp
For
.Xr sendmsg 2 ,
-the source address or output interface can be specified by adding
+the source address or output interface can be specified by adding an
.Dv IP_PKTINFO
-to the control part of the message on a
+message to the control part of the message on a
.Dv SOCK_DGRAM
or
.Dv SOCK_RAW
-socket.
+socket. Setting ipi_ifindex will cause the primary address of that
+interface to be used; setting ipi_addr will directly choose that address.
+The IP_PKTINFO cmsghdr structure from a received message may be used
+unchanged, in which case the outgoing message will be sent from the
+address the incoming message was received on.
+.Pp
+Setting the
+.Dv IP_PKTINFO
+option on a socket, with the same
+.Vt struct in_pktinfo
+structure, will set the default source address to be used until set
+again, unless explicitly overridden on a per-packet basis, as above.
.Pp
The
.Dv IP_PORTALGO
@@ -177,6 +188,18 @@
cmsg_type = IP_RECVDSTADDR
.Ed
.Pp
+For
+.Xr sendmsg 2 ,
+the source address can be specified by adding
+.Dv IP_SENDSRCADDR
+to the control part of the message on a
+.Dv SOCK_DGRAM
+or
+.Dv SOCK_RAW
+socket. The IP_RECVDSTADDR cmsghdr structure from a received message
+may be used unchanged, in which case the outgoing message will be sent
+from the address the incoming message was received on.
+.Pp
If the
.Dv IP_RECVIF
option is enabled on a
@@ -197,12 +220,6 @@
cmsg_type = IP_RECVIF
.Ed
.Pp
-The
-.Dv IP_RECVPKTINFO
-option is similar to the
-.Dv IP_PKTINFO
-one, only in this case the inbound information is returned.
-.Pp
If the
.Dv IP_RECVTTL
option is enabled on a
@@ -452,6 +469,24 @@
the IP option field was improperly formed; an option field was
shorter than the minimum value or longer than the option buffer provided.
.El
+.Sh COMPATIBILITY
+The
+.Dv IP_RECVPKTINFO
+option is used because it is directly compatible with Solaris, AIX, etc.,
+and the
+.Dv IP_PKTINFO
+option is intended to be used in their manner, to set the default source
+address for outgoing packets on a
+.Dv SOCK_DGRAM
+or
+.Dv SOCK_RAW
+socket. For compatibility with Linux, however, if you attempt to set the
+.Dv IP_PKTINFO
+option, using an integer parameter as a boolean value, this will
+transparently manipulate the
+.Dv IP_RECVPKTINFO
+option instead. Source code compatibility with both environments is thus
+maintained.
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recv 2 ,
--
Most people who graduate with CS degrees don't understand the significance
of Lisp. Lisp is the most important idea in computer science. --Alan Kay

Tom Ivar Helbekkmo
2017-12-31 19:09:49 UTC
Post by Tom Ivar Helbekkmo
Post by Christos Zoulas
This came up in a different discussion; we should pass the size around...
That would be nice. It would also make a lot of sense: after using the
size to detect client expectations while setting options, I spent a lot
of time trying to find out why my code was failing in the getting part;
I really expected the circumstances to be equivalent.
Thinking about it, I put back the code that returns what Linux or
Solaris compatible code expects, depending on data size, and just added
a fallback to a Linux (and current NetBSD) compatible value if the size
is unknown (as it is now), or, in the future, if the calling application
specifies a receiving buffer that doesn't match either data item.

Look for the "XXX" below to see it.
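The size-driven choice for getsockopt(2) can be modelled the same way. This is again a hypothetical user-space sketch rather than the kernel code, and the union stands in for the kernel's option buffer. As the XXX comment describes, any size other than exactly `sizeof(struct in_pktinfo)` gets the Linux-style integer, and that includes the 0 that currently arrives. The `struct in_pktinfo` answer reports the preferred source in `ipi_addr`, as NetBSD's structure does, with `ipi_ifindex` set to 0.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1 /* glibc exposes struct in_pktinfo only with extensions */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the buffer handed back to getsockopt(). */
union pktinfo_optval {
	int flag;               /* Linux: is IP_PKTINFO reception on? */
	struct in_pktinfo pi;   /* Solaris: default source address */
};

/*
 * Fill *out for a caller that supplied a buffer of `want` bytes and
 * return the number of bytes that would be copied out.
 */
size_t model_get_ip_pktinfo(size_t want, int recv_pktinfo,
    struct in_addr prefsrc, union pktinfo_optval *out)
{
	memset(out, 0, sizeof(*out));
	if (want == sizeof(struct in_pktinfo)) {
		/* Solaris compatibility */
		out->pi.ipi_ifindex = 0;
		out->pi.ipi_addr = prefsrc;
		return sizeof(out->pi);
	}
	/* sizeof(int), or a size we cannot interpret: Linux compatibility */
	out->flag = recv_pktinfo ? 1 : 0;
	return sizeof(out->flag);
}
```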

-tih

Index: sys/netinet/in.h
===================================================================
RCS file: /cvsroot/src/sys/netinet/in.h,v
retrieving revision 1.101
diff -u -u -r1.101 in.h
--- sys/netinet/in.h 10 Aug 2017 04:31:58 -0000 1.101
+++ sys/netinet/in.h 31 Dec 2017 19:06:46 -0000
@@ -289,8 +289,10 @@
#define IP_IPSEC_POLICY 22 /* struct; get/set security policy */
#define IP_RECVTTL 23 /* bool; receive IP TTL w/dgram */
#define IP_MINTTL 24 /* minimum TTL for packet or drop */
-#define IP_PKTINFO 25 /* int; send interface and src addr */
-#define IP_RECVPKTINFO 26 /* int; send interface and dst addr */
+#define IP_PKTINFO 25 /* struct; set default src if/addr */
+#define IP_RECVPKTINFO 26 /* int; receive dst if/addr w/dgram */
+
+#define IP_SENDSRCADDR IP_RECVDSTADDR /* FreeBSD compatibility */

/*
* Information sent in the control message of a datagram socket for
@@ -301,6 +303,8 @@
unsigned int ipi_ifindex; /* interface index */
};

+#define ipi_spec_dst ipi_addr /* Solaris/Linux compatibility */
+
/*
* Defaults and limits for options
*/
Index: sys/netinet/in_pcb.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/in_pcb.c,v
retrieving revision 1.180
diff -u -u -r1.180 in_pcb.c
--- sys/netinet/in_pcb.c 15 Dec 2017 04:03:46 -0000 1.180
+++ sys/netinet/in_pcb.c 31 Dec 2017 19:06:46 -0000
@@ -204,6 +204,7 @@
inp->inp_errormtu = -1;
inp->inp_portalgo = PORTALGO_DEFAULT;
inp->inp_bindportonsend = false;
+ inp->inp_prefsrcip.s_addr = INADDR_ANY;
#if defined(IPSEC)
if (ipsec_enabled) {
int error = ipsec_init_pcbpolicy(so, &inp->inp_sp);
Index: sys/netinet/in_pcb.h
===================================================================
RCS file: /cvsroot/src/sys/netinet/in_pcb.h,v
retrieving revision 1.64
diff -u -u -r1.64 in_pcb.h
--- sys/netinet/in_pcb.h 10 Aug 2017 04:31:58 -0000 1.64
+++ sys/netinet/in_pcb.h 31 Dec 2017 19:06:46 -0000
@@ -95,6 +95,7 @@
int inp_errormtu; /* MTU of last xmit status = EMSGSIZE */
uint8_t inp_ip_minttl;
bool inp_bindportonsend;
+ struct in_addr inp_prefsrcip; /* preferred src IP when wild */
};

#define inp_faddr inp_ip.ip_dst
@@ -121,11 +122,9 @@
* Cancels INP_HDRINCL.
*/
#define INP_RECVTTL 0x0800 /* receive incoming IP TTL */
-#define INP_PKTINFO 0x1000 /* receive dst packet info */
-#define INP_RECVPKTINFO 0x2000 /* receive dst packet info */
+#define INP_RECVPKTINFO 0x1000 /* receive IP dst if/addr */
#define INP_CONTROLOPTS (INP_RECVOPTS|INP_RECVRETOPTS|INP_RECVDSTADDR|\
- INP_RECVIF|INP_RECVTTL|INP_RECVPKTINFO|\
- INP_PKTINFO)
+ INP_RECVIF|INP_RECVTTL|INP_RECVPKTINFO)

#define sotoinpcb(so) ((struct inpcb *)(so)->so_pcb)
#define inp_lock(inp) solock((inp)->inp_socket)
Index: sys/netinet/ip_input.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/ip_input.c,v
retrieving revision 1.363
diff -u -u -r1.363 ip_input.c
--- sys/netinet/ip_input.c 24 Nov 2017 14:03:25 -0000 1.363
+++ sys/netinet/ip_input.c 31 Dec 2017 19:06:46 -0000
@@ -1533,15 +1533,6 @@

if (inpflags & INP_RECVPKTINFO) {
struct in_pktinfo ipi;
- ipi.ipi_addr = ip->ip_src;
- ipi.ipi_ifindex = ifp->if_index;
- *mp = sbcreatecontrol(&ipi,
- sizeof(ipi), IP_RECVPKTINFO, IPPROTO_IP);
- if (*mp)
- mp = &(*mp)->m_next;
- }
- if (inpflags & INP_PKTINFO) {
- struct in_pktinfo ipi;
ipi.ipi_addr = ip->ip_dst;
ipi.ipi_ifindex = ifp->if_index;
*mp = sbcreatecontrol(&ipi,
Index: sys/netinet/ip_output.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/ip_output.c,v
retrieving revision 1.288
diff -u -u -r1.288 ip_output.c
--- sys/netinet/ip_output.c 22 Dec 2017 11:22:37 -0000 1.288
+++ sys/netinet/ip_output.c 31 Dec 2017 19:06:46 -0000
@@ -1081,6 +1081,7 @@
struct ip *ip = &inp->inp_ip;
int inpflags = inp->inp_flags;
int optval = 0, error = 0;
+ struct in_pktinfo pktinfo;

KASSERT(solocked(so));

@@ -1103,7 +1104,6 @@
case IP_TOS:
case IP_TTL:
case IP_MINTTL:
- case IP_PKTINFO:
case IP_RECVOPTS:
case IP_RECVRETOPTS:
case IP_RECVDSTADDR:
@@ -1135,10 +1135,6 @@
else \
inpflags &= ~bit;

- case IP_PKTINFO:
- OPTSET(INP_PKTINFO);
- break;
-
case IP_RECVOPTS:
OPTSET(INP_RECVOPTS);
break;
@@ -1163,6 +1159,43 @@
OPTSET(INP_RECVTTL);
break;
}
+ break;
+ case IP_PKTINFO:
+ error = sockopt_getint(sopt, &optval);
+ if (!error) {
+ /* Linux compatibility */
+ OPTSET(INP_RECVPKTINFO);
+ break;
+ }
+ error = sockopt_get(sopt, &pktinfo, sizeof(struct in_pktinfo));
+ if (error)
+ break;
+ /* Solaris compatibility */
+ if (pktinfo.ipi_ifindex) {
+ struct ifnet *ifp;
+ struct in_ifaddr *ia;
+ int s;
+
+ /* pick up primary address */
+ s = pserialize_read_enter();
+ ifp = if_byindex(pktinfo.ipi_ifindex);
+ if (ifp == NULL) {
+ pserialize_read_exit(s);
+ error = EADDRNOTAVAIL;
+ break;
+ }
+ ia = in_get_ia_from_ifp(ifp);
+ if (ia == NULL) {
+ pserialize_read_exit(s);
+ error = EADDRNOTAVAIL;
+ break;
+ }
+ inp->inp_prefsrcip = IA_SIN(ia)->sin_addr;
+ pserialize_read_exit(s);
+ } else {
+ inp->inp_prefsrcip = pktinfo.ipi_addr;
+ }
+ break;
break;
#undef OPTSET

@@ -1239,7 +1272,6 @@
}
break;
}
- case IP_PKTINFO:
case IP_TOS:
case IP_TTL:
case IP_MINTTL:
@@ -1269,10 +1301,6 @@

#define OPTBIT(bit) (inpflags & bit ? 1 : 0)

- case IP_PKTINFO:
- optval = OPTBIT(INP_PKTINFO);
- break;
-
case IP_RECVOPTS:
optval = OPTBIT(INP_RECVOPTS);
break;
@@ -1300,6 +1328,28 @@
error = sockopt_setint(sopt, optval);
break;

+ case IP_PKTINFO:
+ /* XXX these tests fail until size gets propagated */
+ /* It needs to be passed through from the caller */
+ if (sopt->sopt_size == sizeof(int)) {
+ /* Linux compatibility */
+ optval = OPTBIT(INP_RECVPKTINFO);
+ error = sockopt_setint(sopt, optval);
+ } else if (sopt->sopt_size == sizeof(struct in_pktinfo)) {
+ /* Solaris compatibility */
+ struct in_pktinfo ipiopt;
+ ipiopt.ipi_ifindex = 0;
+ ipiopt.ipi_addr = inp->inp_prefsrcip;
+ error = sockopt_set(sopt, &ipiopt, sizeof(ipiopt));
+ } else {
+ /* While size is stuck at 0, and, later, if the */
+ /* caller doesn't use an exactly sized recipient */
+ /* for the data, default to Linux compatibility */
+ optval = OPTBIT(INP_RECVPKTINFO);
+ error = sockopt_setint(sopt, optval);
+ }
+ break;
+
#if 0 /* defined(IPSEC) */
case IP_IPSEC_POLICY:
{
@@ -1416,11 +1466,14 @@
struct inpcb *inp, kauth_cred_t cred)
{
struct cmsghdr *cm;
- struct in_pktinfo *pktinfo;
+ struct in_pktinfo pktinfo;
int error;

pktopts->ippo_imo = inp->inp_moptions;
- sockaddr_in_init(&pktopts->ippo_laddr, &inp->inp_laddr, 0);
+ if (!in_nullhost(inp->inp_prefsrcip))
+ sockaddr_in_init(&pktopts->ippo_laddr, &inp->inp_prefsrcip, 0);
+ else
+ sockaddr_in_init(&pktopts->ippo_laddr, &inp->inp_laddr, 0);

if (control == NULL)
return 0;
@@ -1446,13 +1499,22 @@

switch (cm->cmsg_type) {
case IP_PKTINFO:
- if (cm->cmsg_len != CMSG_LEN(sizeof(struct in_pktinfo)))
+ if (cm->cmsg_len != CMSG_LEN(sizeof(pktinfo)))
return EINVAL;
-
- pktinfo = (struct in_pktinfo *)CMSG_DATA(cm);
- error = ip_pktinfo_prepare(pktinfo, pktopts, flags,
+ memcpy(&pktinfo, CMSG_DATA(cm), sizeof(pktinfo));
+ error = ip_pktinfo_prepare(&pktinfo, pktopts, flags,
cred);
- if (error != 0)
+ if (error)
+ return error;
+ break;
+ case IP_SENDSRCADDR: /* FreeBSD compatibility */
+ if (cm->cmsg_len != CMSG_LEN(sizeof(struct in_addr)))
+ return EINVAL;
+ pktinfo.ipi_ifindex = 0;
+ pktinfo.ipi_addr = ((struct in_pktinfo *)CMSG_DATA(cm))->ipi_addr;
+ error = ip_pktinfo_prepare(&pktinfo, pktopts, flags,
+ cred);
+ if (error)
return error;
break;
default:
Index: share/man/man4/ip.4
===================================================================
RCS file: /cvsroot/src/share/man/man4/ip.4,v
retrieving revision 1.40
diff -u -u -r1.40 ip.4
--- share/man/man4/ip.4 13 Aug 2017 18:19:44 -0000 1.40
+++ share/man/man4/ip.4 31 Dec 2017 19:06:49 -0000
@@ -96,8 +96,8 @@
.Ed
.Pp
The
-.Dv IP_PKTINFO
-option can be used to turn on receiving of information about the source
+.Dv IP_RECVPKTINFO
+option can be used to turn on receiving of information about the destination
address of the packet, and the interface index.
The information is passed in a
.Vt struct in_pktinfo
@@ -117,13 +117,24 @@
.Pp
For
.Xr sendmsg 2 ,
-the source address or output interface can be specified by adding
+the source address or output interface can be specified by adding an
.Dv IP_PKTINFO
-to the control part of the message on a
+message to the control part of the message on a
.Dv SOCK_DGRAM
or
.Dv SOCK_RAW
-socket.
+socket. Setting ipi_ifindex will cause the primary address of that
+interface to be used; setting ipi_addr will directly choose that address.
+The IP_PKTINFO cmsghdr structure from a received message may be used
+unchanged, in which case the outgoing message will be sent from the
+address the incoming message was received on.
+.Pp
+Setting the
+.Dv IP_PKTINFO
+option on a socket, with the same
+.Vt struct in_pktinfo
+structure, will set the default source address to be used until set
+again, unless explicitly overridden on a per-packet basis, as above.
.Pp
The
.Dv IP_PORTALGO
@@ -177,6 +188,18 @@
cmsg_type = IP_RECVDSTADDR
.Ed
.Pp
+For
+.Xr sendmsg 2 ,
+the source address can be specified by adding
+.Dv IP_SENDSRCADDR
+to the control part of the message on a
+.Dv SOCK_DGRAM
+or
+.Dv SOCK_RAW
+socket. The IP_RECVDSTADDR cmsghdr structure from a received message
+may be used unchanged, in which case the outgoing message will be sent
+from the address the incoming message was received on.
+.Pp
If the
.Dv IP_RECVIF
option is enabled on a
@@ -197,12 +220,6 @@
cmsg_type = IP_RECVIF
.Ed
.Pp
-The
-.Dv IP_RECVPKTINFO
-option is similar to the
-.Dv IP_PKTINFO
-one, only in this case the inbound information is returned.
-.Pp
If the
.Dv IP_RECVTTL
option is enabled on a
@@ -452,6 +469,24 @@
the IP option field was improperly formed; an option field was
shorter than the minimum value or longer than the option buffer provided.
.El
+.Sh COMPATIBILITY
+The
+.Dv IP_RECVPKTINFO
+option is used because it is directly compatible with Solaris, AIX, etc.,
+and the
+.Dv IP_PKTINFO
+option is intended to be used in their manner, to set the default source
+address for outgoing packets on a
+.Dv SOCK_DGRAM
+or
+.Dv SOCK_RAW
+socket. For compatibility with Linux, however, if you attempt to set the
+.Dv IP_PKTINFO
+option, using an integer parameter as a boolean value, this will
+transparently manipulate the
+.Dv IP_RECVPKTINFO
+option instead. Source code compatibility with both environments is thus
+maintained.
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recv 2 ,
Tom Ivar Helbekkmo
2018-01-01 08:40:37 UTC
Post by Christos Zoulas
This came up in a different discussion; we should pass the size around...
Thanks for fixing that -- and for committing my IP_PKTINFO changes! I
liked the little adjustments you made to the code, there. It does get
more readable with the reduced nesting and early resolution tweaks.

Where can I find official guidelines for coding style?

When we're sure the passing of the buffer size down from getsockopt(2)
is good, we should adjust the bit that handles getting the IP_PKTINFO
option data: the comments are no longer correct, at least.

-tih
John Nemeth
2018-01-01 09:37:45 UTC
On Jan 1, 9:40am, Tom Ivar Helbekkmo wrote:
} Christos Zoulas <***@astron.com> writes:
}
} > This came up in a different discussion; we should pass the size around...
}
} Thanks for fixing that -- and for committing my IP_PKTINFO changes! I
} liked the little adjustments you made to the code, there. It does get
} more readable with the reduced nesting and early resolution tweaks.
}
} Where can I find official guidelines for coding style?

Take a look at /usr/share/misc/style.

}-- End of excerpt from Tom Ivar Helbekkmo

Tom Ivar Helbekkmo
2018-01-04 12:12:11 UTC
Now that this is in place, I'm preparing a pull request for PowerDNS, to
get it to compile out of the box on NetBSD. Among other things, I need
to differentiate between NetBSD versions before and after my changes, to
keep older versions from trying to use the IP_PKTINFO socket option.

Since the first version with everything correctly in place is 8.99.11,
I'm doing this in the appropriate header file:

#if defined(__NetBSD_Version__) && __NetBSD_Version__ < 899001100 && defined(IP_PKTINFO)
#undef IP_PKTINFO
#endif

The idea is that it'll be used for our ongoing -current starting with
8.99.11, and that our first official version to have the new code will
be NetBSD 9.

If my thinking is wrong, someone please let me know! :)

-tih

Robert Elz
2018-01-04 13:10:23 UTC
Date: Thu, 04 Jan 2018 13:12:11 +0100
From: Tom Ivar Helbekkmo <***@hamartun.priv.no>
Message-ID: <***@thuvia.hamartun.priv.no>

| Since the first version with everything correctly in place is 8.99.11,
| I'm doing this in the appropriate header file:
|
| #if defined(__NetBSD_Version__) && __NetBSD_Version__ < 899001100 && defined(IP_PKTINFO)
| #undef IP_PKTINFO
| #endif
|
| The idea is that it'll be used for our ongoing -current starting with
| 8.99.11, and that our first official version to have the new code will
| be NetBSD 9.

That kind of thing is possible, but unreliable, and what's more, it
is not impossible (and perhaps reasonable, once proven complete and
effective) for the changes to be pulled up to NetBSD-8 (either before
it is released or for 8.1 later or something.)

Much better would be to find some #define that was changed, or added, or
removed, if necessary by making a new one, which indicates that the
change has been made, and then test that instead of __NetBSD_Version__.

kre


Tom Ivar Helbekkmo
2018-01-04 13:23:54 UTC
Post by Robert Elz
That kind of thing is possible, but unreliable, and what's more, it
is not impossible (and perhaps reasonable, once proven complete and
effective) for the changes to be pulled up to NetBSD-8 (either before
it is released or for 8.1 later or something.)
Much better would be to find some #define that was changed, or added, or
removed, if necessary by making a new one, which indicates that the
change has been made, and then test that instead of __NetBSD_Version__.

Aha. Well, I did introduce the IP_SENDSRCADDR at the same time, for
FreeBSD compatibility. It was added in the same commit, so it should
tag along if this set of changes is pulled up. I could say that if
that's defined, then the IP_PKTINFO stuff is fresh enough to be used.

-tih

Robert Elz
2018-01-04 13:36:51 UTC
Date: Thu, 04 Jan 2018 14:23:54 +0100
From: Tom Ivar Helbekkmo <***@hamartun.priv.no>
Message-ID: <***@thuvia.hamartun.priv.no>

| I did introduce the IP_SENDSRCADDR at the same time,

Yes, that is a much better solution (we can ignore systems built
during the few days when the new code was stabilizing - they can,
and should, just be updated). Just add a comment with the #ifdef
explaining what is happening.

kre


Tom Ivar Helbekkmo
2018-01-04 13:46:23 UTC
Post by Robert Elz
Yes, that is a much better solution (we can ignore systems built
during the few days when the new code was stabilizing - they can,
and should, just be updated). Just add a comment with the #ifdef
explaining what is happening.

Something like this, then:

#if defined(__NetBSD__) && defined(IP_PKTINFO) && !defined(IP_SENDSRCADDR)
// The IP_PKTINFO support in NetBSD was incompatible with Linux until a
// change that also introduced IP_SENDSRCADDR for FreeBSD compatibility.
#undef IP_PKTINFO
#endif

-tih

Robert Elz
2018-01-04 14:42:11 UTC
Date: Thu, 04 Jan 2018 14:46:23 +0100
From: Tom Ivar Helbekkmo <***@hamartun.priv.no>
Message-ID: <***@thuvia.hamartun.priv.no>

| Something like this, then:

Looks OK to me (though I prefer traditional C /* */ comments).

kre


Tom Ivar Helbekkmo
2018-01-04 16:41:44 UTC
Post by Robert Elz
Looks OK to me (though I prefer traditional C /* */ comments).

Yeah, me too, but in this case it's C++ source code, and all the other
comments in the file use that notation, so I think I'd better comply.

-tih
