COMPAT_50 vs NET_RT_IFLIST

Discussion:

Mouse

2019-04-25 15:08:26 UTC

I have, at work, a program, designed and written under 5.2, that uses
sysctl on <CTL_NET, PF_ROUTE, 0, 0, NET_RT_IFLIST, 0> to get a list of
network interfaces, which it then walks looking for things. (It's part
of a turnkey system.)

I just now tried running this, with the associated 5.2 userland
fragments, under an 8.0 kernel (I just replaced the 5.2 kernel with an
8.0 kernel on the turnkey system, the idea being to get 8.0's hardware
support). It fails, complaining about RTM_IFINFO blobs with RTA_IFP
set overrunning available space and about blob version numbers being 4
rather than 3.

Is this expected? Does COMPAT_50 not extend to this operation? Or am
I doing something wrong?

I can show exact code if it would help, but before throwing that at
people I thought I'd first ask if it's even supposed to work.
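
For readers following along, here is a minimal sketch of the kind of walk described above. It is not the program in question and does not use the real struct if_msghdr; struct blob_prefix, struct walk_stats and walk_blobs are hypothetical names, and the three-field prefix is an assumed simplification of the message header. It shows only the two checks mentioned: the per-message version comparison and the "does this message fit in what remains" test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified message prefix: NOT the real struct if_msghdr.
 * It models only the leading fields the walk depends on. */
struct blob_prefix {
    uint16_t msglen;   /* total length of this message, header included */
    uint8_t  version;  /* compared against the version the program was built with */
    uint8_t  type;     /* message type, e.g. an RTM_IFINFO-like value */
};

struct walk_stats {
    size_t accepted;       /* messages carrying the expected version */
    size_t wrong_version;  /* messages skipped, but counted rather than dropped silently */
    int    overrun;        /* nonzero if a message claimed more bytes than remain */
};

static struct walk_stats
walk_blobs(const unsigned char *buf, size_t len, uint8_t expected_version)
{
    struct walk_stats st = { 0, 0, 0 };
    size_t off = 0;

    assert(buf != NULL || len == 0);
    while (off < len) {
        struct blob_prefix p;

        if (len - off < sizeof(p)) {          /* truncated header */
            st.overrun = 1;
            break;
        }
        memcpy(&p, buf + off, sizeof(p));     /* copy out to avoid unaligned access */
        if (p.msglen < sizeof(p) || p.msglen > len - off) {
            st.overrun = 1;                   /* length field is implausible */
            break;
        }
        if (p.version == expected_version)
            st.accepted++;
        else
            st.wrong_version++;
        off += p.msglen;                      /* step to the next message */
    }
    return st;
}
```

A caller would pass the buffer and length returned by the sysctl call, plus the version constant the program was compiled against, and report the counters afterwards.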

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Roy Marples

2019-04-25 16:23:10 UTC

Post by Mouse
I have, at work, a program, designed and written under 5.2, that uses
sysctl on <CTL_NET, PF_ROUTE, 0, 0, NET_RT_IFLIST, 0> to get a list of
network interfaces, which it then walks looking for things. (It's part
of a turnkey system.)
I just now tried running this, with the associated 5.2 userland
fragments, under an 8.0 kernel (I just replaced the 5.2 kernel with an
8.0 kernel on the turnkey system, the idea being to get 8.0's hardware
support). It fails, complaining about RTM_IFINFO blobs with RTA_IFP
set overrunning available space and about blob version numbers being 4
rather than 3.
Is this expected? Does COMPAT_50 not extend to this operation? Or am
I doing something wrong?
I can show exact code if it would help, but before throwing that at
people I thought I'd first ask if it's even supposed to work.

Sounds like you didn't compare rtm_version with RTM_VERSION.
See here in libc:
https://nxr.netbsd.org/xref/src/lib/libc/net/getifaddrs.c#106

Roy

Mouse

2019-04-26 12:36:16 UTC

Post by Stephen Borrill

So I created a new directory, and unpacked the 5.2 base.tgz
distribution set in it.

You're going to need etc.tgz in there too.

What for?

I just went to one of my 5.2 systems and did ktrace ifconfig -l and
then kdump | egrep NAMI. Only two references to /etc showed up,
neither of which exists on my system (/etc/ld.so.conf and
/etc/malloc.conf). So I suspect etc.tgz is not actually needed for
this test.

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Mouse

2019-04-25 17:03:26 UTC

Post by Roy Marples
Sounds like you didn't compare rtm_version with RTM_VERSION.

I (think I) am; that's what's producing the "version is 4, not 3"
complaints. It's built under 5.2, with RTM_VERSION set to 3, and
running under 8.0, whose RTM_VERSION is 4.

Post by Roy Marples
https://nxr.netbsd.org/xref/src/lib/libc/net/getifaddrs.c#106

Can't; that refuses to serve content over HTTP (I neither have nor want
HTTPS support, possibly excepting somewhere I don't know how to use on
the 8.0 work machine). But I did look at 8.0's
/usr/src/lib/libc/net/getifaddrs.c, and line 106 compares rtm_version
against RTM_VERSION, the next line silently(!) ignoring the blob if
they differ. My code does likewise, except that it's noisier.

The version 3 blobs I appear to be getting - the other complaints come
from code that's not run if the version is wrong - are presumably
COMPAT_50 in action, but I'm seeing complaints that seem to me to imply
that they're not actually compatible with what 5.2 produces.

I'd need to dig more to be certain, since under 5.2 those complaints,
if they appear, get cleared from the screen far too fast for me to tell
whether they're present - but I'm reasonably sure I wouldn't've left
that code in there if it tripped under 5.2. (The screen-clear time
difference appears to be X-related, nothing relevant to this list.)
But I wanted to find out whether it's supposed to work before haring
off after something that might not be there at all.
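
To make the "overrunning available space" complaint concrete, here is a sketch of the sort of bounds check involved. Everything in it is an assumption for illustration: round_up and scan_addrs are hypothetical helpers, the bitmask stands in for the RTA_* flags, and the idea that producer and consumer might disagree about padding is only one hypothetical way such an overrun could arise; nothing in this thread establishes that as the cause.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a length up to a multiple of `align` (a power of two). A zero
 * length still occupies one slot, as in the traditional BSD ROUNDUP idiom. */
static size_t
round_up(size_t n, size_t align)
{
    return n > 0 ? 1 + ((n - 1) | (align - 1)) : align;
}

/*
 * Walk the variable-length addresses that follow a message header.
 * `addrs` has one bit per address present; each address begins with a
 * one-byte length, as BSD sockaddrs do. Returns the number of bytes
 * consumed, or -1 if an address would run past `avail`.
 */
static long
scan_addrs(const unsigned char *p, size_t avail, unsigned addrs, size_t align)
{
    size_t off = 0;

    assert(align != 0 && (align & (align - 1)) == 0);
    for (unsigned bit = 1; bit != 0 && bit <= addrs; bit <<= 1) {
        size_t step;

        if ((addrs & bit) == 0)
            continue;                 /* this address is not present */
        if (off >= avail)
            return -1;                /* no room even for the length byte */
        step = round_up(p[off], align);
        if (step > avail - off)
            return -1;                /* address overruns the blob */
        off += step;
    }
    return (long)off;
}
```

With two 12-byte addresses packed on 4-byte boundaries, a consumer that assumes 8-byte padding steps 16 bytes instead of 12, reads a data byte as the next length, and reports an overrun even though the buffer itself is well formed.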

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Paul Goyette

2019-04-25 20:32:09 UTC

Post by Roy Marples

Post by Mouse
I have, at work, a program, designed and written under 5.2, that uses
sysctl on <CTL_NET, PF_ROUTE, 0, 0, NET_RT_IFLIST, 0> to get a list of
network interfaces, which it then walks looking for things. (It's part
of a turnkey system.)
I just now tried running this, with the associated 5.2 userland
fragments, under an 8.0 kernel (I just replaced the 5.2 kernel with an
8.0 kernel on the turnkey system, the idea being to get 8.0's hardware
support). It fails, complaining about RTM_IFINFO blobs with RTA_IFP
set overrunning available space and about blob version numbers being 4
rather than 3.
Is this expected? Does COMPAT_50 not extend to this operation? Or am
I doing something wrong?
I can show exact code if it would help, but before throwing that at
people I thought I'd first ask if it's even supposed to work.

Sounds like you didn't compare rtm_version with RTM_VERSION.
https://nxr.netbsd.org/xref/src/lib/libc/net/getifaddrs.c#106

Hmmm, I would've expected it to work.

Since you're running on an 8.0 kernel, it's definitely not something
that I broke during the [pgoyette_compat] branch work, but whatever
might need fixing will likely need to be adapted to the new order.

Let me know if there's any way I can help.

+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+

Mouse

2019-04-25 20:44:27 UTC

Post by Paul Goyette

I have, at work, a program, designed and written under 5.2 [...].
I just now tried running this, with the associated 5.2 userland
fragments, under an 8.0 kernel [...].

Let me know if there's any way I can help.

About all I can think of at the moment would be to try 5.2 userland
with an 8.0 kernel, with a particular focus on network interfaces
(things like ifconfig -l, ifconfig -au, etc). Either replace a 5.2
system's kernel with an 8.0 kernel or run a 5.2 userland chrooted on an
8.0 system, those are how I'd try it.

I don't _think_ anything I've done to the system is relevant here, but
it would be nice to have that confirmed-or-refuted before I (or anyone
else for that matter) starts chasing after it. The turnkey system in
question does require some changes in order to run, and it's possible,
albeit unlikely, that something else slipped in.

I'm going to try to find the time to pull over completely stock 5.2 and
try that, but (a) it may take some time for me to scrape up the round
tuits for that and (b) relatively independent attempts to reproduce it
would be nice in any case.

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Paul Goyette

2019-04-26 08:09:24 UTC

Post by Mouse

Post by Paul Goyette

I have, at work, a program, designed and written under 5.2 [...].
I just now tried running this, with the associated 5.2 userland
fragments, under an 8.0 kernel [...].

Let me know if there's any way I can help.

About all I can think of at the moment would be to try 5.2 userland
with an 8.0 kernel, with a particular focus on network interfaces
(things like ifconfig -l, ifconfig -au, etc). Either replace a 5.2
system's kernel with an 8.0 kernel or run a 5.2 userland chrooted on an
8.0 system, those are how I'd try it.
I don't _think_ anything I've done to the system is relevant here, but
it would be nice to have that confirmed-or-refuted before I (or anyone
else for that matter) starts chasing after it. The turnkey system in
question does require some changes in order to run, and it's possible,
albeit unlikely, that something else slipped in.
I'm going to try to find the time to pull over completely stock 5.2 and
try that, but (a) it may take some time for me to scrape up the round
tuits for that and (b) relatively independent attempts to reproduce it
would be nice in any case.

Well, I'm not super-familiar with setting up test environments using
chroot, but I was unable to get a qemu install of 5.2 (seems that
recent qemu doesn't deal well with piixide driver). So I created a
new directory, and unpacked the 5.2 base.tgz distribution set in it.

I then did a chroot to the new directory, and tried to run ifconfig.
The results are not what I expected. First, I verified that I had
all the right pieces inside the chroot directory - everything looks
good - at least it's all from the right year!

# ls -l /sbin/ifconfig
-r-xr-xr-x 1 0 20 114557 Nov 28 2012 /sbin/ifconfig
# ldd /sbin/ifconfig
/sbin/ifconfig:
-lutil.7 => /lib/libutil.so.7
-lc.12 => /lib/libc.so.12
-lprop.0 => /lib/libprop.so.0
# ls -l /lib/libutil.so*
lrwxrwxr-x 1 0 20 15 Nov 28 2012 /lib/libutil.so -> libutil.so.7.15
lrwxrwxr-x 1 0 20 15 Nov 28 2012 /lib/libutil.so.7 -> libutil.so.7.15
-r--r--r-- 1 0 20 96585 Nov 28 2012 /lib/libutil.so.7.15
# ls -l /lib/libc.so*
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libc.so -> libc.so.12.164
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libc.so.12 -> libc.so.12.164
-r--r--r-- 1 0 20 1317667 Nov 28 2012 /lib/libc.so.12.164
# ls -l /lib/libprop*
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libprop.so -> libprop.so.0.7
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libprop.so.0 -> libprop.so.0.7
-r--r--r-- 1 0 20 82138 Nov 28 2012 /lib/libprop.so.0.7

But, ifconfig fails to run:

# ifconfig -l
ifconfig: getifaddrs: No such file or directory
# ifconfig -a
ifconfig: getifaddrs: No such file or directory

FWIW, this is running on a -current host (8.99.37, from yesterday), and
with the compat_50 (and above) modules loaded.

# modstat | grep compat
compat_50 exec filesys - 0 - compat_60
compat_60 exec filesys a 1 - compat_70
compat_70 exec filesys a 1 - compat_80
compat_80 exec filesys a 1 - -
compat_util misc builtin - 0 - -
# uname -a
NetBSD speedy.whooppee.com 8.99.37 NetBSD 8.99.37 (SPEEDY 2019-04-24 23:45:06 UTC) #0: Thu Apr 25 09:00:09 UTC 2019 ***@speedy.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64

+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+

Paul Goyette

2019-04-26 08:24:40 UTC

Post by Paul Goyette

Post by Mouse

Post by Paul Goyette

I have, at work, a program, designed and written under 5.2 [...].
I just now tried running this, with the associated 5.2 userland
fragments, under an 8.0 kernel [...].

Let me know if there's any way I can help.

About all I can think of at the moment would be to try 5.2 userland
with an 8.0 kernel, with a particular focus on network interfaces
(things like ifconfig -l, ifconfig -au, etc). Either replace a 5.2
system's kernel with an 8.0 kernel or run a 5.2 userland chrooted on an
8.0 system, those are how I'd try it.
I don't _think_ anything I've done to the system is relevant here, but
it would be nice to have that confirmed-or-refuted before I (or anyone
else for that matter) starts chasing after it. The turnkey system in
question does require some changes in order to run, and it's possible,
albeit unlikely, that something else slipped in.
I'm going to try to find the time to pull over completely stock 5.2 and
try that, but (a) it may take some time for me to scrape up the round
tuits for that and (b) relatively independent attempts to reproduce it
would be nice in any case.

Well, I'm not super-familiar with setting up test environments using
chroot, but I was unable to get a qemu install of 5.2 (seems that
recent qemu doesn't deal well with piixide driver). So I created a
new directory, and unpacked the 5.2 base.tgz distribution set in it.
I then did a chroot to the new directory, and tried to run ifconfig.
The results are not what I expected. First, I verified that I had
all the right pieces inside the chroot directory - everything looks
good - at least it's all from the right year!
# ls -l /sbin/ifconfig
-r-xr-xr-x 1 0 20 114557 Nov 28 2012 /sbin/ifconfig
# ldd /sbin/ifconfig
-lutil.7 => /lib/libutil.so.7
-lc.12 => /lib/libc.so.12
-lprop.0 => /lib/libprop.so.0
# ls -l /lib/libutil.so*
lrwxrwxr-x 1 0 20 15 Nov 28 2012 /lib/libutil.so -> libutil.so.7.15
lrwxrwxr-x 1 0 20 15 Nov 28 2012 /lib/libutil.so.7 -> libutil.so.7.15
-r--r--r-- 1 0 20 96585 Nov 28 2012 /lib/libutil.so.7.15
# ls -l /lib/libc.so*
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libc.so -> libc.so.12.164
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libc.so.12 -> libc.so.12.164
-r--r--r-- 1 0 20 1317667 Nov 28 2012 /lib/libc.so.12.164
# ls -l /lib/libprop*
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libprop.so -> libprop.so.0.7
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libprop.so.0 -> libprop.so.0.7
-r--r--r-- 1 0 20 82138 Nov 28 2012 /lib/libprop.so.0.7
# ifconfig -l
ifconfig: getifaddrs: No such file or directory
# ifconfig -a
ifconfig: getifaddrs: No such file or directory
FWIW, this is running on a -current host (8.99.37, from yesterday), and
with the compat_50 (and above) modules loaded.
# modstat | grep compat
compat_50 exec filesys - 0 - compat_60
compat_60 exec filesys a 1 - compat_70
compat_70 exec filesys a 1 - compat_80
compat_80 exec filesys a 1 - -
compat_util misc builtin - 0 - -
# uname -a
NetBSD speedy.whooppee.com 8.99.37 NetBSD 8.99.37 (SPEEDY 2019-04-24 23:45:06 UTC) #0: Thu Apr 25 09:00:09 UTC 2019 amd64

Digging just a little bit deeper with ktrace/kdump, I find

7806 1 ifconfig CALL __sysctl(0x7f7fff5d66d0,6,0,0x7f7fff5d66e8,0,0)
7806 1 ifconfig RET __sysctl -1 errno 2 No such file or directory
7806 1 ifconfig CALL write(2,0x7f7fff5d5ce0,0xa)
7806 1 ifconfig GIO fd 2 wrote 10 bytes
"ifconfig: "
7806 1 ifconfig RET write 10/0xa
7806 1 ifconfig CALL write(2,0x7f7fff5d5dc0,0xa)
7806 1 ifconfig GIO fd 2 wrote 10 bytes
"getifaddrs"
7806 1 ifconfig RET write 10/0xa
7806 1 ifconfig CALL write(2,0x774133de753f,2)
7806 1 ifconfig GIO fd 2 wrote 2 bytes
": "
7806 1 ifconfig RET write 2
7806 1 ifconfig CALL issetugid
7806 1 ifconfig RET issetugid 0
7806 1 ifconfig CALL issetugid
7806 1 ifconfig RET issetugid 0
7806 1 ifconfig CALL open(0x7f7fff5d5900,0,0)
7806 1 ifconfig NAMI "/usr/share/nls/nls.alias.db"
7806 1 ifconfig RET open -1 errno 2 No such file or directory

So looks to me like the sysctl() call is failing.
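
The trace also explains why the message names getifaddrs even though the call that failed is __sysctl: the library function returns failure with errno left at ENOENT, and the error report is assembled from the program name, the function name, and the text for that errno (the three write() calls above). A small sketch of that assembly, with format_err as a hypothetical helper rather than anything from ifconfig itself:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "prog: what: <text for errnum>" in `out`, the way err(3)-style
 * reporting composes its message. Returns what snprintf returns. */
static int
format_err(char *out, size_t outlen, const char *prog, const char *what,
    int errnum)
{
    assert(out != NULL && outlen > 0);
    return snprintf(out, outlen, "%s: %s: %s", prog, what, strerror(errnum));
}
```

Calling format_err(buf, sizeof buf, "ifconfig", "getifaddrs", ENOENT) yields the same line that ifconfig printed in the chroot.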

+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+

Mouse

2019-04-26 12:30:20 UTC

Post by Paul Goyette

So I created a new directory, and unpacked the 5.2 base.tgz
distribution set in it.
I then did a chroot to the new directory, and tried to run ifconfig.
The results are not what I expected.
# ifconfig -l
ifconfig: getifaddrs: No such file or directory
# ifconfig -a
ifconfig: getifaddrs: No such file or directory

Digging just a little bit deeper with ktrace/kdump, I find
7806 1 ifconfig CALL __sysctl(0x7f7fff5d66d0,6,0,0x7f7fff5d66e8,0,0)
7806 1 ifconfig RET __sysctl -1 errno 2 No such file or directory
7806 1 ifconfig CALL write(2,0x7f7fff5d5ce0,0xa)
7806 1 ifconfig GIO fd 2 wrote 10 bytes
"ifconfig: "
So looks to me like the sysctl() call is failing.

That's what it looks like to me too.

Curious. That's not the failure mode I'm seeing; I'm seeing sysctl
succeed but with corrupt (at least from the written-for-5.2 code's
point of view) data returned.

Thank you. This is exactly the sort of surprise the potential for
which led me to ask if someone else could try something similar.

I clearly need to dig deeper.

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Paul Goyette

2019-04-27 07:23:39 UTC

Post by Paul Goyette

Post by Paul Goyette
Well, I'm not super-familiar with setting up test environments using
chroot, but I was unable to get a qemu install of 5.2 (seems that
recent qemu doesn't deal well with piixide driver). So I created a
new directory, and unpacked the 5.2 base.tgz distribution set in it.
I then did a chroot to the new directory, and tried to run ifconfig.
The results are not what I expected. First, I verified that I had
all the right pieces inside the chroot directory - everything looks
good - at least it's all from the right year!
# ls -l /sbin/ifconfig
-r-xr-xr-x 1 0 20 114557 Nov 28 2012 /sbin/ifconfig
# ldd /sbin/ifconfig
-lutil.7 => /lib/libutil.so.7
-lc.12 => /lib/libc.so.12
-lprop.0 => /lib/libprop.so.0
# ls -l /lib/libutil.so*
lrwxrwxr-x 1 0 20 15 Nov 28 2012 /lib/libutil.so -> libutil.so.7.15
lrwxrwxr-x 1 0 20 15 Nov 28 2012 /lib/libutil.so.7 -> libutil.so.7.15
-r--r--r-- 1 0 20 96585 Nov 28 2012 /lib/libutil.so.7.15
# ls -l /lib/libc.so*
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libc.so -> libc.so.12.164
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libc.so.12 -> libc.so.12.164
-r--r--r-- 1 0 20 1317667 Nov 28 2012 /lib/libc.so.12.164
# ls -l /lib/libprop*
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libprop.so -> libprop.so.0.7
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libprop.so.0 -> libprop.so.0.7
-r--r--r-- 1 0 20 82138 Nov 28 2012 /lib/libprop.so.0.7
# ifconfig -l
ifconfig: getifaddrs: No such file or directory
# ifconfig -a
ifconfig: getifaddrs: No such file or directory
FWIW, this is running on a -current host (8.99.37, from yesterday), and
with the compat_50 (and above) modules loaded.
# modstat | grep compat
compat_50 exec filesys - 0 - compat_60
compat_60 exec filesys a 1 - compat_70
compat_70 exec filesys a 1 - compat_80
compat_80 exec filesys a 1 - -
compat_util misc builtin - 0 - -
# uname -a
NetBSD speedy.whooppee.com 8.99.37 NetBSD 8.99.37 (SPEEDY 2019-04-24 23:45:06 UTC) #0: Thu Apr 25 09:00:09 UTC 2019 amd64

Digging just a little bit deeper with ktrace/kdump, I find
7806 1 ifconfig CALL __sysctl(0x7f7fff5d66d0,6,0,0x7f7fff5d66e8,0,0)
7806 1 ifconfig RET __sysctl -1 errno 2 No such file or directory
7806 1 ifconfig CALL write(2,0x7f7fff5d5ce0,0xa)
7806 1 ifconfig GIO fd 2 wrote 10 bytes
"ifconfig: "
7806 1 ifconfig RET write 10/0xa
7806 1 ifconfig CALL write(2,0x7f7fff5d5dc0,0xa)
7806 1 ifconfig GIO fd 2 wrote 10 bytes
"getifaddrs"
7806 1 ifconfig RET write 10/0xa
7806 1 ifconfig CALL write(2,0x774133de753f,2)
7806 1 ifconfig GIO fd 2 wrote 2 bytes
": "
7806 1 ifconfig RET write 2
7806 1 ifconfig CALL issetugid
7806 1 ifconfig RET issetugid 0
7806 1 ifconfig CALL issetugid
7806 1 ifconfig RET issetugid 0
7806 1 ifconfig CALL open(0x7f7fff5d5900,0,0)
7806 1 ifconfig NAMI "/usr/share/nls/nls.alias.db"
7806 1 ifconfig RET open -1 errno 2 No such file or directory
So looks to me like the sysctl() call is failing.

So, I decided to try a little bit harder. I installed a stock 8.0
system in a qemu vm, and then loaded 5.2's base.tgz into a chroot
directory.

Using the stock 8.0 ifconfig, I see
# ifconfig -l
wm0 lo0
# ifconfig -a
wm0: flags=0x8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
capabilities=2bf80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=2bf80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Tx>
capabilities=2bf80<UDP6CSUM_Tx>
enabled=0
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 52:54:00:12:34:56
media: Ethernet autoselect (none)
lo0: flags=0x8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33624
inet 127.0.0.1/8 flags 0x0
inet6 ::1/128 flags 0x20<NODAD>
inet6 fe80::1%lo0/64 flags 0x0 scopeid 0x2

When I chroot into the 5.2 directory, I get no output of any sort:

# chroot /chroot-52/
# ifconfig -l

# ifconfig -a

So, there's definitely something wrong in 8.0 with the compat sysctl.

I probably changed the behavior with my work on the compat branch, which
is likely why -current gets ENOENT errors.

+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+

Paul Goyette

2019-04-29 02:47:24 UTC

So, I decided to try a little bit harder. I installed a stock 8.0 system in
a qemu vm, and then loaded 5.2's base.tgz into a chroot directory.
Using the stock 8.0 ifconfig, I see
# ifconfig -l
wm0 lo0
# ifconfig -a
wm0: flags=0x8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
capabilities=2bf80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=2bf80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Tx>
capabilities=2bf80<UDP6CSUM_Tx>
enabled=0
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 52:54:00:12:34:56
media: Ethernet autoselect (none)
lo0: flags=0x8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33624
inet 127.0.0.1/8 flags 0x0
inet6 ::1/128 flags 0x20<NODAD>
inet6 fe80::1%lo0/64 flags 0x0 scopeid 0x2
# chroot /chroot-52/
# ifconfig -l
# ifconfig -a
So, there's definitely something wrong in 8.0 with the compat sysctl.
I probably changed the behavior with my work on the compat branch, which is
likely why -current gets ENOENT errors.

OK, I dug a little bit further, and I know why there is a difference
between 8.0 and -current. Yep, I broke it on the [pgoyette-compat]
branch. I will see if I can fix it. (Please note that real-life is
getting very complex right now, and I don't know how soon I will be
able to work on this.)

(Short version is that I removed the code that created multiple
versions of the sysctl nodes.)

I still cannot explain how things got broken between 5.2 and 8.0. I
will defer to those who are more expert in this area than am I. My
suspicion is that the breakage is related to sys/socket.h rev 1.99
which versioned AF_{,O}ROUTE for some 64-bit cleanliness.

+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+

Paul Goyette

2019-04-29 05:44:09 UTC

I have removed the differences between 8.0 and -current but the
baseline problem still exists. Someone more familiar with this
code will have to look into what broke between 5.2 and 8.0.

Post by Paul Goyette

So, I decided to try a little bit harder. I installed a stock 8.0 system
in a qemu vm, and then loaded 5.2's base.tgz into a chroot directory.
Using the stock 8.0 ifconfig, I see
# ifconfig -l
wm0 lo0
# ifconfig -a
wm0: flags=0x8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
capabilities=2bf80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=2bf80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Tx>
capabilities=2bf80<UDP6CSUM_Tx>
enabled=0
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 52:54:00:12:34:56
media: Ethernet autoselect (none)
lo0: flags=0x8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33624
inet 127.0.0.1/8 flags 0x0
inet6 ::1/128 flags 0x20<NODAD>
inet6 fe80::1%lo0/64 flags 0x0 scopeid 0x2
# chroot /chroot-52/
# ifconfig -l
# ifconfig -a
So, there's definitely something wrong in 8.0 with the compat sysctl.
I probably changed the behavior with my work on the compat branch, which is
likely why -current gets ENOENT errors.

OK, I dug a little bit further, and I know why there is a difference
between 8.0 and -current. Yep, I broke it on the [pgoyette-compat]
branch. I will see if I can fix it. (Please note that real-life is
getting very complex right now, and I don't know how soon I will be
able to work on this.)
(Short version is that I removed the code that created multiple
versions of the sysctl nodes.)
I still cannot explain how things got broken between 5.2 and 8.0. I
will defer to those who are more expert in this area than am I. My
suspicion is that the breakage is related to sys/socket.h rev 1.99
which versioned AF_{,O}ROUTE for some 64-bit cleanliness.

+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+

Stephen Borrill

2019-04-26 11:49:57 UTC

Post by Paul Goyette

Post by Mouse

Post by Paul Goyette

I have, at work, a program, designed and written under 5.2 [...].
I just now tried running this, with the associated 5.2 userland
fragments, under an 8.0 kernel [...].

Let me know if there's any way I can help.

About all I can think of at the moment would be to try 5.2 userland
with an 8.0 kernel, with a particular focus on network interfaces
(things like ifconfig -l, ifconfig -au, etc). Either replace a 5.2
system's kernel with an 8.0 kernel or run a 5.2 userland chrooted on an
8.0 system, those are how I'd try it.
I don't _think_ anything I've done to the system is relevant here, but
it would be nice to have that confirmed-or-refuted before I (or anyone
else for that matter) starts chasing after it. The turnkey system in
question does require some changes in order to run, and it's possible,
albeit unlikely, that something else slipped in.
I'm going to try to find the time to pull over completely stock 5.2 and
try that, but (a) it may take some time for me to scrape up the round
tuits for that and (b) relatively independent attempts to reproduce it
would be nice in any case.

Well, I'm not super-familiar with setting up test environments using
chroot, but I was unable to get a qemu install of 5.2 (seems that
recent qemu doesn't deal well with piixide driver). So I created a
new directory, and unpacked the 5.2 base.tgz distribution set in it.

You're going to need etc.tgz in there too.

Post by Paul Goyette
I then did a chroot to the new directory, and tried to run ifconfig.
The results are not what I expected. First, I verified that I had
all the right pieces inside the chroot directory - everything looks
good - at least it's all from the right year!
# ls -l /sbin/ifconfig
-r-xr-xr-x 1 0 20 114557 Nov 28 2012 /sbin/ifconfig
# ldd /sbin/ifconfig
-lutil.7 => /lib/libutil.so.7
-lc.12 => /lib/libc.so.12
-lprop.0 => /lib/libprop.so.0
# ls -l /lib/libutil.so*
lrwxrwxr-x 1 0 20 15 Nov 28 2012 /lib/libutil.so -> libutil.so.7.15
lrwxrwxr-x 1 0 20 15 Nov 28 2012 /lib/libutil.so.7 -> libutil.so.7.15
-r--r--r-- 1 0 20 96585 Nov 28 2012 /lib/libutil.so.7.15
# ls -l /lib/libc.so*
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libc.so -> libc.so.12.164
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libc.so.12 -> libc.so.12.164
-r--r--r-- 1 0 20 1317667 Nov 28 2012 /lib/libc.so.12.164
# ls -l /lib/libprop*
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libprop.so -> libprop.so.0.7
lrwxrwxr-x 1 0 20 14 Nov 28 2012 /lib/libprop.so.0 -> libprop.so.0.7
-r--r--r-- 1 0 20 82138 Nov 28 2012 /lib/libprop.so.0.7
# ifconfig -l
ifconfig: getifaddrs: No such file or directory
# ifconfig -a
ifconfig: getifaddrs: No such file or directory
FWIW, this is running on a -current host (8.99.37, from yesterday), and
with the compat_50 (and above) modules loaded.
# modstat | grep compat
compat_50 exec filesys - 0 - compat_60
compat_60 exec filesys a 1 - compat_70
compat_70 exec filesys a 1 - compat_80
compat_80 exec filesys a 1 - -
compat_util misc builtin - 0 - -
# uname -a
NetBSD speedy.whooppee.com 8.99.37 NetBSD 8.99.37 (SPEEDY 2019-04-24 23:45:06 UTC) #0: Thu Apr 25 09:00:09 UTC 2019 amd64

matthew green

2019-04-29 07:10:59 UTC

Post by Paul Goyette
I still cannot explain how things got broken between 5.2 and 8.0. I
will defer to those who are more expert in this area than am I. My
suspicion is that the breakage is related to sys/socket.h rev 1.99
which versioned AF_{,O}ROUTE for some 64-bit cleanliness.

i think i have a guess about the problem.

sys/net/if.h, sys/net/route.h, and sys/compat/net/if.h all
have this code:

/*
 * Message format for use in obtaining information about interfaces from
 * sysctl and the routing socket.  We need to force 64-bit alignment if we
 * aren't using compatiblity definitons.
 */
#if !defined(_KERNEL) || !defined(COMPAT_RTSOCK)
#define	__align64	__aligned(sizeof(uint64_t))
#else
#define	__align64
#endif
struct if_msghdr {
	u_short	ifm_msglen __align64;

but i think this comment is wrong.

the compat structures are defined in the compat headers and
the above structure should never change. however, when the
compat handling code wants to talk to the *real* structure it
will get this adjusted one (without the align), and thus
it will copy the wrong portions out from it.

the fix may be as simple as removing this from these headers
(leaving it always defined for the current defs), and making
sure that the compat headers have the right alignment (my
quick look seems ok.)

this will, obviously, need a recompile of the newer kernel.
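
To illustrate why the presence or absence of the alignment attribute matters, here is a small sketch. The two structs are hypothetical stand-ins, not the real struct if_msghdr, and they assume a compiler that accepts the GCC/Clang aligned attribute. The members sit at the same offsets in both; what changes is the structure's overall size, and therefore where code that steps by sizeof expects the following data to start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical headers, NOT the real struct if_msghdr: identical members,
 * differing only in the alignment attribute on the first one. */
struct hdr_plain {
    uint16_t msglen;
    uint8_t  version;
    uint8_t  type;
    int32_t  addrs;
    int32_t  flags;
};

struct hdr_align64 {
    uint16_t msglen __attribute__((aligned(sizeof(uint64_t))));
    uint8_t  version;
    uint8_t  type;
    int32_t  addrs;
    int32_t  flags;
};

/* Offset at which data following each header would be expected to begin. */
static size_t payload_offset_plain(void)   { return sizeof(struct hdr_plain); }
static size_t payload_offset_align64(void) { return sizeof(struct hdr_align64); }

/* The members themselves do not move; only the trailing padding differs. */
static void
check_same_member_offsets(void)
{
    assert(offsetof(struct hdr_plain, addrs) ==
        offsetof(struct hdr_align64, addrs));
    assert(offsetof(struct hdr_plain, flags) ==
        offsetof(struct hdr_align64, flags));
}
```

On a typical ABI with 4-byte int32_t alignment the plain header is 12 bytes and the aligned one is 16, so code compiled against one definition but handed data laid out by the other would look for the trailing addresses 4 bytes away from where they actually are.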

.mrg.

Paul Goyette

2019-04-29 11:35:22 UTC

Alas, making the suggested changes does not help. Same results as
before:

Userland and Kernel both -current with suggested changes (the diffs
are attached to this Email):

# ifconfig -l
wm0 lo0
# ifconfig lo0
lo0: flags=0x8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33624
inet 127.0.0.1/8 flags 0x0
inet6 ::1/128 flags 0x20<NODAD>
inet6 fe80::1%lo0/64 flags 0x0 scopeid 0x2
#

And with a 5.2 base system loaded in /chroot52 directory:

# chroot /chroot52 ifconfig -l

# chroot /chroot52 ifconfig lo0

#


+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+

Paul Goyette

2019-04-29 22:48:51 UTC

Some additional testing (on a -current base system) shows that the
problem is almost certainly related to compat_50 code. Using the
ifconfig from 6.0 or newer does not display the problem.

Also, the issue is probably wider than just the sysctl stuff, since
running a 5.2 version of ``route monitor'' produces no output when
adding or changing an address on lo0; the 6.0 version of route
monitor produces correct output.

Furthermore, previous testing shows that the problem also occurs on
an 8.0 base system with 5.2 userland. (I have not tested a 7.0 base
system.)


+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+


Paul Goyette

2019-04-29 23:30:11 UTC

And another data point:

The failures (``ifconfig -l'' and ``route monitor'') do NOT occur when
running on a 7.0 base system.

So it would seem that the problem is specifically with the compat_50
code, and was introduced between 7.0 and 8.0.

Hopefully this narrows things enough for someone familiar with the
rtsock stuff to help us make some forward progress.


+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+


Paul Goyette

2019-05-01 04:53:46 UTC

Post by Paul Goyette
The failures (``ifconfig -l'' and ``route monitor'') do NOT occur when
running on a 7.0 base system.
So it would seem that the problem is specifically with the compat_50
code, and was introduced between 7.0 and 8.0.

OK, so armed with these two data points (7.0 ==> GOOD, 8.0 ==> BAD) I
was able to run a bisect to identify the culprit.

Sources from 2016-09-21 at 10:00:00 UTC ==> GOOD
Sources from 2016-09-21 at 19:18:10 UTC ==> BAD

There are several commits during this time window, but the build was
broken for various reasons for several hours (as shown by the babylon5
test logs). The only commits that seem relevant are those which start
with the following:

Module Name: src
Committed By: roy
Date: Wed Sep 21 10:50:23 UTC 2016

Modified Files:
src/share/man/man4: route.4
src/sys/compat/common: Makefile
src/sys/compat/net: if.h route.h
src/sys/net: if.h route.h rtsock.c
src/sys/rump/net/lib/libnet: Makefile
src/sys/sys: socket.h
Added Files:
src/sys/compat/common: rtsock_70.c

Log Message:
Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and
NET_RT_IFLIST. Add compat code for old version.

Roy, can you please look into this further? Thanks!

Note that the breakage for the 5.2 version of ``ifconfig -l'' began
with this commit, yet the 5.2 version of ``route monitor'' continues
to produce "reasonable" looking results.

# chroot /chroot52 route monitor &
# ifconfig lo0 alias 1.2.3.4
RTM_ONEWADDR
got message of size 152 on Wed May 1 04:43:57 2019
RTM_ADD: Add Route: len 152, pid 463, seq 0, errno 0, flags: <UP,HOST> locks: inits:
sockaddrs: <DST,GATEWAY>
1.2.3.4 lo0

The ``route monitor'' starts failing to function correctly at some time
after 2016-09-21 19:18:10 UTC (it definitely fails as of 2017-05-27
00:00 UTC).


+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+


Paul Goyette

2019-05-02 01:57:41 UTC

Just a reminder, there are actually TWO issues where compat_50 is
broken. Both breakages occur between the 7.0 and 8.0 releases, but
at different times and for different reasons.

First, the sysctl(3) stuff used by getifaddrs(3) is broken. The
test is simple:

1. Build a release, and install it (qemu VM is fine)
2. Create a /chroot52 directory, and unpack the base.tgz
from NetBSD-5.2
3. Boot the result, login as root, and execute the command

# chroot /chroot52 ifconfig -l

4. A working system will display lo0 (and for qemu, wm0)
while a broken system displays a blank line.

As indicated earlier, this problem was introduced between 2016-09-21 at
10:00:00 UTC (working) and 2016-09-21 at 19:18:10 UTC (broken). And we
cannot get much more specific because there was some build breakage
during this roughly 9-hour interval.

The second breakage involves the routing socket itself, and can be
reproduced with the following steps:

1. Build a release, and install it (qemu VM is fine)
2. Create a /chroot52 directory, and unpack the base.tgz
from NetBSD-5.2
3. Boot the result, login as root, and execute the commands

# chroot /chroot52 route monitor &
# ifconfig lo0 alias 1.2.3.4

4. A working system will display a couple of routing
table update messages such as

RTM_ONEWADDR
got message of size 152 on Wed May 1 04:43:57 2019
RTM_ADD: Add Route: len 152, pid 463, seq 0, errno 0, flags: <UP,HOST> locks: inits:
sockaddrs: <DST,GATEWAY>
1.2.3.4 lo0

while a broken system displays nothing.

This second breakage was introduced between 2017-04-11 at 13:50 UTC
(working) and 2017-04-11 at 14:00 UTC (broken). During that interval
there was only one commit:

Module Name: src
Committed By: roy
Date: Tue Apr 11 13:55:55 UTC 2017

Modified Files:
src/share/man/man4: route.4
src/sys/net: raw_cb.h raw_usrreq.c route.h rtsock.c

Log Message:
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential
message types.


+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | ***@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | ***@netbsd.org |
+--------------------+--------------------------+-----------------------+


Robert Elz

2019-05-01 07:51:56 UTC

I have no idea if this is actually a contributing factor here or
not, but the way we traditionally do versioning (inside the kernel)
for changes like the one which is most likely the cause here is
fraught with danger, and could easily lead to very subtle compat
errors which would be very difficult to find.

I took a look at the routing socket changes in the cvs logs a couple
of days ago, and to me it looked like the change that Paul has
called out was the likely problem (purely as it was the only
change of this nature that fit the versioning requirements).

The practice we have when versioning an identifier (for an ioctl or
sysctl or struct, or format identified in a field in a struct) in the
kernel, is to take the name, change it, and assign the old name to
the new functionality.

That of itself is just fine. The problem is that the names we mostly
choose to use as the replacement names for the old interface, which we
are keeping for binary compat with previous versions, are hideous and
stupid (and yes, I know the convention dates way back to actual CSRG
changes - it was started long before NetBSD).

To illustrate the potential problem, I will use a specific change
from the rtsock changes ... note I am not claiming that this
particular one is in any way associated with the actual problem here.

The change in question altered the NET_RT_IFLIST sysctl.

So, a new number was chosen to be NET_RT_IFLIST and made visible in
the header files, so all newly compiled userlevel would get the new
format with whatever the updated semantics were. That part of this
is fine, and the way it needs to be.

Inside the kernel (and visible only there) a new name was picked to
use for the old number, so old binaries can continue using the old
ABI - and provided the compat code is coded correctly, will continue
working without noticing anything has changed.

In this case (and following the ancient tradition) the name chosen
for the old version of NET_RT_IFLIST was NET_RT_OIFLIST (adding an 'O'
to mean the "old" interface list).

Then in the kernel we get (at least in the old form, following the
modularisation changes, the details might differ, but this is how it
was when this change was made ... the other changes are not relevant
to the point here)

#ifdef COMPAT_70
case NET_RT_OIFLIST:
/* code to generate/consume the old format */
break;
#endif

which is all fine. If we're seeing the NetBSD 7 version of NET_RT_IFLIST
we use the data the way that it was used in NetBSD 7, and the

case NET_RT_IFLIST:

code handles the new (NetBSD 8 and beyond) version of the sysctl (in
this case, I think).

So far so good.

The issue is that NET_RT_IFLIST had been versioned before, at least
twice before (that I can see), and each time the same kind of change
had been made - the only difference being that the COMPAT_70 in the
ifdef had been COMPAT_50 for the previous change, and COMPAT_14 for
the one before that.

What this means is that before the COMPAT_70 code was added, the name
NET_RT_OIFLIST had been used internally for the COMPAT_50 code, and
before that was added, it had been used for the COMPAT_14 code.

Each time we version this interface (or any similar ones) we always make
the name NET_RT_OIFLIST represent the immediately previous version of
the interface, and rename all the older ones. In this case, NET_RT_OIFLIST
for COMPAT_50 became NET_RT_OOIFLIST and NET_RT_OOIFLIST which had been
used for COMPAT_14 became NET_RT_OOOIFLIST.

People, this is insane! (It is also meaningless unnecessary makework
for whoever is doing the update).

All it would take, is for somewhere, anywhere, in the kernel, a single use
of the NET_RT_OIFLIST name to miss getting changed to NET_RT_OOIFLIST and
all kinds of bad things would happen. And there's no assistance at all
from the compiler, or anything else, in detecting this problem - it takes
something like der Mouse's old NetBSD 5 binary running with a NetBSD 8 (or
later) kernel, to detect a failure.

What we should be doing is something more like we do (in the exact opposite
direction, because of the different requirements) when versioning libc
interfaces - which is a scheme which works (that it looks ugly when people
actually see it at the lower level is a nuisance, but that can't be avoided).

So, if when the COMPAT_14 code for NET_RT_IFLIST had been added, we had
renamed the compatibility identifier NET_RT_IFLIST_14 (or something like
that) then when the COMPAT_50 code was added, it would have created
NET_RT_IFLIST_50 and the NET_RT_IFLIST_14 version would not have needed
renaming (the code it executes might need altering - that's a different
issue - and is unchanged from what we have now).

Similarly, when the COMPAT_70 version was added, we would get NET_RT_IFLIST_70
and neither NET_RT_IFLIST_14 nor NET_RT_IFLIST_50 would alter.

This means that when these names are used in other places in the kernel,
(as inside the sysctl_iflist() function in this case perhaps, if that needs
to alter) it will be plainly obvious to all who look at the code which
version of the interface is supposed to be being supported. Further, if
we don't have COMPAT_xx included for some xx (in someone's kernel or other)
then the right code will go away - and if the #ifdef's are not correct,
the compiler (or linker) will help find the problem.

The actual problem in this case might be that the COMPAT_50 (and if that,
then probably the COMPAT_14 code as well) might not have been correctly
updated to deal with the NetBSD 8 kernel's internal data (it might be still
trying to convert NetBSD 7 format messages into NetBSD 5 format for example)
but it could just as easily be that the code is OK, and all that is
going wrong is that the wrong version of it is running, as one of those
extra O's missed getting added somewhere.

In general, I wouldn't suggest going back through all of our existing
compat code and replacing the O names with new ones - too much work for
too little benefit, but in the future, when adding compat code, don't
add an O name, add a _xx name instead, and if there already were O names
from previous versioning (since these in the old method would have all
needed updating - so the work would be the same) instead of adding new
O's in their names, change them all to the correct _xx names instead.

In this particular case however, it might help spot the problem, if we
were to go way back to the COMPAT_14 routing socket changes, and change
the O names added in that to _14 names, and then go to the COMPAT_50 changes
and change the O names added there to _50 names, and then the COMPAT_70
names added can get changed to _70 names. The process of doing this might
just make whatever didn't get correctly updated stand out more easily than
any other way, as it would force all of the changes to be examined again.

Note that it is much more than this one name - this change also added
RTM_ODELADDR RTM_ONEWADDR RTM_OCHGADDR (maybe more) and previous changes
had added RT_OADVANCE(a,b) PF_OROUTE RT_OROUNDUP(n) etc ... (which were
not changed ... ie: the number of O's in a name gives no clue as to which
version it is intended to be the compat code for - just how many times
the interface in question has altered over the years, which, while an
interesting statistic, isn't really very useful.)

And of course, all of this is confused even more by COMPAT_RTSOCK which
adds a whole set of X names which need to be correctly matched with the
right O names as other things change.

It is a wonder that any of this works at all, the way it has been done
(and again, this isn't anyone here's fault - way back in the ancient
past, when things were first being versioned this way, and there was
only the old and the new, with no expectation that the new would not
last forever into the future, this 'O' scheme looked just fine).

kre


Martin Husemann

2019-05-01 08:36:32 UTC

Post by Robert Elz
In general, I wouldn't suggest going back through all of our existing
compat code and replacing the O names with new ones - too much work for
too little benefit, but in the future, when adding compat code, don't
add an O name, add a _xx name instead, and if there already were O names
from previous versioning (since these in the old method would have all
needed updating - so the work would be the same) instead of adding new
O's in their names, change them all to the correct _xx names instead.

Good plan, let's do that (and actually start with this change, as you
suggested).

Martin


Mouse

2019-05-01 12:14:58 UTC

Post by Robert Elz
I have no idea if this is actually a contributing factor here or not,
but the way we traditionally do versioning (inside the kernel) for
changes like the one which is most likely the cause here is fraught
with danger, [...].
[...]

I entirely agree with what kre says here, right down the line. (It is
likely armchair quarterbacking, since I won't be doing the work unless
work decides that it's worth putting my time into this, but still.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B


Robert Elz

2019-05-02 04:22:24 UTC

Date: Thu, 2 May 2019 09:57:41 +0800 (PST)
From: Paul Goyette <***@whooppee.com>
Message-ID: <***@speedy.whooppee.com>

| This second breakage was introduced between 2017-04-11 at 13:50 UTC
| (working) and 2017-04-11 at 14:00 UTC (broken). During that interval
| there was only one commit:
|
| Module Name: src
| Committed By: roy
| Date: Tue Apr 11 13:55:55 UTC 2017
|
| Modified Files:
| src/share/man/man4: route.4
| src/sys/net: raw_cb.h raw_usrreq.c route.h rtsock.c
|
| Log Message:
| Add RO_MSGFILTER socket option to PF_ROUTE to filter out
| un-wanted route(4) messages.

This one looks straightforward (I think) ... the filter is enabled
by default, and drops everything that is not a (new version) routing
message (which is all it knows how to examine).

Anything from COMPAT_50 (or earlier) will simply be discarded, it
appears.

This should be easy to fix by just adding a new filter function, and
applying it when a COMPAT_50 (or older) routing socket is opened.
(It might be COMPAT_RTSOCK that really makes the difference here.)
The new function could simply OK all packets (which matches
pre-NetBSD-8 behaviour) or could do the same kinds of tests as the
current one.

Alternately - only enable the filter for sockets for the new protocol,
and leave it turned off for the old ones.
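
A rough sketch of the first option, in self-contained C rather than real kernel code. All of the names here (sk_msg, sk_rtsock, sk_filter_new, sk_filter_compat, sk_pick_filter, sk_deliver) are hypothetical, and the "new" filter only models the behaviour described above; it is not copied from rtsock.c. The compat filter takes the "simply OK all packets" route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical version numbers; the thread mentions 3 (old) and 4 (new). */
enum { SK_RTM_VERSION_OLD = 3, SK_RTM_VERSION_NEW = 4 };

/* Hypothetical, simplified message header (not the real rt_msghdr). */
struct sk_msg {
	uint8_t version;
	uint8_t type;
};

/* Hypothetical per-socket state. */
struct sk_rtsock {
	bool     compat;	/* opened via the old (compat) domain */
	uint32_t type_mask;	/* bit N set: deliver type N; 0: no filter */
};

typedef bool (*sk_filter_fn)(const struct sk_rtsock *, const struct sk_msg *);

/* Models the described filter: it only understands new-version messages. */
static bool
sk_filter_new(const struct sk_rtsock *so, const struct sk_msg *m)
{
	assert(so != NULL && m != NULL);
	if (m->version != SK_RTM_VERSION_NEW)
		return false;	/* this is what discards compat messages */
	if (so->type_mask == 0)
		return true;
	return m->type < 32 &&
	    (so->type_mask & (UINT32_C(1) << m->type)) != 0;
}

/* Compat filter: accept everything, matching the pre-filter behaviour. */
static bool
sk_filter_compat(const struct sk_rtsock *so, const struct sk_msg *m)
{
	(void)so;
	(void)m;
	return true;
}

/* Choose the filter once, based on how the socket was opened. */
static sk_filter_fn
sk_pick_filter(const struct sk_rtsock *so)
{
	return so->compat ? sk_filter_compat : sk_filter_new;
}

/* Returns true if the message should be delivered to this socket. */
static bool
sk_deliver(const struct sk_rtsock *so, const struct sk_msg *m)
{
	return sk_pick_filter(so)(so, m);
}
```

With this arrangement an old-version message reaches a compat socket but is still dropped for a new-protocol socket, and the new-protocol filter is left exactly as it was.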

How to fit that kind of change into the new modularised COMPAT_* stuff
I have no idea however.

kre


Martin Husemann

2019-05-02 06:45:02 UTC

Post by Robert Elz
Alternately - only enable the filter for sockets for the new protocol,
and leave it turned off for the old ones.

Just leave the filter off by default and only enable it via some special
request by the app? The socket itself has no version attached, has it?

Martin


Robert Elz

2019-05-02 07:49:46 UTC

Date: Thu, 2 May 2019 08:45:02 +0200
From: Martin Husemann <***@duskware.de>
Message-ID: <***@mail.duskware.de>

| Just leave the filter off by default and only enable it via some special
| request by the app? The socket itself has no version attached, has it?

They're connected to different protocol domains. route, and you guessed it,
oroute ...

kre
