Discussion:
so_rerror
Christos Zoulas
2018-11-03 15:33:29 UTC
Hello,

Since the introduction of so_rerror tracking to detect receive
socket overflows, we have been trying to make other programs cope
by increasing their buffer sizes and avoid the error message flood.
We have taken this as far as it goes now, and still there are
pathological cases where the new behavior cannot easily be fixed
(one has to go and fix each program separately). One example is
when a program turns on debugging to syslog and logs quicker than
syslog can absorb. What happens then is that syslog and the program
keep spewing error messages about the socket overflow and do little
else...

While keeping track of receive overflows may be desired (in the
routing socket + dhcpcd case), I think since it is a new behavior
it should be optional; programs that want to know about it should
turn it on. This patch restores the original behavior and allows
programs that care about receive errors to arrange to be notified by
introducing setsockopt SO_RERROR.
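
For a program that does want the notification, opting in would presumably look like any other boolean socket option. The following is a minimal sketch, assuming SO_RERROR is settable at the SOL_SOCKET level as the patch in this message arranges. enable_rerror is a hypothetical helper name, and on systems whose headers lack the option it simply fails with ENOPROTOOPT.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/*
 * Ask the kernel to report receive-buffer overflows on this socket.
 * Returns 0 on success, -1 with errno set on failure. Where
 * <sys/socket.h> does not define SO_RERROR (an assumption about
 * non-NetBSD systems), the call fails with ENOPROTOOPT.
 */
int
enable_rerror(int fd)
{
    assert(fd >= 0);
#ifdef SO_RERROR
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_RERROR, &on, sizeof(on));
#else
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

A caller that can live without the notification can ignore the ENOPROTOOPT failure and carry on with the old silent-drop behavior.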

Best,

christos

Index: kern/uipc_socket.c
===================================================================
RCS file: /cvsroot/src/sys/kern/uipc_socket.c,v
retrieving revision 1.265
diff -u -u -r1.265 uipc_socket.c
--- kern/uipc_socket.c 3 Sep 2018 16:29:35 -0000 1.265
+++ kern/uipc_socket.c 3 Nov 2018 15:22:19 -0000
@@ -1757,6 +1757,7 @@
case SO_OOBINLINE:
case SO_TIMESTAMP:
case SO_NOSIGPIPE:
+ case SO_RERROR:
#ifdef SO_OTIMESTAMP
case SO_OTIMESTAMP:
#endif
@@ -1958,6 +1959,7 @@
case SO_OOBINLINE:
case SO_TIMESTAMP:
case SO_NOSIGPIPE:
+ case SO_RERROR:
#ifdef SO_OTIMESTAMP
case SO_OTIMESTAMP:
#endif
Index: kern/uipc_socket2.c
===================================================================
RCS file: /cvsroot/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.132
diff -u -u -r1.132 uipc_socket2.c
--- kern/uipc_socket2.c 3 Sep 2018 16:29:35 -0000 1.132
+++ kern/uipc_socket2.c 3 Nov 2018 15:22:19 -0000
@@ -509,7 +509,8 @@
KASSERT(solocked(so));

so->so_rcv.sb_overflowed++;
- so->so_rerror = ENOBUFS;
+ if (so->so_options & SO_RERROR)
+ so->so_rerror = ENOBUFS;
sorwakeup(so);
}

Index: sys/socket.h
===================================================================
RCS file: /cvsroot/src/sys/sys/socket.h,v
retrieving revision 1.128
diff -u -u -r1.128 socket.h
--- sys/socket.h 16 Sep 2018 20:40:20 -0000 1.128
+++ sys/socket.h 3 Nov 2018 15:22:19 -0000
@@ -132,6 +132,7 @@
#define SO_NOSIGPIPE 0x0800 /* no SIGPIPE from EPIPE */
#define SO_ACCEPTFILTER 0x1000 /* there is an accept filter */
#define SO_TIMESTAMP 0x2000 /* timestamp received dgram traffic */
+#define SO_RERROR 0x4000 /* Keep track of receive errors */


/*

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Jason Thorpe
2018-11-03 15:47:19 UTC
Post by Christos Zoulas
While keeping track of receive overflows may be desired (in the
routing socket + dhcpcd case), I think since it is a new behavior
it should be optional; programs that want to know about it should
turn it on. This patch restores the original behavior and allows
programs that care about receive errors to arrange to be notified by
introducing setsockopt SO_RERROR.
I agree with this change. The new behavior is a true binary-compatibility problem that should always have been opt-in.

If you want the ability to throw the big-switch for everyone for debugging purposes, though, you could certainly add a sysctl that, if set, causes all new sockets to get SO_RERROR enabled by default.

-- thorpej


Roy Marples
2018-11-03 16:07:45 UTC
Post by Christos Zoulas
Hello,
Since the introduction of so_rerror tracking to detect receive
socket overflows, we have been trying to make other programs cope
by increasing their buffer sizes and avoid the error message flood.
We have taken this as far as it goes now, and still there are
pathological cases where the new behavior cannot easily be fixed
(one has to go and fix each program separately). One example is
when a program turns on debugging to syslog and logs quicker than
syslog can absorb. What happens then is that syslog and the program
keep spewing error messages about the socket overflow and do little
else...
This is a double edged sword.
If there's any issue logging messages, I want to know about it.
I have no idea if the discarded message was pointless debug or someone
trying to breach my server. A good example is on one high traffic server
I have the bulk of text in /var/log/messages is from blacklistd - I
would be upset if any of that was silently discarded.
Maybe we need an option to increase the size of syslogd buffers?

Or better yet, we need a way of dynamically increasing/decreasing the
buffers in the kernel to avoid this.
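
The per-program variant of this idea (asking for a bigger receive buffer) can be sketched with the standard SO_RCVBUF option. This is only an illustration, not syslogd's actual code: grow_rcvbuf is a hypothetical helper, and the size the kernel grants may differ from the size requested, so the sketch reads the value back.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Request a receive buffer of `want` bytes and report what the kernel
 * actually granted through *got. Returns 0 on success, -1 on failure
 * (for example when `want` exceeds a system-wide limit).
 */
int
grow_rcvbuf(int fd, int want, int *got)
{
    socklen_t len = sizeof(*got);

    assert(got != NULL);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) == -1)
        return -1;
    return getsockopt(fd, SOL_SOCKET, SO_RCVBUF, got, &len);
}
```

A daemon could expose `want` as a runtime option. As noted later in the thread, a larger buffer only postpones the overflow if the reader keeps falling behind.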
Post by Christos Zoulas
While keeping track of receive overflows may be desired (in the
routing socket + dhcpcd case), I think since it is a new behavior
it should be optional; programs that want to know about it should
turn it on. This patch restores the original behavior and allows
programs that care about receive errors to arrange to be notified by
introducing setsockopt SO_RERROR.
I have no issue with each program defining the behaviour it wants.
However, I do want an option to enable it globally, like a sysctl option
so I can tell when syslogd isn't doing its job properly - or any other
application for that matter.

If you want this new global option to default to off (which I strongly
disagree with - sweeping issues under the carpet should not be a default
option), then can we also have a man page update to describe this
behaviour please.
Post by Christos Zoulas
Best,
christos
Index: kern/uipc_socket.c
===================================================================
RCS file: /cvsroot/src/sys/kern/uipc_socket.c,v
retrieving revision 1.265
diff -u -u -r1.265 uipc_socket.c
--- kern/uipc_socket.c 3 Sep 2018 16:29:35 -0000 1.265
+++ kern/uipc_socket.c 3 Nov 2018 15:22:19 -0000
@@ -1757,6 +1757,7 @@
#ifdef SO_OTIMESTAMP
#endif
@@ -1958,6 +1959,7 @@
#ifdef SO_OTIMESTAMP
#endif
Index: kern/uipc_socket2.c
===================================================================
RCS file: /cvsroot/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.132
diff -u -u -r1.132 uipc_socket2.c
--- kern/uipc_socket2.c 3 Sep 2018 16:29:35 -0000 1.132
+++ kern/uipc_socket2.c 3 Nov 2018 15:22:19 -0000
@@ -509,7 +509,8 @@
KASSERT(solocked(so));
so->so_rcv.sb_overflowed++;
- so->so_rerror = ENOBUFS;
+ if (so->so_options & SO_RERROR)
+ so->so_rerror = ENOBUFS;
sorwakeup(so);
Waking up the socket without so->so_rerror being set is a waste of CPU.
Post by Christos Zoulas
}
===================================================================
RCS file: /cvsroot/src/sys/sys/socket.h,v
retrieving revision 1.128
diff -u -u -r1.128 socket.h
--- sys/socket.h 16 Sep 2018 20:40:20 -0000 1.128
+++ sys/socket.h 3 Nov 2018 15:22:19 -0000
@@ -132,6 +132,7 @@
#define SO_NOSIGPIPE 0x0800 /* no SIGPIPE from EPIPE */
#define SO_ACCEPTFILTER 0x1000 /* there is an accept filter */
#define SO_TIMESTAMP 0x2000 /* timestamp received dgram traffic */
+#define SO_RERROR 0x4000 /* Keep track of receive errors */
/*
Roy

Christos Zoulas
2018-11-03 18:53:49 UTC
Post by Roy Marples
Post by Christos Zoulas
Hello,
Since the introduction of so_rerror tracking to detect receive
socket overflows, we have been trying to make other programs cope
by increasing their buffer sizes and avoid the error message flood.
We have taken this as far as it goes now, and still there are
pathological cases where the new behavior cannot easily be fixed
(one has to go and fix each program separately). One example is
when a program turns on debugging to syslog and logs quicker than
syslog can absorb. What happens then is that syslog and the program
keep spewing error messages about the socket overflow and do little
else...
This is a double edged sword.
If there's any issue logging messages, I want to know about it.
I have no idea if the discarded message was pointless debug or someone
trying to breach my server. A good example is on one high traffic server
I have the bulk of text in /var/log/messages is from blacklistd - I
would be upset if any of that was silently discarded.
Maybe we need an option to increase the size of syslogd buffers?
I understand, but there is only one channel now and debugging messages
can overwhelm (and usually do) others in terms of quantity.
Post by Roy Marples
Or better yet, we need a way of dynamically increasing/decreasing the
buffers in the kernel to avoid this.
Sure that might help, but it does not help if the receiving side
keeps falling behind (and might end up with resource exhaustion in
the kernel).
Post by Roy Marples
If you want this new global option to default to off (which I strongly
disagree with - sweeping issues under the carpet should not be a default
option), then can we also have a man page update to describe this
behaviour please.
Well, it has to default to off, since we keep finding new programs we
need to fix. The new behavior is "new" and old programs are not designed
with it in mind and since most OS's don't behave like this we can expect
them to break (we just modified BIND)...

Here's a new patch that includes the global sysctl, and the fix to
not call sorwakeup (thanks!)

christos

Index: sys/socket.h
===================================================================
RCS file: /cvsroot/src/sys/sys/socket.h,v
retrieving revision 1.128
diff -u -u -r1.128 socket.h
--- sys/socket.h 16 Sep 2018 20:40:20 -0000 1.128
+++ sys/socket.h 3 Nov 2018 18:51:15 -0000
@@ -132,6 +132,7 @@
#define SO_NOSIGPIPE 0x0800 /* no SIGPIPE from EPIPE */
#define SO_ACCEPTFILTER 0x1000 /* there is an accept filter */
#define SO_TIMESTAMP 0x2000 /* timestamp received dgram traffic */
+#define SO_RERROR 0x4000 /* Keep track of receive errors */


/*
Index: kern/uipc_socket.c
===================================================================
RCS file: /cvsroot/src/sys/kern/uipc_socket.c,v
retrieving revision 1.265
diff -u -u -r1.265 uipc_socket.c
--- kern/uipc_socket.c 3 Sep 2018 16:29:35 -0000 1.265
+++ kern/uipc_socket.c 3 Nov 2018 18:51:15 -0000
@@ -118,6 +118,7 @@

extern const struct fileops socketops;

+static int sooptions;
extern int somaxconn; /* patchable (XXX sysctl) */
int somaxconn = SOMAXCONN;
kmutex_t *softnet_lock;
@@ -537,6 +538,7 @@
so->so_proto = prp;
so->so_send = sosend;
so->so_receive = soreceive;
+ so->so_options = sooptions;
#ifdef MBUFTRACE
so->so_rcv.sb_mowner = &prp->pr_domain->dom_mowner;
so->so_snd.sb_mowner = &prp->pr_domain->dom_mowner;
@@ -1757,6 +1759,7 @@
case SO_OOBINLINE:
case SO_TIMESTAMP:
case SO_NOSIGPIPE:
+ case SO_RERROR:
#ifdef SO_OTIMESTAMP
case SO_OTIMESTAMP:
#endif
@@ -1958,6 +1961,7 @@
case SO_OOBINLINE:
case SO_TIMESTAMP:
case SO_NOSIGPIPE:
+ case SO_RERROR:
#ifdef SO_OTIMESTAMP
case SO_OTIMESTAMP:
#endif
@@ -2542,4 +2546,11 @@
SYSCTL_DESCR("Maximum socket buffer size"),
sysctl_kern_sbmax, 0, NULL, 0,
CTL_KERN, KERN_SBMAX, CTL_EOL);
+
+ sysctl_createv(&socket_sysctllog, 0, NULL, NULL,
+ CTLFLAG_PERMANENT|CTLFLAG_READWRITE,
+ CTLTYPE_INT, "sooptions",
+ SYSCTL_DESCR("Default socket options"),
+ NULL, 0, &sooptions, 0,
+ CTL_KERN, CTL_CREATE, CTL_EOL);
}
Index: kern/uipc_socket2.c
===================================================================
RCS file: /cvsroot/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.132
diff -u -u -r1.132 uipc_socket2.c
--- kern/uipc_socket2.c 3 Sep 2018 16:29:35 -0000 1.132
+++ kern/uipc_socket2.c 3 Nov 2018 18:51:16 -0000
@@ -509,8 +509,10 @@
KASSERT(solocked(so));

so->so_rcv.sb_overflowed++;
- so->so_rerror = ENOBUFS;
- sorwakeup(so);
+ if (so->so_options & SO_RERROR) {
+ so->so_rerror = ENOBUFS;
+ sorwakeup(so);
+ }
}

/*


Martin Husemann
2018-11-03 19:23:23 UTC
Post by Christos Zoulas
@@ -537,6 +538,7 @@
so->so_proto = prp;
so->so_send = sosend;
so->so_receive = soreceive;
+ so->so_options = sooptions;
#ifdef MBUFTRACE
so->so_rcv.sb_mowner = &prp->pr_domain->dom_mowner;
so->so_snd.sb_mowner = &prp->pr_domain->dom_mowner;
I think we need to mask sooptions here or with an accessor function
when using the sysctl to write the value. It is likely not a good idea
to set e.g. "there is an accept filter running" to on when this is a lie.

Maybe we should instead do a boolean sysctl and just set the single bit
in sooptions?
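
The boolean alternative can be sketched in userland as follows. This is a hedged illustration rather than kernel code: default_options stands in for the kernel's sooptions, set_rerror_default and initial_socket_options are hypothetical names, and the bit values are copied from the socket.h hunk quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative copies of bit values quoted in the socket.h hunk. */
#define X_SO_ACCEPTFILTER 0x1000
#define X_SO_RERROR       0x4000

static int default_options;    /* userland stand-in for sooptions */

/* Boolean knob: flips only the receive-error bit, nothing else. */
void
set_rerror_default(bool on)
{
    if (on)
        default_options |= X_SO_RERROR;
    else
        default_options &= ~X_SO_RERROR;
}

/* Options a newly created socket would start with. */
int
initial_socket_options(void)
{
    /* No other bit can ever have been set through the knob. */
    assert((default_options & ~X_SO_RERROR) == 0);
    return default_options;
}
```

Because the knob can only touch one bit, state bits such as the accept-filter flag cannot be switched on by accident.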

Martin

Christos Zoulas
2018-11-03 20:19:26 UTC
On Nov 3, 8:23pm, ***@duskware.de (Martin Husemann) wrote:
-- Subject: Re: so_rerror

| On Sat, Nov 03, 2018 at 06:53:49PM +0000, Christos Zoulas wrote:
|
| > @@ -537,6 +538,7 @@
| > so->so_proto = prp;
| > so->so_send = sosend;
| > so->so_receive = soreceive;
| > + so->so_options = sooptions;
| > #ifdef MBUFTRACE
| > so->so_rcv.sb_mowner = &prp->pr_domain->dom_mowner;
| > so->so_snd.sb_mowner = &prp->pr_domain->dom_mowner;
|
|
| I think we need to mask sooptions here or with an accessor function
| when using the sysctl to write the value. It is likely not a good idea
| to set e.g. "there is an accept filter running" to on when this is a lie.
|
| Maybe we should instead do a boolean sysctl and just set the single bit
| in sooptions?

I know, I thought about that but then I decided it was simpler and
more functional to allow everything. I guess I can filter out the
ones that don't make sense... Stay tuned.

christos

Christos Zoulas
2018-11-03 21:16:30 UTC
Post by Christos Zoulas
-- Subject: Re: so_rerror
|
| > so->so_proto = prp;
| > so->so_send = sosend;
| > so->so_receive = soreceive;
| > + so->so_options = sooptions;
| > #ifdef MBUFTRACE
| > so->so_rcv.sb_mowner = &prp->pr_domain->dom_mowner;
| > so->so_snd.sb_mowner = &prp->pr_domain->dom_mowner;
|
|
| I think we need to mask sooptions here or with an accessor function
| when using the sysctl to write the value. It is likely not a good idea
| to set e.g. "there is an accept filter running" to on when this is a lie.
|
| Maybe we should instead do a boolean sysctl and just set the single bit
| in sooptions?
I know, I thought about that but then I decided it was simpler and
more functional to allow everything. I guess I can filter out the
ones that don't make sense... Stay tuned.
Here's a patch that only allows the options that could be useful.

christos

Index: kern/uipc_socket.c
===================================================================
RCS file: /cvsroot/src/sys/kern/uipc_socket.c,v
retrieving revision 1.265
diff -u -u -r1.265 uipc_socket.c
--- kern/uipc_socket.c 3 Sep 2018 16:29:35 -0000 1.265
+++ kern/uipc_socket.c 3 Nov 2018 21:14:53 -0000
@@ -118,6 +118,7 @@

extern const struct fileops socketops;

+static int sooptions;
extern int somaxconn; /* patchable (XXX sysctl) */
int somaxconn = SOMAXCONN;
kmutex_t *softnet_lock;
@@ -537,6 +538,7 @@
so->so_proto = prp;
so->so_send = sosend;
so->so_receive = soreceive;
+ so->so_options = sooptions;
#ifdef MBUFTRACE
so->so_rcv.sb_mowner = &prp->pr_domain->dom_mowner;
so->so_snd.sb_mowner = &prp->pr_domain->dom_mowner;
@@ -1757,6 +1759,7 @@
case SO_OOBINLINE:
case SO_TIMESTAMP:
case SO_NOSIGPIPE:
+ case SO_RERROR:
#ifdef SO_OTIMESTAMP
case SO_OTIMESTAMP:
#endif
@@ -1958,6 +1961,7 @@
case SO_OOBINLINE:
case SO_TIMESTAMP:
case SO_NOSIGPIPE:
+ case SO_RERROR:
#ifdef SO_OTIMESTAMP
case SO_OTIMESTAMP:
#endif
@@ -2522,6 +2526,31 @@
return (error);
}

+/*
+ * sysctl helper routine for kern.sooptions. Ensures that only allowed
+ * options can be set.
+ */
+static int
+sysctl_kern_sooptions(SYSCTLFN_ARGS)
+{
+ int error, new_options;
+ struct sysctlnode node;
+
+ new_options = sooptions;
+ node = *rnode;
+ node.sysctl_data = &new_options;
+ error = sysctl_lookup(SYSCTLFN_CALL(&node));
+ if (error || newp == NULL)
+ return error;
+
+ if (new_options & ~SO_DEFOPTS)
+ return EINVAL;
+
+ sooptions = new_options;
+
+ return 0;
+}
+
static void
sysctl_kern_socket_setup(void)
{
@@ -2542,4 +2571,11 @@
SYSCTL_DESCR("Maximum socket buffer size"),
sysctl_kern_sbmax, 0, NULL, 0,
CTL_KERN, KERN_SBMAX, CTL_EOL);
+
+ sysctl_createv(&socket_sysctllog, 0, NULL, NULL,
+ CTLFLAG_PERMANENT|CTLFLAG_READWRITE,
+ CTLTYPE_INT, "sooptions",
+ SYSCTL_DESCR("Default socket options"),
+ sysctl_kern_sooptions, 0, NULL, 0,
+ CTL_KERN, CTL_CREATE, CTL_EOL);
}
Index: kern/uipc_socket2.c
===================================================================
RCS file: /cvsroot/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.132
diff -u -u -r1.132 uipc_socket2.c
--- kern/uipc_socket2.c 3 Sep 2018 16:29:35 -0000 1.132
+++ kern/uipc_socket2.c 3 Nov 2018 21:14:53 -0000
@@ -509,8 +509,10 @@
KASSERT(solocked(so));

so->so_rcv.sb_overflowed++;
- so->so_rerror = ENOBUFS;
- sorwakeup(so);
+ if (so->so_options & SO_RERROR) {
+ so->so_rerror = ENOBUFS;
+ sorwakeup(so);
+ }
}

/*
Index: sys/socket.h
===================================================================
RCS file: /cvsroot/src/sys/sys/socket.h,v
retrieving revision 1.128
diff -u -u -r1.128 socket.h
--- sys/socket.h 16 Sep 2018 20:40:20 -0000 1.128
+++ sys/socket.h 3 Nov 2018 21:14:53 -0000
@@ -132,7 +132,30 @@
#define SO_NOSIGPIPE 0x0800 /* no SIGPIPE from EPIPE */
#define SO_ACCEPTFILTER 0x1000 /* there is an accept filter */
#define SO_TIMESTAMP 0x2000 /* timestamp received dgram traffic */
+#define SO_RERROR 0x4000 /* Keep track of receive errors */

+/* Allowed default option flags */
+#define SO_DEFOPTS (SO_DEBUG|SO_REUSEADDR|SO_KEEPALIVE|SO_DONTROUTE| \
+ SO_BROADCAST|SO_USELOOPBACK|SO_LINGER|SO_OOBINLINE|SO_REUSEPORT| \
+ SO_NOSIGPIPE|SO_TIMESTAMP|SO_RERROR)
+
+#define __SO_OPTION_BITS \
+ "\20" \
+ "\1SO_DEBUG" \
+ "\2SO_ACCEPTCONN" \
+ "\3SO_REUSEADDR" \
+ "\4SO_KEEPALIVE" \
+ "\5SO_DONTROUTE" \
+ "\6SO_BROADCAST" \
+ "\7SO_USELOOPBACK" \
+ "\10SO_LINGER" \
+ "\11SO_OOBINLINE" \
+ "\12SO_REUSEPORT" \
+ "\13SO_OTIMESTAMP" \
+ "\14SO_NOSIGPIPE" \
+ "\15SO_ACCEPTFILTER" \
+ "\16SO_TIMESTAMP" \
+ "\17SO_RERROR"

/*
* Additional options, not kept in so_options.
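
The __SO_OPTION_BITS string appears to follow the old-style snprintb(3) layout: a leading byte giving the numeric base, then entries made of a one-byte, 1-origin bit number followed by the bit's name. To show how such a table is read, here is a small portable decoder. It is a sketch under that assumption, not the libutil implementation; decode_bits and OPTION_BITS are hypothetical names, and OPTION_BITS holds only a subset of the table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Subset of the table from the patch, in the same layout. */
#define OPTION_BITS "\20" "\1SO_DEBUG" "\3SO_REUSEADDR" "\17SO_RERROR"

/*
 * Decode `val` into buf, e.g. "0x4001<SO_DEBUG,SO_RERROR>".
 * fmt[0] is the numeric base; every following entry is one byte with a
 * 1-origin bit number, then the name, which ends at the next byte <= ' '.
 */
void
decode_bits(char *buf, size_t len, const char *fmt, unsigned int val)
{
    int base = *fmt++;
    int first = 1;
    size_t used;

    assert(len > 0);
    if (base == 16)
        snprintf(buf, len, "%#x", val);
    else
        snprintf(buf, len, "%#o", val);

    while (*fmt != '\0') {
        int bit = *fmt++;
        const char *name = fmt;

        while (*fmt > ' ')
            fmt++;
        if ((val & (1u << (bit - 1))) == 0)
            continue;
        used = strlen(buf);
        snprintf(buf + used, len - used, "%c%.*s",
            first ? '<' : ',', (int)(fmt - name), name);
        first = 0;
    }
    if (!first) {
        used = strlen(buf);
        snprintf(buf + used, len - used, ">");
    }
}
```

With this table, the value 0x4001 decodes to `0x4001<SO_DEBUG,SO_RERROR>`, which is the kind of output a debugging printf of so_options could produce.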


Jason Thorpe
2018-11-03 19:26:30 UTC
Post by Martin Husemann
I think we need to mask sooptions here or with an accessor function
when using the sysctl to write the value. It is likely not a good idea
to set e.g. "there is an accept filter running" to on when this is a lie.
Maybe we should instead do a boolean sysctl and just set the single bit
in sooptions?
Agreed -- I think it's a bad idea to simply have a blanket set of default socket options. It's better to break this one out as a separate boolean.

-- thorpej


Christos Zoulas
2018-11-03 21:52:52 UTC
On Nov 3, 12:26pm, ***@me.com (Jason Thorpe) wrote:
-- Subject: Re: so_rerror

| Agreed -- I think it's a bad idea to simply have a blanket set of default
| socket options. It's better to break this one out as a separate boolean.

Well, I've restricted it now to the ones that can be set... For example
it would be nice for debugging to be able to set SO_DEBUG or SO_REUSEADDR
just to see how things behave. I am just providing rope...

christos

Roy Marples
2018-11-03 23:39:23 UTC
Post by Christos Zoulas
Post by Roy Marples
If you want this new global option to default to off (which I strongly
disagree with - sweeping issues under the carpet should not be a default
option), then can we also have a man page update to describe this
behaviour please.
Well, it has to default to off, since we keep finding new programs we
need to fix.
From what ships with NetBSD, I think bind is the only one we missed, and it
is now fixed? So that just leaves syslogd as the outlier which you have
constantly complained about. I know that other logging systems set the receive
buffer to the biggest they can and supply options to set a runtime size
- both options would surely help here.

I've not seen or heard about any other programs that need addressing,
other than the various ATF test cases that have been added. Can you give
examples please?
Post by Christos Zoulas
The new behavior is "new" and old programs are not designed
with it in mind and since most OS's don't behave like this we can expect
them to break (we just modified BIND)...
bind was explicitly designed to handle this prior to my change, however
the bind design wasn't great and just closed the socket.
recv returning ENOBUFS has been documented by POSIX many years before my
change to NetBSD so it can be argued that not handling it gracefully is
not standards compliant.
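
Handling the error gracefully, instead of closing the socket, could look roughly like the following. This is a hedged sketch and not bind's code: classify_recv_error and the recv_action values are hypothetical names, and which errno values count as fatal is a simplifying assumption.

```c
#include <assert.h>
#include <errno.h>

enum recv_action { RECV_RETRY, RECV_DATA_LOST, RECV_FATAL };

/*
 * Classify the errno left behind by a failed recv(2).
 * ENOBUFS means some incoming data was dropped, but the socket itself
 * is still usable, so the caller can note the loss and keep reading.
 */
enum recv_action
classify_recv_error(int err)
{
    assert(err != 0);    /* only meaningful after a failure */
    switch (err) {
    case EINTR:
    case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return RECV_RETRY;
    case ENOBUFS:
        return RECV_DATA_LOST;
    default:
        return RECV_FATAL;
    }
}
```

A receive loop would count or log RECV_DATA_LOST and continue, reserving the close-and-reopen path for RECV_FATAL.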

Using the same logic you put forward we should set
security.pax.mprotect.enabled=0 because I have a list of programs that
don't work with it and when trying to poke people to get it to work they
said just disable that option.

Roy

Christos Zoulas
2018-11-04 00:49:31 UTC
On Nov 3, 11:39pm, ***@marples.name (Roy Marples) wrote:
-- Subject: Re: so_rerror

| From what ships with NetBSD, I think bind is the only one we missed, and it
| is now fixed? So that just leaves syslogd as the outlier which you have
| constantly complained about. I know that other logging systems set the receive
| buffer to the biggest they can and supply options to set a runtime size
| - both options would surely help here.

Well, I did not find bind, but with syslogd the situation needs
different handling; i.e. don't even bother to log if you are dropping
on receive because you (syslogd) are making things worse.
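
One way to keep a logger from making its own overload worse is to count overflows and report them at most once per interval. The sketch below illustrates that idea only; it is not what syslogd does. struct drop_report and note_drop are hypothetical names, and the interval policy is an assumption.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical bookkeeping for overflow reports. */
struct drop_report {
    unsigned long dropped;    /* overflows seen since the last report */
    time_t last;              /* when the last report was emitted */
    time_t interval;          /* minimum seconds between reports */
};

/*
 * Record one overflow at time `now`. Returns the number of overflows to
 * report (and resets the counter) once the interval has elapsed,
 * otherwise 0, meaning "stay quiet for now".
 */
unsigned long
note_drop(struct drop_report *r, time_t now)
{
    unsigned long n;

    assert(r != NULL);
    r->dropped++;
    if (now - r->last < r->interval)
        return 0;
    n = r->dropped;
    r->dropped = 0;
    r->last = now;
    return n;
}
```

The caller writes a single summary line only when note_drop returns a nonzero count, so a flood produces one message per interval instead of one per lost datagram.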

| I've not seen or heard about any other programs that need addressing,
| other than the various ATF test cases that have been added. Can you give
| examples please?

I have not seen any.

| bind was explicitly designed to handle this prior to my change, however
| the bind design wasn't great and just closed the socket.
| recv returning ENOBUFS has been documented by POSIX many years before my
| change to NetBSD so it can be argued that not handling it gracefully is
| not standards compliant.

Well, I think that the bind behavior was a bug. It is just that
the authors did not expect ENOBUFS on recv, so they thought if it
happened it was fatal (needed resetting the socket).

| Using the same logic you put forward we should set
| security.pax.mprotect.enabled=0 because I have a list of programs that
| don't work with it and when trying to poke people to get it to work they
| said just disable that option.

With 2 major differences:

1. pax-mprotect protects the system from random programs that misuse
mmap; the class of programs that breaks is small and known (jit stuff);
the majority of people think that the default should be on. And finally
there is a sysctl to choose... Until I commit the code, the new behavior
for sockets is mandatory.
2. so_rerror affects random programs (we don't know which ones but any
one using sockets can be affected; this set is much larger than the
set of programs that mmap +x). The majority of people think that the
default behavior should be to ignore the error.

The straw that broke my back was that I had to change stuff around
so that my timemachine backups could work again. I.e. the change
actually broke things (by making the backup bandwidth effectively 0).

Let's remember that I was not initially against the change and I
tried for a long time to fix whatever broke... And I was the one
that increased buffers for many things. It is just that increasing
buffers does not fix the problem in pathological cases, and also
wastes resources.

Nevertheless now everyone can have it the way they like... There is
a sysctl to turn it on globally and a per-socket setsockopt to override.

christos

Roy Marples
2018-11-04 01:56:26 UTC
Post by Christos Zoulas
| Using the same logic you put forward we should set
| security.pax.mprotect.enabled=0 because I have a list of programs that
| don't work with it and when trying to poke people to get it to work they
| said just disable that option.
1. pax-mprotect protects the system from random programs that misuse
mmap; the class of programs that breaks is small and known (jit stuff);
the majority of people think that the default should be on. And finally
there is a sysctl to choose... Until I commit the code, the new behavior
for sockets is mandatory.
I don't have a single NetBSD system where I can turn this option on (for
example, it's not found on ERLITE) and have all the programs I need to
run on it actually work.

It's good to know that the only machine I have where the default out of
the box config actually works for its intended use is my ERLITE. That
can't be said of my x86 or amd64 platforms.
Post by Christos Zoulas
2. so_rerror affects random programs (we don't know which ones but any
one using sockets can be affected; this set is much larger than the
set of programs that mmap +x). The majority of people think that the
default behavior should be to ignore the error.
The straw that broke my back was that I had to change stuff around
so that my timemachine backups could work again. I.e. the change
actually broke things (by making the backup bandwidth effectively 0).
Can you explain how it was broken and what you did to make it work again?
Post by Christos Zoulas
Let's remember that I was not initially against the change and I
tried for a long time to fix whatever broke... And I was the one
that increased buffers for many things. It is just that increasing
buffers does not fix the problem in pathological cases, and also
wastes resources.
Which is why we need a better solution than what we have.
dynamically increasing/decreasing buffer size is a good solution for
this, which should make everyone happy.
Post by Christos Zoulas
Nevertheless now everyone can have it the way they like... There is
a sysctl to turn it on globally and a per-socket setsockopt to override.
And we want a secure system where a lot of useful programs don't run and
which sweeps overflow issues under the carpet by default? Not me!

Roy

Martin Husemann
2018-11-04 08:44:58 UTC
Post by Roy Marples
Post by Christos Zoulas
1. pax-mprotect protects the system from random programs that misuse
mmap; the class of programs that breaks is small and known (jit stuff);
the majority of people think that the default should be on. And finally
there is a sysctl to choose... Until I commit the code, the new behavior
for sockets is mandatory.
I don't have a single NetBSD system where I can turn this option on (for
example, it's not found on ERLITE) and have all the programs I need to run
on it actually work.
I am not sure I can parse this correctly. With "this option" you mean PAX
mprotect?

The ERLITE kernel has none of the PAX kernel options, so at runtime
there is no option to turn it on or off - it is always off.

If you see other programs break with it, file a pkgsrc PR (assuming you
did install it from pkgsrc).

Martin

Roy Marples
2018-11-04 11:32:28 UTC
Post by Martin Husemann
Post by Roy Marples
Post by Christos Zoulas
1. pax-mprotect protects the system from random programs that misuse
mmap; the class of programs that breaks is small and known (jit stuff);
the majority of people think that the default should be on. And finally
there is a sysctl to choose... Until I commit the code, the new behavior
for sockets is mandatory.
I don't have a single NetBSD system where I can turn this option on (for
example, it's not found on ERLITE) and have all the programs I need to run
on it actually work.
I am not sure I can parse this correctly. With "this option" you mean PAX
mprotect?
Yes
Post by Martin Husemann
The ERLITE kernel has none of the PAX kernel options, so at runtime
there is no option to turn it on or off - it is always off.
Exactly. So it works out of the box for me.
Post by Martin Husemann
If you see other programs break with it, file a pkgsrc PR (assuming you
did install it from pkgsrc).
I'll see if I can get around to that in the coming week.

Roy

Robert Elz
2018-11-04 00:58:56 UTC
Date: Sat, 3 Nov 2018 23:39:23 +0000
From: Roy Marples <***@marples.name>
Message-ID: <80821341-dc28-157a-f27c-***@marples.name>

| I've not seen or heard about any other programs that need addressing,
| other than the various ATF test cases that have been added. Can you give
| examples please?

I think kern/53683 might be another example.

| Using the same logic you put forward we should set
| security.pax.mprotect.enabled=0

I would actually have no problem with that. That's what I do...

kre


Robert Elz
2018-11-04 05:44:26 UTC
Date: Sun, 4 Nov 2018 01:40:10 +0000
From: Roy Marples <***@marples.name>
Message-ID: <058dd223-f0f1-4294-cea5-***@marples.name>

| AFAIK TCP doesn't call so_roverflow and the reporter mentions using UDP
| as a workaround so I doubt this is the case.

I haven't looked into it, and it could be as hinted, a driver bug, but the
ENOBUFS error (recently occurring) is kind of a red flag.

I know a normal TCP would not (should not) be affected by this, but the
in-kernel NFS has access (whether it should or not) to data not normally
exposed to TCP clients.

The UDP workaround was reported as still seeing the errors, they were
just not causing a hang ... which is not unexpected, udp packets are
lost all the time, and the code needs to be able to retry and cope.
TCP on the other hand normally sees no errors, other than connection
lost, and it is not beyond all possibility that if NFS detects an error,
and TCP has no way to "fix" it, then things could hang.

mrg - can you apply Christos's patch, and see if it helps your tcp NFS
mounts (and perhaps makes the error reports in the UDP case go away) ?

kre


matthew green
2018-11-04 22:24:24 UTC
Post by Robert Elz
mrg - can you apply Christos's patch, and see if it helps your tcp NFS
mounts (and perhaps makes the error reports in the UDP case go away) ?
i see the patch went in the tree. i'll try sometime soon (i'm busy
dealing with gcc right now..)

Roy Marples
2018-11-04 01:40:10 UTC
Post by Robert Elz
Date: Sat, 3 Nov 2018 23:39:23 +0000
| I've not seen or heard about any other programs that need addressing,
| other than the various ATF test cases that have been added. Can you give
| examples please?
I think kern/53683 might be another example.
AFAIK TCP doesn't call so_roverflow and the reporter mentions using UDP
as a workaround so I doubt this is the case.
Post by Robert Elz
| Using the same logic you put forward we should set
| security.pax.mprotect.enabled=0
I would actually have no problem with that. That's what I do...
kre
Robert Elz
2018-11-03 18:40:51 UTC
I absolutely agree with this change, and if I had found the time, would
have done it myself.

I also absolutely disagree with any global (turn it on by default) option,
that will just result in needing to go modify all the programs that don't
want it to turn it off, just in case the system manager enables the
option. No thanks.

Better to modify the (very few) programs where this behaviour is useful
to turn it on, and those for which some people might rationally desire
it on to have an option to do that (syslogd might be one such, though
I still see almost no utility in being told that a message from no-one
knows where was lost - in syslogd that will only happen when there's
a message flood, and when that is happening, the most likely and
obvious source to assume lost a message, with a good chance of
being correct, is the one sending large numbers of messages. The
lots of messages is the issue, not the one of them that was lost. And
if it happened that the message that lost out was a different one, then
how would anyone ever know? And why would anyone even assume
that it might have happened?)


kre


