Dmitry Matveev
2011-02-17 07:58:10 UTC
Hello,
I have discovered a bug (at least it looks like a bug) in NetBSD's socket
layer. I use NetBSD 5.0.2, i386 port. Please see the detailed info below.
(I am new to NetBSD and to kernel development, so the text may contain
wrong assumptions; I will be happy if someone corrects me.)
Background information
----------------------
Suppose there is a single-threaded server. It uses SIGIO to handle
incoming connections and data from clients.
When the process receives SIGIO:
* if there are pending connections on the server socket, the process
  calls accept(2) and adds the accepted client socket to a list;
* if data is available on any client socket (as determined via poll(2)),
  the server processes this data.
Problem description
-------------------
The server process does not receive SIGIO when data arrives on a client
socket.
Problem analysis
----------------
A process receives SIGIO on incoming data if the owner PID and the
O_ASYNC option are set for the socket. Debugging has shown that the
client socket does not have the SB_ASYNC bit set in its so_rcv.sb_flags,
so SIGIO is not emitted in sowakeup().
However, the process sets the required options this way:
    int oldflags = fcntl (sock, F_GETFL, 0);
    if (!(oldflags & O_ASYNC)) {
        if (-1 == fcntl (sock, F_SETFL, oldflags |= O_ASYNC)) {
            exit (1);
        }
    }
i.e. if O_ASYNC is not set, set it. But the issue is that *O_ASYNC was
already present in `oldflags`*! So we end up in a situation where the
descriptor has O_ASYNC but the socket does not have SB_ASYNC.
How could this happen? sys_fcntl() returns the value (fp->f_flag - 1) to
a process when it queries the socket flags via fcntl with F_GETFL.
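A small model of that conversion may help. The likely reason for the `- 1` is that the kernel keeps the access mode as separate read and write bits (1 and 2), while userland uses O_RDONLY, O_WRONLY and O_RDWR (0, 1 and 2); treat that explanation as an assumption. The helper names and the `MODEL_` constants below are hypothetical. The point relevant here is that the O_ASYNC bit passes through unchanged, so F_GETFL reports it from the file's `f_flag` alone, whatever the socket's `sb_flags` contain.

```c
#include <fcntl.h>

/* Assumed kernel-side access bits, for illustration only. */
#define MODEL_FREAD  0x1
#define MODEL_FWRITE 0x2

/* Userland open flags -> kernel file flags (model). */
int model_to_fflags(int oflags)
{
    return oflags + 1;
}

/* Kernel file flags -> userland flags, as returned by F_GETFL (model). */
int model_to_oflags(int fflags)
{
    return fflags - 1;
}
```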
The value of `fp->f_flag` for the client's socket is copied from the
server's `fp->f_flag` in do_sys_accept(). Note that the server socket is
non-blocking and its flags contain all the necessary bits. But in
sonewconn():
    so->so_rcv.sb_flags |= head->so_rcv.sb_flags & SB_AUTOSIZE;
This code sets the SB_AUTOSIZE bit in the client's `so_rcv.sb_flags` if
this bit is set in the server's flags; otherwise the value is 0. Here we
lose all other flags, including SB_ASYNC. I think that this is the bug.
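The effect of the mask can be reproduced in isolation. In the sketch below the bit values are illustrative placeholders rather than values taken from the kernel headers, and `model_inherit_sb_flags` is a hypothetical name; only the shape of the expression follows the statement quoted from sonewconn().

```c
/* Placeholder bit values for illustration; not copied from kernel headers. */
#define MODEL_SB_ASYNC    0x10u
#define MODEL_SB_AUTOSIZE 0x800u

/*
 * Same shape as the statement in sonewconn(): only the AUTOSIZE bit of the
 * listening socket's flags is OR-ed into the new socket's flags.
 */
unsigned model_inherit_sb_flags(unsigned child_flags, unsigned head_flags)
{
    child_flags |= head_flags & MODEL_SB_AUTOSIZE;
    return child_flags;
}
```

With a fresh socket (flags 0) and a listening socket that has both bits set, the result carries the AUTOSIZE bit but not the ASYNC bit.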
Possible solution
-----------------
Since do_sys_accept() copies the `f_flag` value from the server's fp to
the client's fp, and much other data is copied from the server socket to
the client socket in sonewconn(), I think it would be right to do
    so->so_rcv.sb_flags = head->so_rcv.sb_flags;
    so->so_snd.sb_flags = head->so_snd.sb_flags;
instead of
    so->so_rcv.sb_flags |= head->so_rcv.sb_flags & SB_AUTOSIZE;
    so->so_snd.sb_flags |= head->so_snd.sb_flags & SB_AUTOSIZE;
in sonewconn(). In this case the socket's `sb_flags` and its fp's f_flag
stay in sync.
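The difference between the two variants can be checked with a toy model of an accepted connection. Everything here is a simplified sketch: `struct demo_sock`, the `DEMO_` bit values and both functions are hypothetical stand-ins and do not reproduce the real kernel structures.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only. */
enum {
    DEMO_FASYNC      = 0x40,   /* async bit in the file's f_flag */
    DEMO_SB_ASYNC    = 0x10,   /* async bit in the socket buffer flags */
    DEMO_SB_AUTOSIZE = 0x800
};

/* Toy stand-in for a file plus its socket's receive buffer flags. */
struct demo_sock {
    unsigned f_flag;
    unsigned rcv_sb_flags;
};

/* Model of accept: f_flag is always copied; sb_flags depend on the variant. */
struct demo_sock demo_accept(struct demo_sock head, bool copy_all)
{
    struct demo_sock so = { 0, 0 };

    so.f_flag = head.f_flag;                         /* as in do_sys_accept() */
    if (copy_all)
        so.rcv_sb_flags = head.rcv_sb_flags;         /* proposed variant */
    else
        so.rcv_sb_flags |= head.rcv_sb_flags & DEMO_SB_AUTOSIZE; /* current */
    return so;
}

/* True if the file-level and socket-level async bits agree. */
bool demo_in_sync(struct demo_sock so)
{
    bool file_async = (so.f_flag & DEMO_FASYNC) != 0;
    bool sock_async = (so.rcv_sb_flags & DEMO_SB_ASYNC) != 0;

    return file_async == sock_async;
}
```

For a listening socket with both async bits set, the current variant yields a client whose two bits disagree, while the proposed variant keeps them equal.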
Tests
-----
A patched kernel seems to work well. I have tested it with opensshd and
GNU Smalltalk's Swazoo application server.
A simple test application is also attached. It waits for an incoming
connection on port 8000, then waits for incoming data from the client and
then terminates.
It can be built in two modes: with a blocking or a non-blocking server
socket.
Blocking version: gcc -o test_sync onc.c
Non-blocking version: gcc -o test_async onc.c -DV_ASYNC
To perform a test, start the application, e.g.:
    $ test_async &
connect to it with telnet:
    $ telnet localhost 8000
then type something and hit Enter.
The blocking version works fine without the patch, because initially the
server fp flags do not contain O_ASYNC and the application is able to set
it.
The non-blocking version works correctly only with the patched kernel.
Both versions work fine on GNU/Linux (but with SIGPOLL instead of SIGIO).
--------------
Best regards,
Dmitry Matveev