kern/53962: npf: weird 'stateful' behavior

Discussion:

f***@gmail.com

2019-02-10 02:03:14 UTC

| procedure "log" {
| log: npflog0
| }
|
| group "net1" on wm1 {
| pass stateful in final proto tcp from 192.168.1.0/24 to 192.168.2.0/24 port 22 apply "log"
| block all apply "log"
| }
|
| group "net2" on wm2 {
| pass stateful out final proto tcp from 192.168.1.0/24 to 192.168.2.13 port 22 apply "log"
| block all apply "log"
| }
|
| group default {
| pass final on lo0 all
| block all apply "log"
| }
$ nc 192.168.2.13 22 #run on 192.168.1.14
| 00:36:52.006564 rule 5.rules.0/0(match): pass in on wm1: 192.168.1.14.42718 > 192.168.2.13.22: Flags [S], seq 2274723816, win 29200, options [mss 1460,sackOK,TS val 561131037 ecr 0,nop,wscale 7], length 0
| 00:36:52.006639 rule 8.rules.0/0(match): pass out on wm2: 192.168.1.14.42718 > 192.168.2.13.22: Flags [S], seq 2274723816, win 29200, options [mss 1460,sackOK,TS val 561131037 ecr 0,nop,wscale 7], length 0
| 00:36:52.007299 rule 9.rules.0/0(match): block in on wm2: 192.168.2.13.22 > 192.168.1.14.42718: Flags [S.], seq 283128975, ack 2274723817, win 28960, options [mss 1460,sackOK,TS val 4155176782 ecr 561131037,nop,wscale 6], length 0
Now the router is blocking the SYN/ACK -- why? The rule was 'stateful' and therefore state should've been kept, no?

Ok what seems to be going on is twofold:

1)

Whenever a packet arrives, npf tries to retrieve, from the 'connection db', a connection that relates to that packet. The key used for this lookup, i.e. what identifies the connection, is derived (in the case of TCP) from (src/dst port, src/dst addr, protocol). Note that 'interface' is absent from that list. The result of this lookup is a 'connection' object which keeps the connection's state (cf. npf_conn_conkey() and connkey_setkey() in sys/net/npf/npf_conn.c).

It follows that it's impossible to keep state on two connections that differ only by interface. The connection objects can represent it, but the keys they'd be stored under in the connection db would collide. (The semantics of the connection db are to refuse to insert a key that already exists, rather than overwriting it.)

That means (looking at the above npf.conf) that an ingressing packet on wm1 will create a connection to keep state on; then upon egress of wm2 there's technically another connection to keep state on (same parameters, different interface), but the latter connection fails to be inserted into the connection db because of a key collision with the former.
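
A minimal sketch of the collision described in (1). This is not NPF code: the types conn_key_t, conn_t and conn_db_t and the function conn_db_insert() are hypothetical stand-ins, and the "db" is a plain array rather than whatever structure NPF actually uses. The only properties modelled are the two stated above: the key carries no interface, and insertion refuses an existing key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lookup key: note there is no interface field. */
typedef struct {
    uint32_t src_addr, dst_addr;
    uint16_t src_port, dst_port;
    uint8_t  proto;
} conn_key_t;

/* The connection itself can record an interface, but it is not part of the key. */
typedef struct {
    conn_key_t key;
    unsigned   ifid;
} conn_t;

#define CONN_DB_MAX 16

typedef struct {
    conn_t   slots[CONN_DB_MAX];
    unsigned count;
} conn_db_t;

static bool
conn_key_equal(const conn_key_t *a, const conn_key_t *b)
{
    return a->src_addr == b->src_addr && a->dst_addr == b->dst_addr &&
        a->src_port == b->src_port && a->dst_port == b->dst_port &&
        a->proto == b->proto;
}

/* Insert refuses a key that already exists instead of overwriting it. */
static bool
conn_db_insert(conn_db_t *db, const conn_t *con)
{
    assert(db != NULL && con != NULL);
    for (unsigned i = 0; i < db->count; i++) {
        if (conn_key_equal(&db->slots[i].key, &con->key))
            return false;   /* key collision */
    }
    if (db->count == CONN_DB_MAX)
        return false;       /* table full */
    db->slots[db->count++] = *con;
    return true;
}
```

Inserting the state for the SYN seen on wm1 succeeds; inserting a second conn_t with the same 5-tuple but the ifid of wm2 returns false, which corresponds to the egress state that never makes it into the db.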

2)

npf considers a connection to have a "direction" (i.e. the direction of the initial SYN), and essentially assumes that a "forwards" packet will only ever INgress on an interface, and a "backwards" packet will only ever Egress from an interface (or the other way around, depending on whether the SYN in- or egressed). This assumption is obviously not true on, say, a router, where one and the same packet may ingress on one interface and egress on another. The piece of code that does this is in npf_conn_ok() in sys/net/npf/npf_conn.c:

| /*
| * npf_conn_ok: check if the connection is active and has the right direction.
| */
| static bool
| npf_conn_ok(const npf_conn_t *con, const int di, bool forw) //di=2 forw=1
| {
| const uint32_t flags = con->c_flags;
|
| /* Check if connection is active and not expired. */
| bool ok = (flags & (CONN_ACTIVE | CONN_EXPIRE)) == CONN_ACTIVE;
| if (__predict_false(!ok)) {
| return false;
| }
|
All good up to here, but now ('di' is the packet direction; the direction bits in 'flags' are either 1 (ingress) or 2 (egress); PFIL_ALL is 3):
| /* Check if the direction is consistent */
| bool pforw = (flags & PFIL_ALL) == (unsigned)di;
| if (__predict_false(forw != pforw)) {
| return false;
| }
| return true;
| }

When commenting out that last check, the connection is later still discarded for having the wrong interface (as it should). But since the connection is tied to a particular interface, that interface should've been part of the key to the connection db in the first place. However, I realize that if that were the case, it'd be difficult to implement interface-agnostic state ("stateful-ends").
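
To make the effect of the direction check concrete, here is a stripped-down model of just that comparison. The function direction_ok() and the DIR_* constants are hypothetical names introduced for illustration (the values 1, 2 and 3 are the ones quoted above); the active/expired test and everything else in npf_conn_ok() is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as quoted in the post: 1 = ingress, 2 = egress, 3 = both. */
#define DIR_IN  1
#define DIR_OUT 2
#define DIR_ALL 3

/*
 * Simplified model of the direction test only.
 * conn_dir: direction of the packet that created the state.
 * di:       direction of the current packet on the current interface.
 * forw:     true if the packet matched the forwards key.
 */
static bool
direction_ok(unsigned conn_dir, int di, bool forw)
{
    /* A packet is assumed to be "forwards" iff it moves the same way
     * (in or out) as the packet that created the state. */
    bool pforw = (conn_dir & DIR_ALL) == (unsigned)di;
    return forw == pforw;
}
```

With state created by the SYN ingressing on wm1 (conn_dir = 1), the same SYN egressing on wm2 is the case di=2, forw=1 and is rejected, as is the SYN/ACK ingressing on wm2 (di=1, forw=0). Only the two combinations that fit a single-interface view pass.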

Speaking of stateful-ends, when the direction-check is commented out, a single 'stateful-ends' ingress rule gives me exactly the good old ipf "keep state" behavior (if the packet is accepted into the filter, it's implicitly permitted out of the filter, on whatever interface). So that's a workaround I can live with for now, although of course I'm not entirely sure of the purpose of this direction check, or the consequences of removing it.

Any insights?

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Mindaugas Rasiukevicius

2019-02-17 01:13:13 UTC

Post by f***@gmail.com
<...>
Speaking of stateful-ends, when the direction-check is commented out, a
single 'stateful-ends' ingress rule gives me exactly the good old ipf
"keep state" behavior (if the packet is accepted into the filter, it's
implicitly permitted out of the filter, on whatever interface). So
that's a workaround I can live with for now, although of course I'm not
entirely sure of the purpose of this direction check, or the consequences
of removing it.
Any insights?

Thanks for a thorough summary. Your (1) and (2) observations are correct.
Basically, there are two points here:

- NPF connection state is generally per-interface, but see below. Bypassing
the ruleset on other interfaces can have security implications, e.g. a packet
with a spoofed IP address might bypass ingress filtering. Hence the design
decision to default to such behaviour (so you control what's happening on
other interfaces with a ruleset there).

- There are two keys for a connection (so that the reverse lookup on the
returning packets would succeed). It is necessary to establish the packet
direction (with respect to the connection direction) for the full TCP state
tracking.

The "stateful-ends" mechanism is for having a global state (which could be
picked up on other interfaces). I think it should be fixed to assume that
packets on an interface other than the one where the state was created should
match the reverse key (for the "backwards stream"), without checking that
they have the opposite interface-level direction.
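
As a rough illustration of the two-key idea, the sketch below derives the reverse key by swapping the endpoints and classifies a packet purely by which key it matches, without looking at the interface-level direction. All names (flow_key_t, flow_key_reverse(), flow_classify()) are hypothetical and the struct layout is an assumption; this is not how NPF stores its keys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t src_addr, dst_addr;
    uint16_t src_port, dst_port;
    uint8_t  proto;
} flow_key_t;

typedef enum { FLOW_NONE, FLOW_FORWARDS, FLOW_BACKWARDS } flow_dir_t;

/* Build the key for the returning stream by swapping the endpoints. */
static flow_key_t
flow_key_reverse(flow_key_t k)
{
    flow_key_t r = { k.dst_addr, k.src_addr, k.dst_port, k.src_port, k.proto };
    return r;
}

static bool
flow_key_equal(flow_key_t a, flow_key_t b)
{
    return a.src_addr == b.src_addr && a.dst_addr == b.dst_addr &&
        a.src_port == b.src_port && a.dst_port == b.dst_port &&
        a.proto == b.proto;
}

/* Classify a packet against a connection whose forwards key is 'forw'. */
static flow_dir_t
flow_classify(flow_key_t forw, flow_key_t pkt)
{
    if (flow_key_equal(pkt, forw))
        return FLOW_FORWARDS;
    if (flow_key_equal(pkt, flow_key_reverse(forw)))
        return FLOW_BACKWARDS;
    return FLOW_NONE;
}
```

Here the stream direction falls out of the key match alone, so a SYN/ACK would be classified as backwards regardless of which interface it arrives on or whether it is ingressing or egressing there.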

I'll have a look at this.

--
Mindaugas

Edgar Fuß

2019-02-17 11:54:15 UTC

As Timo knows, I'm running NetBSD in production.

I run a "one VLAN per IP range" (minus external, of course) policy.

I'm using packet filtering (currently ipf on 6.1) both on individual servers
(anti-spoofing, access restriction to certain daemon ports) and on the gateway
(the only machine with IP forwarding enabled) to restrict inter-network
traffic. From the ipf bugs I run into, I conclude I'm the only person on
the planet doing this.

I can think of two filter options that would make my life easier on the GW:
1. On an ingress rule, "if you see this packet on the outbound side, let it
egress and remember the state there", possibly limited to a set of interfaces
(Timo has a Perl script to sort of simulate that).
2. On the egress side, make it possible to match "this packet passed on the
inbound side", possibly limited to a set of interfaces.

Manuel Bouyer

2019-02-17 12:52:25 UTC

Post by Edgar Fuß
As Timo knows, I'm running NetBSD in production.
I run a "one VLAN per IP range" (minus external, of course) policy.
I'm using packet filtering (currently ipf on 6.1) both on individual servers
(anti-spoofing, access restriction to certain daemon ports) and on the gateway
(the only machine with IP forwarding enabled) to restrict inter-network
traffic. From the ipf bugs I run into, I conclude I'm the only person on
the planet doing this.

No, I'm doing it too, but maybe with a different set of rules than you.
I don't use stateful filtering for TCP, for example.

--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference

Timo Buhrmester

2019-02-17 18:36:40 UTC

Post by Mindaugas Rasiukevicius
- NPF connection state is generally per-interface, but see below. Bypassing
the ruleset on other interfaces can have security implications, e.g. a packet
with a spoofed IP address might bypass ingress filtering. Hence the design
decision to default to such behaviour (so you control what's happening on
other interfaces with a ruleset there).

I actually like the per-interface state for various reasons including the one
you mentioned. However it does come with the downside of rule multiplication.

Since with my last patch (including ifid in connkey) I have something that
works the way I intend and it's "in production" now, here's a bit of syntactic
inspiration as to how the rule multiplication could be countered:

Basically when writing my npf.conf I pretend 'egress <interface list>' is a
valid construct so my rules look like this:

| pass stateful in on wm1 egress pppoe0,wm2 final proto tcp from $foo to $bar

and a perl script will generate from that:

| pass stateful in on wm1 final proto tcp from $foo to $bar
| pass stateful out on pppoe0 final proto tcp from $foo to $bar
| pass stateful out on wm2 final proto tcp from $foo to $bar

(and sort it in the right groups).
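
The expansion step can be sketched as follows. This is not the Perl script mentioned above, only a small C illustration of the same idea; expand_egress() is a hypothetical name, it only understands rules that literally start with "pass stateful in on IF egress LIST", and it does not do the sorting into groups.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand one rule of the pretend form
 *     pass stateful in on IF egress IF1,IF2 REST
 * into one ingress rule plus one egress rule per listed interface.
 * Newline-terminated rules are written to 'out'. Returns the number of
 * rules written, or -1 if the line has a different shape or does not fit.
 */
static int
expand_egress(const char *line, char *out, size_t outsz)
{
    char inif[32], list[128];
    int n = -1;

    assert(line != NULL && out != NULL);
    /* %n records where the remainder of the rule starts. */
    if (sscanf(line, "pass stateful in on %31s egress %127s %n",
        inif, list, &n) != 2 || n < 0)
        return -1;
    const char *rest = line + n;

    int w = snprintf(out, outsz, "pass stateful in on %s %s\n", inif, rest);
    if (w < 0 || (size_t)w >= outsz)
        return -1;
    size_t used = (size_t)w;
    int count = 1;

    /* One "out" rule per comma-separated egress interface. */
    for (char *ifn = strtok(list, ","); ifn != NULL; ifn = strtok(NULL, ",")) {
        w = snprintf(out + used, outsz - used,
            "pass stateful out on %s %s\n", ifn, rest);
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;
        used += (size_t)w;
        count++;
    }
    return count;
}
```

Fed the example rule with "egress pppoe0,wm2", this produces the three rules listed above in the same order. Lines of any other shape return -1 so the caller can pass them through unchanged.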

Timo Buhrmester

2020-04-05 02:32:32 UTC

Hi, sorry for the late reply, I haven't had time to upgrade to netbsd-9.

Does the "stateful-all" keyword (in -current/netbsd-9) satisfy your use case?

The short answer is no, or rather I don't know; something with the NAT seems broken.

I've built a test setup with yesterday's -current with three machines involved:

Machine "client" has one interface (eth0, 192.168.3.2/24).
It will try to connect to 5.9.82.75:25/tcp via "npfbox"

Machine "npfbox" has two interfaces (vr1, 192.168.3.1/24 and vr0, 192.168.1.200/24)
It will perform NAT 192.168.3.0/24 to 192.168.1.200 and forward to 192.168.1.1

Machine "gateway" (192.168.1.1) is my internet gateway.

Here's the npf.conf on "npfbox"
| alg "icmp"
|
| map vr0 dynamic 192.168.3.0/24 -> 192.168.1.200
|
| procedure "logb" { log: npflog0 } #blocked
| procedure "logp" { log: npflog1 } #passed
|
| group "lo0" on lo0 {
| pass in final all apply "logp"
| pass out final all apply "logp"
| }
|
| group "internalnet" on vr1 {
| pass stateful-all in final family inet4 proto tcp from 192.168.3.0/24 to 5.9.82.75 port 25 apply "logp"
| }
|
| group default {
| block in final all apply "logb"
| block out final all apply "logb"
| }

Here's what currently happens, tcpdumping on all of npfbox's interfaces:
(note that npflog1 logs *passed* packets, also note no traffic at all on npflog0)

vr1: 04:00:03.913162 IP 192.168.3.2.53200 > 5.9.82.75.25: Flags [S], seq 4038765496, win 64240, options [mss 1460,sackOK,TS val 1371479013 ecr 0,nop,wscale 7], length 0
npflog1: 04:00:03.913232 IP 192.168.3.2.53200 > 5.9.82.75.25: Flags [S], seq 4038765496, win 64240, options [mss 1460,sackOK,TS val 1371479013 ecr 0,nop,wscale 7], length 0
npflog1: 04:00:03.913323 IP 192.168.1.200.1046 > 5.9.82.75.25: Flags [S], seq 4038765496, win 64240, options [mss 1460,sackOK,TS val 1371479013 ecr 0,nop,wscale 7], length 0
vr0: 04:00:03.913353 IP 192.168.1.200.1046 > 5.9.82.75.25: Flags [S], seq 4038765496, win 64240, options [mss 1460,sackOK,TS val 1371479013 ecr 0,nop,wscale 7], length 0
vr0: 04:00:03.936635 IP 5.9.82.75.25 > 192.168.1.200.1046: Flags [S.], seq 698708591, ack 4038765497, win 65535, options [mss 1432,nop,wscale 4,sackOK,TS val 1 ecr 1371479013], length 0
npflog1: 04:00:03.936683 IP 192.168.1.200.1046 > 192.168.1.200.1046: Flags [S.], seq 698708591, ack 4038765497, win 65535, options [mss 1432,nop,wscale 4,sackOK,TS val 1 ecr 1371479013], length 0
npflog1: 04:00:03.936756 IP 192.168.1.200.1046 > 192.168.1.200.1046: Flags [R], seq 4038765497, win 0, length 0
lo0: 04:00:03.936770 IP 192.168.1.200.1046 > 192.168.1.200.1046: Flags [R], seq 4038765497, win 0, length 0

So it seems that the "de-NATting" on the reverse path is broken.
I don't understand why the SYN/ACK doesn't show up on lo0, but I guess it doesn't matter much

Am I doing something wrong?

Timo

Mindaugas Rasiukevicius

2020-05-15 19:30:10 UTC

Post by Timo Buhrmester
<...>
Here's the npf.conf on "npfbox"
| alg "icmp"
|
| map vr0 dynamic 192.168.3.0/24 -> 192.168.1.200
|
| procedure "logb" { log: npflog0 } #blocked
| procedure "logp" { log: npflog1 } #passed
|
| group "lo0" on lo0 {
| pass in final all apply "logp"
| pass out final all apply "logp"
| }
|
| group "internalnet" on vr1 {
| pass stateful-all in final family inet4 proto tcp from 192.168.3.0/24 to 5.9.82.75 port 25 apply "logp"
| }
|
| group default {
| block in final all apply "logb"
| block out final all apply "logb"
| }
(note that npflog1 logs *passed* packets, also note no traffic at all on npflog0)
<...>
So it seems that the "de-NATting" on the reverse path is broken.
I don't understand why the SYN/ACK doesn't show up on lo0, but I guess it
doesn't matter much

Just a general update: I have various NPF fixes and improvements which
will soon be merged to NetBSD.

On the 'stateful-all' problem: while the state will be picked up on the
other interface (vr0), the NAT policy will operate using the *initial*
connection direction which was established on vr1 as inbound. So, the
NAT mechanism doesn't recognize the SYN-ACK packet as returning/reverse.
Such behaviour is unhelpful and, instead, NPF should probably capture the
connection direction at the point of the NAT entry creation and perform
the translation based on that (rather than using the original connection
direction at the point of state creation).

There are more implications here. I am going to add configuration-wide
parameters to give users more flexibility over connection state behaviour.

--
Mindaugas

Timo Buhrmester

2020-04-14 02:07:50 UTC

Post by Timo Buhrmester

Does the "stateful-all" keyword (in -current/netbsd-9) satisfy your use case?

The short answer is no, or rather I don't know; something with the NAT seems broken.

After some digging it seems that npf ties packet direction (in/out) to
stream direction (forwards/backwards), which naturally fails when
multiple interfaces are involved. Maybe I'm misunderstanding things,
but it fits the fact that the wrong address is being rewritten
(in the mentioned testcase, rewriting 5.9.82.75 > 192.168.1.200
to 192.168.1.200 > 192.168.1.200 rather than to 5.9.82.75 > 192.168.3.2).

Unrelatedly, I noticed that the order of groups in npf.conf matters.
That is, if the "default" group is the first group in the file,
the rules in the "default" group will apply to all packets regardless
of more specific groups below. This can be trivially worked around
by putting the default group last, of course, but the documentation
doesn't read as if this was intended behavior.
