Identifying a slightly odd network problem
Jan Danielsson
2013-07-24 22:33:57 UTC
Hello,

I need some help trying to even understand where to begin to look for
a solution for a network problem I've been having.

The network looks something like this

                    ISP
                     ^
                     |
                     v
                ZyXEL modem
              in bridged mode
                     ^
                     |
                     v wm0
              Soekris net6501
           running NetBSD/amd64
              netbsd-6 branch
           ^ wm1     ^ wm2     ^ wm3
           |         |         |
      -----          |          ------
      |              |               |
    switch         switch          WiFi
     |||            |||           Access
  Computers        Media           Point
                   Center

Some relevant rc.conf variables:

------------------------------------
auto_ifconfig=YES

ifconfig_wm1="inet 192.168.72.1 netmask 255.255.255.0"
ifconfig_wm2="inet 192.168.92.1 netmask 255.255.255.0"
ifconfig_wm3="inet 192.168.124.1 netmask 255.255.255.0"

dhclient=YES
dhclient_flags="wm0"

dhcpd=YES
dhcpd_flags="-q wm1 wm2 wm3"

ipfilter=YES
ipnat=YES
------------------------------------

I went from using the router-part of the ZyXEL modem, to bridging it
and letting the Soekris-box handle the higher-level stuff. When I set it
up, it Just Worked (well, more or less). Because I have a dynamic
address, my ipnat.conf and ipf.conf are generated from
ipnat.conf.template and ipf.conf.template which simply have "$EXT"
strings in them which I sed to the appropriate values.
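
A minimal sketch of the template step described above, assuming the
templates contain the literal string $EXT and the external address is a
plain IPv4 address. The helper name render_template, the /etc paths and
the way the address is read from ifconfig are illustrative assumptions,
not the actual scripts used here.

```sh
#!/bin/sh
# Sketch only: substitute the current external address for every literal
# "$EXT" in a template read from stdin. render_template is a hypothetical
# helper name, not part of the original setup.
render_template() {
    ext_addr=$1
    # \$ matches a literal dollar sign; an IPv4 address contains nothing
    # that is special in a sed replacement.
    sed -e 's/\$EXT/'"$ext_addr"'/g'
}

# Possible use (as root; paths and address extraction are assumptions):
#   EXT=$(ifconfig wm0 | awk '$1 == "inet" { print $2; exit }')
#   render_template "$EXT" < /etc/ipf.conf.template   > /etc/ipf.conf
#   render_template "$EXT" < /etc/ipnat.conf.template > /etc/ipnat.conf
#   /etc/rc.d/ipfilter reload && /etc/rc.d/ipnat reload
```

Keeping the substitution in one function means both files are always
generated from the same address value.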

Anywho -- I set everything up, and it worked like a charm for several
months. At times I unplugged the system (when there was a risk of
thunderstorms/lightning), but only for a few hours at a time. Then, a
few months back, I had everything unplugged for two days, which meant
that when I booted the Soekris machine it got a new public IP address.
I ran the scripts to regenerate ipf.conf and ipnat.conf (and ran reload
using the rc.d scripts). I got some really odd (nonfunctional) network
behavior, so I rebooted the Soekris machine, and then it appeared to
work. However, over time I noticed a few quirks:

- I started having quite a few issues when establishing TCP
connections. Seemingly randomly chosen connections would simply never
really complete.
- Skype was really flaky: messages would suddenly show as "Pending"
and could stay that way for up to five minutes or so before being sent.
People trying to send messages to me say they are seeing the same
thing.
- Whenever I was logged in to PSN on my PS3, it would automatically
log me out after a while (a few minutes to an hour or so).
- Somewhat often when I send a mail, Thunderbird will complain that
it can't save the message to the Sent folder (***@Google); retrying
a few times would make it succeed after a while.

I started thinking there was something wrong at my ISP until I
noticed another quirk:

- From my main computer (on the switch on wm1), when I ssh to my media
PC (on the switch on wm2) and leave the session idle (no typing, and
nothing like top or rtorrent that keeps updating its display), the ssh
session simply hangs (becomes non-responsive) after a very short time
(I'd say roughly one minute or so).

I became pretty frustrated with the behavior, and just to see if
there was a quick fix, I simply shut everything down and restarted
again. That fixed all the problems. No more odd connection problems, no
more "Pending" messages in Skype, no more being signed out of PSN, no
more Thunderbird problems, and I could establish an ssh session from my
main PC to my media PC, open up a prompt, leave it open for hours, even
days, without touching it, and then when I tried using it, it worked
(just as I would expect it to).

So I was naturally annoyed that I never got to the bottom of it, but
I was happy my network wasn't behaving weird any longer.

Fast-forward a few weeks, and we had another warning about potential
lightning/thunderstorms, so I unplugged everything, went away for a few
days, came back, plugged everything in, got a new IP-address from my
ISP, ran my scripts, restarted. Then the odd problems were back (odd
connection problems, Skype problems, save Sent-mail problems, PSN
signouts, ssh would hang). After a few days a friend of mine complained
about the "Pending" problem, so I decided to try the "quick fix" again
by simply shutting everything down and starting everything up again. But
this time it didn't work. And I've tried it again, just to be sure.

I should stress that the vast majority of things I do work. I can
browse the network (albeit with a few connections acting up as described
above), I can read mail (again, with a few connection problems). PSN is
the only thing which feels very ... er.. reliably unreliable.

Although I have very few samples, I'm pretty sure that all these
issues are related (like I said, I had none of them for a long time, all
of them appeared at the same time, they all went away at the same time,
and they all came back at the same time), and because of the
ssh-problem, I get a feeling it's a very local problem.

Oh, btw, the ssh problem occurs also when I ssh from a system on the
WiFi network to the media-PC, but it does not occur when I ssh to my
Soekris router (that connection never dies).

I'm at a loss. I've tried the very few things I can think of, like
checking the routes, making sure that the generated ipf.conf and
ipnat.conf look ok (and it all looks ok). And also:

# sysctl -a | grep forward
net.inet.ip.forwarding = 1
net.inet6.ip6.forwarding = 0


Any hints, tips, help is very welcome -- I'd like to figure out once
and for all what is causing this. I pretty much suck at network
administration, so if you're wondering if I tried running <useful
network tool X>, then it's a safe bet that I haven't, because I never
heard of it. :)
--
Kind regards,
Jan Danielsson
Mouse
2013-07-25 00:34:52 UTC
Post by Jan Danielsson
I need some help trying to even understand where to begin to look for
a solution for a network problem I've been having.
The network looks something like this [...]
ipfilter=YES
ipnat=YES
The symptoms you describe sound to me like something going wonky in
ipf/ipnat state, such that it's discarding state inappropriately and
thus losing packets which would normally be forwarded - or, depending
on just what your ipf.conf and ipnat.conf look like, possibly just
filtering packets it shouldn't be. If you want to verify this, I'd
suggest starting tcpdumps on wm1 and wm2, then doing one of the
problematic ssh connections. Look to see if packets are coming in one
interface but not going out the other. (If it would help, I have a
program that can take two pcap files and merge them based on packet
timestamps....)

Personally I'd suggest putting ipf/ipnat on a different machine from
the house network routing - if you can, of course; if you're filtering
between house subnets or something, obviously that's not as doable.

If everything works fine with tcpdump running but not otherwise, that's
an even stronger clue, though I'm not sure in any detail what it points
to (especially since you're running a version I've never worked with).
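
The capture suggested above could be run roughly as follows. This is only
a sketch: ssh_filter is a hypothetical helper, the two host addresses are
made-up examples on the wm1 and wm2 subnets, and the /tmp file names are
arbitrary.

```sh
#!/bin/sh
# Sketch only: build a pcap filter for one ssh conversation so that the same
# expression can be used on both interfaces. ssh_filter is a hypothetical
# helper; the addresses in the usage lines are made-up examples.
ssh_filter() {
    printf 'host %s and host %s and tcp port 22\n' "$1" "$2"
}

# Possible use (as root, one capture per terminal, stop with ^C once the
# session has hung):
#   tcpdump -n -i wm1 -w /tmp/wm1.pcap "$(ssh_filter 192.168.72.10 192.168.92.10)"
#   tcpdump -n -i wm2 -w /tmp/wm2.pcap "$(ssh_filter 192.168.72.10 192.168.92.10)"
# Then read both captures back with raw timestamps and compare them:
#   tcpdump -n -tt -r /tmp/wm1.pcap
#   tcpdump -n -tt -r /tmp/wm2.pcap
# While the session is idle, "ipfstat -sl" lists IP Filter's current state
# entries and "ipnat -l" lists the NAT mappings.
```

A packet that shows up in the wm1 capture but never in the wm2 capture
(or the other way round) would point at the router dropping it.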

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                ***@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Robert Swindells
2013-07-25 01:12:45 UTC
Post by Jan Danielsson
I need some help trying to even understand where to begin to look for
a solution for a network problem I've been having.
Have you got the same MTU on all the interfaces ?
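
One way to answer that question on the router is to read the MTU off each
interface. A small sketch, assuming the "mtu NNNN" token that ifconfig
prints on an interface's first line; mtu_of is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch only: pull the MTU out of ifconfig output read from stdin.
# mtu_of is a hypothetical helper; it assumes the "mtu NNNN" token that
# NetBSD's ifconfig prints on the interface's first line.
mtu_of() {
    sed -n 's/.* mtu \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Possible use on the router:
#   for i in wm0 wm1 wm2 wm3; do
#       printf '%s %s\n' "$i" "$(ifconfig "$i" | mtu_of)"
#   done
```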

Robert Swindells


Lars Schotte
2013-07-25 09:48:11 UTC
On Thu, 25 Jul 2013 02:12:45 +0100 (BST)
Post by Robert Swindells
Have you got the same MTU on all the interfaces ?
Robert Swindells
Keep in mind that ZyXEL is a bullshit technology!

I would rather look at MSS clamping, because I do not think the ZyXEL
does it. I suspect they rely on the operating system to take care of it
when it sends too big a packet.

However, we do not even know whether it's DSL, DOCSIS or whatever that
he is connecting the ZyXEL box to.
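
For reference, the arithmetic behind an MSS clamp value, as a sketch. It
assumes IPv4 with a 20-byte IP header and a 20-byte TCP header without
options; mss_for_mtu is a hypothetical helper name. ipnat(5) documents an
mssclamp keyword for map rules, but whether clamping is needed here
depends on the uplink type, which the thread has not established.

```sh
#!/bin/sh
# Sketch only: largest TCP MSS that fits a given IPv4 path MTU, assuming a
# 20-byte IPv4 header and a 20-byte TCP header with no options.
# mss_for_mtu is a hypothetical helper name.
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

# Examples: plain Ethernet, and PPPoE (8 bytes of PPPoE/PPP overhead).
#   mss_for_mtu 1500    # prints 1460
#   mss_for_mtu 1492    # prints 1452
```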
--
Lars Schotte @ hana.gusto
3.9.11-200.fc18.x86_64 x86_64 GNU/Linux
Claws Mail version 3.9.2

Lloyd Parkes
2013-07-25 02:54:37 UTC
Post by Jan Danielsson
- From my main computer (on the switch on wm1) when I ssh to my media
PC (on the switch on wm2), if I don't do anything with the ssh session
(like type in it, or have it update (like showing top, rtorrent or
something which keeps updating its display), the ssh session will simply
hang (become non-responsive), after a very short time (I'd say roughly
one minute or so).
Do you have any ipf/ipnat rules on wm1 and wm2? You probably shouldn't
(but that depends on what you actually want).

Also, see /usr/share/examples/ipf/mediaone for an (old) example of how
to configure an interface with a dynamic IP address. The recommendation
in that example file is to run "ipf -y" from dhclient, which forces a
manual resync of IP Filter's in-kernel interface list.

It's been a very long time since I last had a dynamic IP address, but
that mediaone example is where I would start planning my firewall.
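
A sketch of how the "ipf -y" resync might be tied to a lease change. It
is not taken from the mediaone example file. needs_resync is a
hypothetical helper, and using /etc/dhclient-exit-hooks as the hook file
is an assumption about this system; reason, old_ip_address and
new_ip_address are the variables dhclient-script exports to its hooks.

```sh
#!/bin/sh
# Sketch only: decide whether a dhclient event changed the external address.
# needs_resync is a hypothetical helper; its arguments are the dhclient
# reason, the old address and the new address.
needs_resync() {
    case "$1" in
    BOUND|RENEW|REBIND|REBOOT)
        # Succeeds only when the address actually differs.
        [ "$2" != "$3" ]
        ;;
    *)
        return 1
        ;;
    esac
}

# Possible use from /etc/dhclient-exit-hooks (location is an assumption):
#   if needs_resync "$reason" "$old_ip_address" "$new_ip_address"; then
#       ipf -y
#   fi
```

On a first BOUND event the old address is normally empty, so the check
treats it as a change and the resync runs.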

Cheers,
Lloyd