lib/42405: libc: getaddrinfo() should perform T_A lookups before T_AAAA lookups, was: Resolver problems

Matthias Scheler

2009-12-03 23:26:58 UTC

It's entirely reasonable to make this configurable, although for the
original circumstance Ingolf describes, where "v6 support has
already been removed from my kernel -- no interface has a v6
address", it's hardly arbitrary for an IPv4-only system to try A
lookups before hoping that an AAAA lookup

But that is not what your patch does, unless I misunderstood it.
Your patch will unconditionally prefer A records on all systems,
no matter whether the kernel supports IPv6 or not.

Anyway, by all means, please submit a better solution and update
http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=42405 accordingly. :-)

I could argue that, as you want to change the behaviour of a core component
after many years, it is your responsibility to make that behaviour
configurable to maintain backwards compatibility.

Kind regards

--
Matthias Scheler http://zhadum.org.uk/

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Robert Elz

2009-12-04 05:37:43 UTC

Date: Thu, 03 Dec 2009 15:13:28 -0800
From: Chuck Swiger <***@mac.com>
Message-ID: <42608C26-262F-4D8D-A726-***@mac.com>

| Yes, it might. I've yet to see a situation where that actually
| was the case, however, but it's conceivable.

I certainly have. I haven't seen it in the past 6 months or so, but
before that, NAT table full was a daily occurrence (you stick a university
behind a single NAT router, and no matter how big the router's tables
are, the students will manage to fill them up...) On the other hand, IPv6
simply works, almost always - the vast majority of the e-mail I actually read
I receive via IPv6 (as distinct from the vast majority of e-mail I receive,
which comes via IPv4, and mostly goes straight to the bit bucket).

| A few years back, when I was [...] (this was in metro NYC area),

Yes, as I understand it, in much of the US, you're still pretty much
living in the past - outside the US, at least in some areas, things are
much much better (and it seems to be improving inside the US too, at
least a little.)

| For that matter, with 0.16% of routable networks IPv6-only,
| 5.29% dual-protocol capable, and 94.54% of IPv4-only, I don't feel it
| is arbitrary to choose to perform A before AAAA by default.

It depends whether you want to plan for the future or not - what you're
distributing now is likely to still be running 4, 5, or more, years from
now, and then the v4 first policy might start being backwards.

It sounds as if FreeBSD took the chicken way out - as in, we know we're
right, and what we are doing is OK, and the fault is with some other
site(s), but we can't fix them, we can only change us, and we look better
if we hide their problem, so let's do that...

Personally, I prefer to emphasise the faults, get complaints, direct the
complaints to where the problems really are (like the bogus router with
the broken DNS proxy that (re-)started this discussion, in the past day or so)
so that there's a better chance of getting the real problems fixed (after
all, the people with the ads want their ads seen, unlike some others who
simply don't care, they have an incentive to fix things, if they're informed
by enough people that they are broken).

On the other hand, I certainly agree that if an IPv6 address would not be
useful at all (if you're running a kernel with no IPv6 support) then it
is pointless asking for one (unless explicitly requested by the application),
and if others agree, I'd certainly be willing to make and submit a patch
that would turn off the AAAA query when the running kernel doesn't support
using the returned address (that's not all that difficult to achieve).

The same patch, would, of course, turn off the default A query if the
running kernel doesn't support IPv4 (again, if the application doesn't
explicitly request it). That's not likely to be noticed by anyone in the
immediate future, but a little further away, it might help a bit, and it
is also the right thing to do.

I also have patches to the kernel to allow protocol support to be dynamically
turned on/off, so it isn't necessary to recompile the kernel to disable
IPv6 (or IPv4, or in theory anything else, but no-one really cares about
the others any more anyway.) Naturally, you can't enable a protocol that
way if it wasn't compiled into the kernel initially (this isn't LKM's
or anything...) but it is trivial to disable one, or re-enable it again.

A few applications have problems with this and need application config
updates (sendmail likes to bitch if it is configured to use IPv6 and
then can't get an IPv6 socket, for example), but most stuff just works
fine. If the protocol is disabled after an application has grabbed sockets
using it, it is optional whether those sockets get to continue working
or whether they all just hang (as in no more data in/out). The normal
operation is just to prevent creation of a socket in the disabled
protocol, but let everything else operate normally. It is also possible
to block packet sending/reception and, if I remember correctly, to block
packets except those on established TCP connections, so an incoming SYN
would be blocked, but existing connections get to run to completion. For
blocked packets, it is likewise optional whether they're treated as "this
protocol is unknown here" (ie: the incoming packet is simply discarded) or
"this protocol is known but no-one wants it" (ie: a TCP RST, or ICMP port
unreachable, as appropriate, is returned).

Full support for this is only for the IP protocols, disabling something like
appletalk would prevent a socket being created, but none of the other hooks
are in the code I have to do more than that (I don't even recall if anything
got added to create the sysctl nodes for protocols other than IP to allow
the disable to happen - though fixing that would be trivial, or trivial
multiplied by N protocols.)

If this stuff looks useful, I'll find it, upgrade it to current current
(it was done a couple of years ago), and send it someplace...

kre

Matthias Scheler

2009-12-04 08:00:21 UTC

I don't think that we should default to anything 4 over 6.

Despite being a big IPv6 fan I'm not sure about this anymore. With the
current default IPv6 gets frequently incorrectly blamed for network
problems caused by something else, e.g. broken DNS forwarders in
broadband routers.

The result is that people fall victim to the IPv6 FUD and compile
NetBSD without IPv6 support which makes a later migration even harder.

But no matter what the default should be, the behaviour *must* be
configurable.

ipv4 is a dead horse and
it will soon silently disappear like floppy drives and VHS. :)

I agree. But apparently some people like riding dead horses because
they don't balk. ;-)

Kind regards

--
Matthias Scheler http://zhadum.org.uk/

Joerg Sonnenberger

2009-12-04 13:51:29 UTC

As for the problems, are they the result of a host getting a non-working
v6 address on a broken subnet? I have been running dual v4/v6 setups in
many places for years, and had AAAA records for some of my machines, and
the only problems I've had have been when a local v6 router advertises a
prefix but connectivity is not there.

Same here and even then it is only a problem with
net.inet6.ip6.accept_rtadv = 1.

Joerg

Matthias Scheler

2009-12-04 14:01:22 UTC

As for the problems, are they the result of a host getting a non-working
v6 address on a broken subnet?

No, the problem is caused by a router whose builtin caching nameserver
sends broken replies if NetBSD's libc queries for AAAA records. The broken
answer gets ignored and NetBSD's libc waits 15(?) seconds before trying
to query for A records.
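
One way to check whether a given resolver path shows this delay is to time
getaddrinfo() with and without an address family restriction. The following
is only a sketch: lookup_ms() is a hypothetical helper name, and the 15
second figure above is not something the code assumes or verifies.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getaddrinfo() and clock_gettime() */
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Resolve `name` with the given address family hint and return the
 * wall-clock time spent inside getaddrinfo(), in milliseconds.
 * The getaddrinfo() return code is stored in *gai_rc.
 * (lookup_ms is a hypothetical name used for illustration.)
 */
static double
lookup_ms(const char *name, int family, int *gai_rc)
{
    struct addrinfo hints = {0};
    struct addrinfo *res = NULL;
    struct timespec t0, t1;

    assert(name != NULL && gai_rc != NULL);
    hints.ai_family = family;        /* AF_INET, AF_INET6 or AF_UNSPEC */
    hints.ai_socktype = SOCK_STREAM;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *gai_rc = getaddrinfo(name, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (*gai_rc == 0)
        freeaddrinfo(res);
    return (t1.tv_sec - t0.tv_sec) * 1000.0 +
        (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

If the AF_UNSPEC lookup of a name takes many seconds longer than the AF_INET
lookup of the same name, the extra time is being spent on the non-A query.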

Kind regards

--
Matthias Scheler http://zhadum.org.uk/

der Mouse

2009-12-04 14:05:04 UTC

Post by Matthias Scheler
No, the problem is caused by a router whose builtin caching
nameserver sends broken replies if NetBSD's libc queries for AAAA
records. The broken answer gets ignored and NetBSD's libc waits
15(?) seconds before trying to query for A records.

How would asking for A first help? It'd still block the same way when
it gets around to asking for AAAA; the getaddrinfo() interface does not
have any sort of "here's an address but there may be more coming along
later" semantic available (nor can it, given its synchronous design).
To paper over this you'd have to not ask for AAAA at all.

Actually...is the broken response identifiable as coming from this
particular bustification? If so, perhaps we could recognize it and
immediately stop trying to look up AAAA? (Though arguably that's
wrong; I'm with kre on the "make brokenness blatantly identifiable as
such" stance, so I'm rarely in favour of papering over brokenness
elsewhere.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Robert Elz

2009-12-05 10:44:43 UTC

Date: Sat, 5 Dec 2009 09:10:07 +0000 (UTC)
From: ***@serpens.de (Michael van Elst)
Message-ID: <hfd813$7m4$***@serpens.de>

| With the same argument you could say that a 'nslookup' or 'dig'
| should fail to query AAAA records when your kernel cannot handle
| IPv6.

Not quite - no-one would argue that if you asked for a particular addr type
that getaddrinfo() should do anything different than what it was asked (not
that dig (etc) use getaddrinfo() anyway), but if you say "just give me
something I can use" without caring what protocol, I don't think it would
be unreasonable for getaddrinfo() to simply return only addresses that
have at least the potential to work - and I don't think applications should
need to set some flag to tell getaddrinfo() not to return useless trash,
if anything, for the one in a hundred application that isn't a diagnostic
tool (which wouldn't be using getaddrinfo()), doesn't care what address type,
but wants everything available, even if it cannot possibly work, a flag to
indicate that would make more sense (you'd really have to hunt to find an
application that you'd need to modify to set it).
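
For reference, getaddrinfo() as specified by POSIX already has a flag in this
spirit: AI_ADDRCONFIG asks that IPv4 addresses be returned only if an IPv4
address is configured on the local system, and likewise for IPv6. Whether a
given libc honours it, and whether it also suppresses the corresponding DNS
query, varies by implementation and is not established anywhere in this
thread. A minimal sketch, where resolve_usable() is a hypothetical helper
name:

```c
#define _POSIX_C_SOURCE 200809L  /* expose getaddrinfo() under strict -std= modes */
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Ask for "something I can use": any address family, but only families
 * for which the local system has an address configured (AI_ADDRCONFIG).
 * Returns the getaddrinfo() result code; on success the caller must
 * release *res with freeaddrinfo().
 */
static int
resolve_usable(const char *name, struct addrinfo **res)
{
    struct addrinfo hints = {0};

    assert(name != NULL && res != NULL);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_ADDRCONFIG;
    *res = NULL;
    return getaddrinfo(name, NULL, &hints, res);
}
```

Note that this is opt-in per application, which is the opposite of the
default being argued for above.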

| getaddrinfo() is only used by programs that can handle IPv6

Typically yes, but not necessarily. It is really just a newer, cleaner
interface to the resolver low level routines, and ought to be used by
anything new (or just being overhauled) these days. Of course, such things
should be able to handle IPv6, but even if they can't, getaddrinfo() is a
much nicer interface than gethostbyname().

| It is a problem for people with a broken IPv6 setup. I.e. the machine
| is configured for an IPv6 network but the network doesn't function
| because of things like unstable tunnels or unreliable ipv6 providers.

Yes, it is (or should be) a real problem only in those cases, but it is
a waste of network resources in others - just asking for a v6 address
you're never going to use is a packet out, a packet back, and the back
end resolver being made to do (and then cache) data that's useless.
It shouldn't be a problem, but it is pretty stupid and unnecessary.

And of course the same is true for v4 for (future) systems where v4
doesn't work.

| I don't think that such a workaround should depend on the kernel
| because the need for the workaround has little to do with the kernel
| and the functionality that needs to be adjusted is embedded in
| libc.

Yes, it is libc that should act differently here.

| So please, add a knob to control libc behaviour.

For this, I'd do that, but I'd also not query for addresses that we
can tell cannot possibly be useful, if the application hasn't expressly
asked for that address type (either by name, which is an interface that
already exists, or by explicitly asking for useless answers, using some
new interface, if one gets invented).

| - sysctl is out of the question. It controls kernel behaviour and
| the kernel shouldn't be a storage for configuration data.

If you mean a sysctl whose purpose was to config libc, then certainly
that would be absurd. But that doesn't mean that libc cannot look at
a sysctl that has some other purpose, to determine whether a protocol is
supported or not (or something.) Not that I'd do that that way either,
it's cheaper and easier just to (try to) create a socket of the
appropriate type, if that succeeds, then return addresses of that type,
if it fails, don't (by default). That also works on any kernel.
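
The socket probe described here can be sketched in a few lines. This is an
illustration only, not the patch under discussion; af_usable() is a
hypothetical name, and a real implementation would probably want to look at
errno to tell "family not supported" apart from, say, running out of file
descriptors.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return 1 if the running kernel lets us create a datagram socket in
 * address family `af`, 0 otherwise.  Any socket() failure is treated as
 * "not usable" here; that is a simplification.
 */
static int
af_usable(int af)
{
    int s = socket(af, SOCK_DGRAM, 0);

    if (s == -1)
        return 0;
    close(s);
    return 1;
}
```

A resolver front end could call af_usable(AF_INET6) once and skip the AAAA
query when it returns 0, unless the caller asked for AF_INET6 explicitly.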

| - an environment variable would be sufficient for me but there
| are issues to make this a system wide setting. There are always
| other parties that tweak the environment and break such a setup.

Agreed, env vars should be reserved for users to set personal
preferences, not for system settings.

| - resolv.conf is not my first choice, because this is used by
| the resolver code, not the high level functions like getaddrinfo().

Here I disagree (mildly at least) - I consider getaddrinfo() (and
gethostbyname()) to just be the programmer friendly interfaces to the
resolver - they're part of it. So, config in resolv.conf is not
at all unreasonable (and the relevant function already accesses the
resolver state, so using that just means moving a function call a
little earlier).

| But since there already is a bit that controls the resolver library
| in a similar way, this might be an option. The particular bit
| (RES_USE_INET6) however, has a completely different meaning
| because it is supposed to return IPv4 addresses as IPv6 addresses
| to simplify application code.

That's not the right bit to use (even though I couldn't find any place
NetBSD's code actually uses it - in libc anyway) but the analogy is a
reasonable one. And we have practice of other systems.

| - nsswitch.conf would be a natural choice if it allowed to pass
| options to the backends (e.g. hosts: files dns[v4only]). But
| who dares to touch nsswitch code? :)

Well, maybe, I guess we could allow dns4only and dns6only as alternatives
to dns in there, but that looks to be a bit of a stretch.

| - a new file/symlink like /etc/malloc.conf (which BTW is the system
| wide configuration for something controlled by an environment
| variable).

If we didn't already have resolv.conf, that would be my choice.

| This would help with the broken DNS server only when you also
| create a kernel without IPv6 (so far we cannot disable IPv6
| link local addresses).

Ingolf already did the former, and it didn't help - but not sending
a query for v6 addresses would have helped, regardless of whether his
kernel had v6 support in it or not. And since there seemed to be at
least some support, and no opposition, I'll dig out the CD I have somewhere
with the "disable protocol" code on it, and make it fit current as it
is now, and send in a change request PR (probably for the simple version
that just makes the selected protocol vanish to new users of it, rather
than all the extra that allows existing connections to be affected as well).

For most expected requirements, it does what is desired, you set the
sysctl in /etc/sysctl.conf and from the next boot, there is almost no
difference visible between running a system with the disabled protocol,
and running a kernel with the protocol compiled out (except that you
can "undisable" (yes, I know correct English is "enable" but that
doesn't quite give the same impression) the protocol any time.)

| So I propose /etc/addrinfo.conf with an associated environment
| variable ADDRINFO_CONF. And please paint it in pink.

I think resolv.conf is better, it is where resolver config goes, it
already has the ability to specify options, and is a reasonable home
for this kind of config.

I also don't think we need an env var at all, most applications already
have options that say whether they should use v4 or v6 (defaulting to whatever
works usually) and complicating that by adding yet another way wouldn't
help, just make everything messier, after all, what is telnet (or anything)
to do if you say "telnet -4 ..." when your ADDRINFO_CONF says "v6 only"
(or vice versa). Just leave that config to command line options - if an
application doesn't provide a way, then its author didn't intend his users
to be able to control this behaviour.

kre

Michael van Elst

2009-12-05 12:20:59 UTC

Post by Robert Elz
| getaddrinfo() is only used by programs that can handle IPv6
Typically yes, but not necessarily

To clarify this, a program that uses getaddrinfo() has to know
that not everything is IPv4 (and thus 'handle it') whether it
wants to use IPv6 connections or not. It either has to tell
getaddrinfo() to restrict itself to some answers or it has to
check the result for answers that it does understand.
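
The two options can be sketched as follows. The helper names are
hypothetical. The first restricts the request through hints.ai_family; the
second asks for AF_UNSPEC and skips entries it does not understand.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getaddrinfo() under strict -std= modes */
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Option 1: tell getaddrinfo() which family we want.  Returns the number
 * of addresses found, or -1 if the lookup failed. */
static int
addrs_restricted(const char *name, int family)
{
    struct addrinfo hints = {0}, *res, *ai;
    int n = 0;

    assert(name != NULL);
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;   /* one entry per address */
    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next)
        n++;
    freeaddrinfo(res);
    return n;
}

/* Option 2: ask for everything, then count only the entries of the
 * family we understand.  Returns -1 if the lookup failed. */
static int
addrs_filtered(const char *name, int family)
{
    struct addrinfo hints = {0}, *res, *ai;
    int n = 0;

    assert(name != NULL);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next)
        if (ai->ai_family == family)
            n++;
    freeaddrinfo(res);
    return n;
}
```

Only the first form gives the library a chance to avoid lookups for the
other family; the second still pays for them and discards the result.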

Post by Robert Elz
Yes, it is (or should be) a real problem only in those cases, but it is
a waste of network resources in others - just asking for a v6 address
you're never going to use is a packet out, a packet back, and the back
end resolver being made to do (and then cache) data that's useless.
It shouldn't be a problem, but it is pretty stupid and unnecessary.

It is how the coexistence of IPv4 and IPv6 has been designed. The
alternative is to use a T_ANY query and filter the results.

Post by Robert Elz
| So please, add a knob to control libc behaviour.
For this, I'd do that, but I'd also not query for addresses that we
can tell cannot possibly be useful

How would you know? I can easily imagine a program analyzing
logfiles, resolving host names or addresses without using
these to create a network connection. The program would behave
differently depending on whether you give the machine IPv6 connectivity or not.

As I said in my previous mail, this is about a workaround that
should be enabled by a decision. Magically deriving this from
something else is going to fail.

Post by Robert Elz
| - resolv.conf is not my first choice, because this is used by
| the resolver code, not the high level functions like getaddrinfo().
Here I disagree (mildly at least) - I consider getaddrinfo() (and
gethostbyname()) to just be the programmer friendly interfaces to the
resolver - they're part of it.

That holds neither for the history of development nor for the actual code.

resolver is about DNS queries, it doesn't deal with files, nis,
ldap or anything else and it uses resolv.conf as its configuration
file.

getaddrinfo() is about name and address translations, it may use
the resolver subsystem as one of its data sources and it uses
nsswitch.conf as its configuration file.

Post by Robert Elz
That's not the right bit to use (even though I couldn't find any place
NetBSD's code actually uses it - in libc anyway) but the analogy is a
reasonable one. And we have practice of other systems.

Do we?

Post by Robert Elz
| - nsswitch.conf would be a natural choice if it allowed to pass
| options to the backends (e.g. hosts: files dns[v4only]). But
| who dares to touch nsswitch code? :)
Well, maybe, I guess we could allow dns4only and dns6only as alternatives
to dns in there, but that looks to be a bit of a stretch.

nsswitch.conf determines who provides the name translations. Using
it to provide options on how the names are translated seems to be
logical. An argument against this is the fact that this would be
the first and so far only option.

Post by Robert Elz
| This would help with the broken DNS server only when you also
| create a kernel without IPv6 (so far we cannot disable IPv6
| link local addresses).
Ingolf already did the former, and it didn't help

The reasoning was: a special flag to getaddrinfo() would deduce
how names are translated from the network configuration, but
we need to rebuild the system for this. Rebuilding the kernel
alone obviously doesn't help, but it shouldn't be necessary
to control some other aspect of the system.

If you can make this a boot option instead this might be
useful to some, but see above that creating an automatic
dependency between getaddrinfo() behaviour and kernel
configuration is still questionable at least.

Following that logic the next might be to filter out
DNS answers that point to addresses that have no routing
table entry.

Post by Robert Elz
| So I propose /etc/addrinfo.conf with an associated environment
| variable ADDRINFO_CONF. And please paint it in pink.
I think resolv.conf is better, it is where resolver config goes, it
already has the ability to specify options, and is a reasonable home
for this kind of config.

Tell that to the person who has to augment the parser and data structures
of the DNS resolver library for things controlling getaddrinfo().

Post by Robert Elz
I also don't think we need an env var at all,

Very few people will consider it, it's more a debug help (like
MALLOC_OPTIONS) that can be used to override libc behaviour per
process.

I don't say we need it, but just took malloc as a model for how
to control a libc function. If we use this model then we might
want all of it for consistency.

Post by Robert Elz
most applications already
have options that say whether they should use v4 or v6 (defaulting to whatever
works usually) and complicating that by adding yet another way wouldn't
help, just make everything messier, after all, what is telnet (or anything)
to do if you say "telnet -4 ..." when your ADDRINFO_CONF says "v6 only"
(or vice versa).

It will behave correctly. Just like telnet -4 to a v6 only host:

henery% telnet -4 pussyfoot.ipv6.local
pussyfoot.ipv6.local: No address associated with hostname

Greetings,

--
Michael van Elst
Internet: ***@serpens.de
"A potential Snark may lurk in every tree."

der Mouse

2009-12-05 21:18:31 UTC

[B]ut if you say "just give me something I can use" without caring
what protocol, I don't think it would be unreasonable for
getaddrinfo() to simply return only addresses that have at least the
potential to work - and I don't think applications should need to set
some flag to tell getaddrinfo() not to return useless trash,

The problem is, what constitutes "useless trash"?

if anything, for the one in a hundred application that isn't a
diagnostic tool (which wouldn't be using getaddrinfo()),

Where do you get this idea that diagnostic tools don't use
getaddrinfo()? Especially when trying to diagnose name resolution
issues, one of the things to do is to try the usual interface.

doesn't care what address type, but wants everything available, even
if it cannot possibly work,

"Cannot possibly work" is impossible for software to tell.

a flag to indicate that would make more sense (you'd really have to
hunt to find an application that you'd need to modify to set it).

I wouldn't have to hunt at all; I have an example at ready hand.

One of my programs (I call it addr) simply prints out the list of
addresses a name resolves to. There is no implication that any of
those addresses are going to be used to aim network traffic; they
might, for example, be going into a router blocking list. And there
most certainly is no implication that whatever use is to be made of the
addresses will be done on the machine where addr is running. As such,
software - addr, getaddrinfo(), whatever - is not in a position to tell
what might or might not be useful to return.

I use addr regularly. I wrote it because I couldn't find any other
application that could turn a name into a list of addresses without a
lot of extracting addresses from undocumented (and usually non-frozen
and thus liable to breakage upon upgrade) output formats. I initially
meant it for use in shell scripts, but most of the times I run it turn
out to be command-line queries.
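
For readers who have not seen such a tool, here is a rough sketch of the
idea. It is not the actual addr source; print_addrs() is a hypothetical
name. It uses getnameinfo() with NI_NUMERICHOST for the formatting step, so
it contains no address-family-specific code.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getaddrinfo() under strict -std= modes */
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Print every address `name` resolves to, one per line, on `out`.
 * Returns the number of addresses printed, or -1 if the lookup failed.
 */
static int
print_addrs(const char *name, FILE *out)
{
    struct addrinfo hints = {0}, *res, *ai;
    char host[256];                    /* ample for numeric address text */
    int n = 0, rc;

    assert(name != NULL && out != NULL);
    hints.ai_family = AF_UNSPEC;       /* every family the library knows */
    hints.ai_socktype = SOCK_STREAM;   /* avoid one entry per socket type */

    rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", name, gai_strerror(rc));
        return -1;
    }
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        /* NI_NUMERICHOST: format the address, never do a reverse lookup */
        if (getnameinfo(ai->ai_addr, ai->ai_addrlen, host, sizeof host,
            NULL, 0, NI_NUMERICHOST) != 0)
            continue;
        fprintf(out, "%s\n", host);
        n++;
    }
    freeaddrinfo(res);
    return n;
}
```

Nothing in it knows or cares whether the local machine could send a packet
to any of the addresses it prints.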

Post by Robert Elz
getaddrinfo() is only used by programs that can handle IPv6

Typically yes, but not necessarily, [...], getaddrinfo() is a much
nicer interface than gethostbyname().

In many respects. It certainly has its problems, perhaps most notably
the inconvenience of initializing hints structures (I can't see any
reason for demanding that the unused fields be set to fixed values,
and, since some of them are pointers, a wholesale bzero() is not
suitable). But most of the problems I run into when using it are not
ones that it is in a position to solve.
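
One common way around the initialization issue, offered as a sketch rather
than as the only correct idiom: a C initializer of {0} sets every member not
named to zero of the appropriate type, which for the pointer members means a
null pointer, whatever its bit pattern. make_hints() is a hypothetical
helper name.

```c
#define _POSIX_C_SOURCE 200809L  /* expose struct addrinfo under strict -std= modes */
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Build a hints structure with only family and socket type set.
 * The {0} initializer, unlike bzero()/memset(), gives the pointer
 * members (ai_addr, ai_canonname, ai_next) real null pointer values.
 */
static struct addrinfo
make_hints(int family, int socktype)
{
    struct addrinfo hints = {0};

    assert(hints.ai_addr == NULL && hints.ai_next == NULL);
    hints.ai_family = family;
    hints.ai_socktype = socktype;
    return hints;
}
```

Usage would be along the lines of
struct addrinfo h = make_hints(AF_UNSPEC, SOCK_STREAM); followed by
getaddrinfo(name, NULL, &h, &res).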

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Greg A. Woods

2009-12-07 07:15:40 UTC

At Sat, 5 Dec 2009 16:18:31 -0500 (EST), der Mouse <***@Rodents-Montreal.ORG> wrote:
Subject: Re: lib/42405: libc: getaddrinfo() should perform T_A lookups before T_AAAA lookups, was: Resolver problems

Post by der Mouse
One of my programs (I call it addr) simply prints out the list of
addresses a name resolves to. There is no implication that any of
those addresses are going to be used to aim network traffic;

Interesting word, that word "addresses", especially in this context.

Once upon a time the ubiquity of IPv4 addresses was so firmly entrenched
that all we had was gethostbyname(3) and it firmly used only AF_INET
internally with no option for anything else but yet we were all happy
and blissful.

Perhaps it was a mistake to jam IPv6 hostnames into the DNS in such a
way that we ended up with two different record types with the exact same
purpose.

Perhaps instead we should have created a new protocol family type in the
DNS (eg. IN6) and used that instead of "IN" for all IPv6-related records.

Or maybe we should have added a new query type to the DNS that could be
used to ask for all available address-type records, though this would
seem to be inventing an ad-hoc record classification scheme which is not
clearly defined in the DNS protocol.

You'll note from the source that getaddrinfo(3) doesn't send queries for
T_ANY records -- when it is not given any desired address family it has
to decide which type of address record to ask for (first). That's
essentially the problem which started this thread -- along with local
system conditions which cause the initially queried type to fail with a
timeout. This combination will cause consternation for users no matter
what the default search order is.

Unfortunately there isn't the DNS query I suggested which will return
all types of address records suitable for any address family and I
suspect that using T_ANY will further cause spurious failures of various
kinds when there are too many addresses to fit in a UDP reply (though
the calling code should be smart enough to retry with TCP when
necessary).

Even on a system where timeouts do not cause problems, there is
potential for unnecessary overhead should the unusable type be queried
for first.

Personally I think the likes of your "addr" tool should be required to
iterate through all relevant "valid" address families when it does its
query using getaddrinfo(3), and that AF_UNSPEC not be a valid hint. By
iterating over all AF_* values your code might then eventually request
all address records suitable for all address families, regardless of
whether any or all of those addresses are suitable for use in forming a
connection from the host running the program. Unfortunately the
apparent design of the getaddrinfo(3) API defers the decision making
about which address family might "work" until after the overhead and
potential delays of requesting records for a non-"working" address
family have been incurred.

Even with AF_* iteration your code would be at the mercy of the library
implementation since there is no guarantee that any given implementation
of these APIs will support all commonly used address families.
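
To make the AF_* iteration concrete, here is a sketch of what such a loop
might look like. The family list and the name count_all_families() are
assumptions made for illustration; a real tool would have to decide for
itself which families are "relevant".

```c
#define _POSIX_C_SOURCE 200809L  /* expose getaddrinfo() under strict -std= modes */
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Families the tool chooses to ask about (an assumption of this sketch). */
static const int families[] = { AF_INET, AF_INET6 };

/*
 * Query each listed family separately instead of passing AF_UNSPEC.
 * Returns the total number of addresses found across all families.
 */
static int
count_all_families(const char *name)
{
    size_t i;
    int total = 0;

    assert(name != NULL);
    for (i = 0; i < sizeof families / sizeof families[0]; i++) {
        struct addrinfo hints = {0}, *res, *ai;

        hints.ai_family = families[i];
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(name, NULL, &hints, &res) != 0)
            continue;   /* no records, or family unknown to this libc */
        for (ai = res; ai != NULL; ai = ai->ai_next)
            total++;
        freeaddrinfo(res);
    }
    return total;
}
```

Each iteration is a separate synchronous lookup, so a family whose query
times out still costs the full delay; the loop only makes the order explicit
and under the caller's control.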

So, for your code to be portable and useful in any environment perhaps
it should avoid using high-level name resolution APIs and delve directly
into the desired sources using the lowest reasonable APIs for the
necessary protocols or databases.

--
Greg A. Woods
Planix, Inc.

<***@planix.com> +1 416 218 0099 http://www.planix.com/

Michael van Elst

2009-12-07 08:15:07 UTC

Post by Greg A. Woods
Unfortunately the
apparent design of the getaddrinfo(3) API defers the decision making
about which address family might "work" until after the overhead and
potential delays of requesting records for a non-"working" address
family have been incurred.

Nobody notices the "overhead" unless looking for it. That's because
people made a careful decision.

Almost nobody notices "potential delays", you need a specific bug
in your DNS server that you are unable or unwilling to correct.

So, all this arguing on your side is only for people that do not want
to use IPv6, and that insist on a single knob to turn off the evil so
they can live in a perfect world(TM). This becomes even more visible
as you start suggesting to change the world (e.g. Mouse's addr tool)
to fit your world view.

Don't you see that such an argumentation is futile?

--
Michael van Elst
Internet: ***@serpens.de
"A potential Snark may lurk in every tree."

Joerg Sonnenberger

2009-12-07 15:27:01 UTC

Post by Michael van Elst
Almost nobody notices "potential delays", you need a specific bug
in your DNS server that you are unable or unwilling to correct.

Amen. I still find it surprising how much people are willing to require
workarounds for broken servers. Heck, it is likely much easier to just
run a working resolver, that can also do dnssec etc...

Joerg

der Mouse

2009-12-07 08:01:26 UTC

Post by Greg A. Woods

Interesting word, that word "addresses", especially in this context.

Indeed.

Post by Greg A. Woods
Once upon a time the ubiquity of IPv4 addresses was so firmly
entrenched that all we had was gethostbyname(3) and it firmly used
only AF_INET internally with no option for anything else but yet we
were all happy and blissful.

...until we happened to, say, use Ultrix, and had to deal with
AF_DECNET. (Or did they spell it AF_DECnet? I forget; it's been too
long.) I dealt with DECnet sockets, back in the days when IP
connectivity was a rare thing in Canada, building IP-over-DECnet
tunnels among the Montreal universities - but it was too long ago for
me to remember details, and I doubt I still have the code (I've started
a find looking for it, but it'll take a while).

Post by Greg A. Woods
You'll note from the source that getaddrinfo(3) doesn't send queries
for T_ANY records --

True. The semantics are wrong. In particular, if some cache along the
way happens to contain some record for the node in question, T_ANY will
get that data and nothing else, even if other records actually do exist
in the authoritative data for that node. (This can happen if the cache
in question was previously queried for a specific record type, like T_A
or T_MX.)
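The cache interaction described above can be shown with a toy model. This is only a sketch: the record-type flags, the `toy_cache` struct and both query functions are invented for illustration and do not correspond to any real resolver or cache implementation.

```c
/* Record types as bit flags (illustrative values, not DNS wire codes). */
#define RR_A    (1u << 0)
#define RR_AAAA (1u << 1)
#define RR_MX   (1u << 2)

/* What the authoritative server really holds for the name. */
static const unsigned authoritative = RR_A | RR_AAAA | RR_MX;

/* Toy cache: the set of record types currently cached for the name. */
struct toy_cache {
    unsigned held;
};

/* Query for one specific type: fetch it from the authority and cache it. */
static unsigned query_type(struct toy_cache *c, unsigned type)
{
    c->held |= (authoritative & type);
    return c->held & type;
}

/* ANY query: if the cache holds anything at all for the name, it answers
 * with exactly that and never consults the authority. */
static unsigned query_any(struct toy_cache *c)
{
    if (c->held != 0)
        return c->held;
    c->held = authoritative;
    return c->held;
}
```

After a single `query_type(&c, RR_A)`, a following `query_any(&c)` yields only `RR_A`, although AAAA and MX records exist at the authority. Only an empty cache passes the ANY query through and gets the full set.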

Post by Greg A. Woods
Unfortunately there isn't the DNS query I suggested which will return
all types of address records suitable for any address family

...probably in part because it would have much the same problem as
T_ANY, in its interaction with caches.

Post by Greg A. Woods
Personally I think the likes of your "addr" tool should be required
to iterate through all relevant "valid" address families when it does
its query using getaddrinfo(3), and that AF_UNSPEC not be a valid
hint.

But this then renders it incapable of handling address families it
doesn't specifically know about, eg when built on a system with an
address family not yet written into the code. Between getaddrinfo()
and inet_ntop (which really should not have the "inet_" prefix;
consider AF_BLUETOOTH or AF_DECNET, neither of which is an "inet"
protocol but each of which could and arguably should be supported by
inet_ntop or something like it), there is no reason addr needs to have
any AF-specific code in it at all. I think losing that would be a
significant regression.

If getaddrinfo() supported some kind of "bitmask of address families of
interest" rather than the current "either one family or everything"
design, this might be more palatable.
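For what it's worth, a "bitmask of families" interface can be approximated today by filtering an AF_UNSPEC result. The sketch below is hypothetical: `FAM_INET`, `FAM_INET6`, `family_bit()` and `count_in_mask()` are made-up names, nothing like them exists in getaddrinfo(3), and the filtering happens only after the resolver has already done all of its lookups.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical family bits; not part of any real API. */
#define FAM_INET  (1u << 0)
#define FAM_INET6 (1u << 1)

/* Map an address family to its hypothetical mask bit (0 if unknown). */
static unsigned family_bit(int af)
{
    switch (af) {
    case AF_INET:  return FAM_INET;
    case AF_INET6: return FAM_INET6;
    default:       return 0;
    }
}

/* Resolve host with AF_UNSPEC and count the results whose family is in
 * mask. extra_flags is passed through as ai_flags. Returns -1 if
 * getaddrinfo() fails. */
static int count_in_mask(const char *host, unsigned mask, int extra_flags)
{
    struct addrinfo hints, *res, *ai;
    int n = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = extra_flags;
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next)
        if (family_bit(ai->ai_family) & mask)
            n++;
    freeaddrinfo(res);
    return n;
}
```

Note the cost of this approach: the mask has to name each family explicitly, so a family the code has never heard of is silently dropped, which is exactly the regression described above.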

Post by Greg A. Woods
So, for your code to be portable and useful in any environment
perhaps it should avoid using high-level name resolution APIs and
delve directly into the desired sources using the lowest reasonable
APIs for the necessary protocols or databases.

Perhaps. Indeed, from one point of view it already does - just for
values of "lowest reasonable" you disagree with.

Hmm, I should check what interface addr actually does use...it's been a
long time since I wrote it.

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Robert Elz

2009-12-07 08:33:11 UTC

Date: Mon, 07 Dec 2009 02:15:40 -0500
From: "Greg A. Woods" <***@planix.ca>
Message-ID: <m1NHXot-***@most.weird.com>

| Perhaps instead we should have created a new protocol family type in the
| DNS (eg. IN6) and used that instead of "IN" for all IPv6-related records.

No, DNS classes are an ill-defined botch. It isn't clear what they are
for, or how they were originally intended to work, but it is clear
that they don't work as things are implemented.

And there are already a host of different address type resource records
defined (NSAP, X.25, ...) which co-exist with IPv4 (perhaps mostly
because no-one has ever bothered with them to any extent anyone notices),
but there's no reason for IPv6 to be different.

| Or maybe we should have added a new query type to the DNS that could be
| used to ask for all available address-type records, though this would
| seem to be inventing an ad-hoc record classification scheme which is not
| clearly defined in the DNS protocol.

That has been considered, very seriously, the classification problem
doesn't really exist, but it has the same kinds of problems as ...

| You'll note from the source that getaddrinfo(3) doesn't send queries for
| T_ANY records

because it doesn't work, ANY queries don't operate the way you'd think
they might at first glance, and they're useless for reliable data fetching
(on the other hand, they're great for debugging cache problems).

sendmail tried using ANY queries (to fetch MX and A, or whichever of
those was available, rather than AAAA and A, but the DNS doesn't care
much about one RR type compared to another - or not in the answer section)
which failed miserably in certain cases - sendmail no longer does that,
nor does anything else written by anyone who understands the DNS.

| Unfortunately there isn't the DNS query I suggested which will return
| all types of address records suitable for any address family and I
| suspect that using T_ANY will further cause spurious failures of various
| kinds when there are too many addresses to fit in a UDP reply (though
| the calling code should be smart enough to retry with TCP when
| necessary).

The reply size is not the problem, but spurious failures certainly are.

| Personally I think the likes of your "addr" tool should be required to
| iterate through all relevant "valid" address families when it does its
| query using getaddrinfo(3), and that AF_UNSPEC not be a valid hint.

A good "addr" tool would look for NSAP, X.25, and other "address" RR types as
well, probably. I certainly agree that any kind of diagnostic or
information gathering tool should not just be using getaddrinfo() and
assuming it will necessarily return everything.

kre

Michael van Elst

2009-12-05 12:32:50 UTC

According to POSIX, getaddrinfo() "shall return a set of socket
addresses and associated information to be used in creating a socket
with which to address the specified service." What is the use of
delivering socket addresses which will certainly fail later when used
as intended by this specification?

You don't know how the addresses are used and, at the time of the
getaddrinfo() call, you don't know whether you can use these addresses
for a connection. What's more, socket(), bind() and connect() will happily
tell you when they do not support the requested family or when
they cannot connect to the requested address, and you can and
should continue by trying the next address. This is how
you should handle the answer from getaddrinfo(), independent
of IPv4 or IPv6.
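A minimal sketch of that "try the next address" loop, assuming a TCP client. The function name `connect_any()` and its `ai_flags` pass-through parameter are made up for this example; the calls themselves are the standard getaddrinfo(3) and socket API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Try every address getaddrinfo() returns, in the order returned, until
 * one connects. Returns a connected socket, or -1 if the lookup failed
 * or no address worked. */
static int connect_any(const char *host, const char *service, int ai_flags)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* no address-family-specific code */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = ai_flags;
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;                 /* family unsupported here: next */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected: keep this socket */
        close(fd);                    /* unreachable address: next */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

A caller would use something like `connect_any("www.example.org", "http", 0)`. A kernel without IPv6 simply fails the socket() call for an AF_INET6 entry and the loop moves on, so no special-casing of IPv4 versus IPv6 is needed in the application.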

b) avoid run-time errors by having getaddrinfo() deliver only v4
addresses on systems which do not support v6 (unless, of course, the
user explicitly requests v6 addresses)

See my answer to kre, that dependency isn't necessarily valid and

c) if both v4 and v6 are supported, make the decision on selection /
ordering configurable.

if you already want a separate configuration as a workaround (like I
suggested) then this will allow you to do the same.

Greetings,

Greg Troxel

2009-12-04 15:19:36 UTC

Post by Matthias Scheler
No, the problem is caused by a router whose builtin caching nameserver
sends broken replies if NetBSD's libc queries for AAAA records. The broken
answer gets ignored and NetBSD's libc waits 15(?) seconds before trying
to query for A records.

I see. Well, perhaps some kind of "don't do AAAA lookups" runtime
config would help such people. Or a switch to accept replies from the
wrong port.

I wonder how many people are suffering from this - it's not clear that
it's sensible for people who aren't having the problem to spend time
working on it, compared to replacing a few broken routers.

Matthias Scheler

2009-12-04 15:45:16 UTC

Post by Greg Troxel
I see. Well, perhaps some kind of "don't do AAAA lookups" runtime
config would help such people.

Yes, that is the solution which is currently being discussed. Ideally
we want the following settings:

Query AAAA first, then A
Query A first, then AAAA
Query only AAAA
Query only A
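The four settings in this list map naturally onto an ordered list of query types. The following is a sketch only; the enum names and `query_order()` are invented here and are not taken from any proposed patch.

```c
#include <stddef.h>

enum rr_type { RR_A, RR_AAAA };

/* The four settings from the list; the names are made up for this sketch. */
enum lookup_policy { AAAA_THEN_A, A_THEN_AAAA, AAAA_ONLY, A_ONLY };

/* Fill order[] with the query types to send, in sequence.
 * Returns how many entries were filled (1 or 2), or 0 for an
 * unrecognised policy value. */
static size_t query_order(enum lookup_policy policy, enum rr_type order[2])
{
    switch (policy) {
    case AAAA_THEN_A: order[0] = RR_AAAA; order[1] = RR_A;    return 2;
    case A_THEN_AAAA: order[0] = RR_A;    order[1] = RR_AAAA; return 2;
    case AAAA_ONLY:   order[0] = RR_AAAA;                     return 1;
    case A_ONLY:      order[0] = RR_A;                        return 1;
    }
    return 0;
}
```

With the "only" policies the resolver never sends the other query at all, which is the property that matters for the broken-router case: an AAAA query that is never sent cannot time out.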

Post by Greg Troxel
I wonder how many people are suffering from this ...

I don't know that. But considering that some people get nervous when
ftp(1) reports that IPv6 doesn't work before it tries IPv4 it
might be good to have an option to look up A records first.

Kind regards

--
Matthias Scheler http://zhadum.org.uk/

Hajimu UMEMOTO

2009-12-04 16:57:32 UTC

Hi,

Post by Greg Troxel

On Fri, 4 Dec 2009 15:45:16 +0000

I see. Well, perhaps some kind of "don't do AAAA lookups" runtime
config would help such people.

tron> Yes, that is the solution which is currently being discussed. Ideally
tron> we want the following settings:

tron> Query AAAA first, then A
tron> Query A first, then AAAA
tron> Query only AAAA
tron> Query only A

Post by Greg Troxel
I wonder how many people are suffering from this ...

tron> I don't know that. But considering that some people get nervous when
tron> ftp(1) reports that IPv6 doesn't work before it tries IPv4 it
tron> might be good to have an option to look up A records first.

On FreeBSD, getaddrinfo(3) queries A before AAAA, but it still
returns AAAA results first by default.
You may want to look at the following diff:

http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/net/getaddrinfo.c.diff?r1=1.53;r2=1.54

Sincerely,

--
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
***@mahoroba.org ume@{,jp.}FreeBSD.org
http://www.imasy.org/~ume/

Matthias Scheler

2009-12-04 17:26:54 UTC

Post by Hajimu UMEMOTO
The getaddrinfo(3) queries A before AAAA on FreeBSD. But,
getaddrinfo(3) still returns AAAA 1st by default.
http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/net/getaddrinfo.c.diff?r1=1.53;r2=1.54

I'm sorry, but I don't understand how this would help in this case. The
problem was that the AAAA query resulted in a timeout because of the
broken DNS server of the router. The FreeBSD solution would still be
affected by that timeout because it does send an AAAA query after all.

Kind regards

--
Matthias Scheler http://zhadum.org.uk/

Hajimu UMEMOTO

2009-12-04 17:56:01 UTC

Hi,

On Fri, 4 Dec 2009 17:26:54 +0000

tron> I'm sorry, but I don't understand how this would help in this case. The
tron> problem was that the AAAA query resulted in a timeout because of the
tron> broken DNS server of the router. The FreeBSD solution would still be
tron> affected by that timeout because it does send an AAAA query after all.

Ah, I may not have understood the problem correctly. You are correct.
It helps with a SERVFAIL reply to the AAAA query in some cases, but it
doesn't help with a timeout on the AAAA query.
KAME's getaddrinfo(3) has a workaround for this issue: query A
first, then query AAAA, and if an A record was returned, query AAAA with
a shortened timeout. AFAIK, Windows Vista behaves the same way.

Sincerely,

--
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
***@mahoroba.org ume@{,jp.}FreeBSD.org
http://www.imasy.org/~ume/

Henning Brauer

2009-12-04 17:00:27 UTC

Post by Matthias Scheler

Post by Greg Troxel
I see. Well, perhaps some kind of "don't do AAAA lookups" runtime
config would help such people.

Yes, that is the solution which is currently discussed. Ideally
Query AAAA first, then A
Query A first, then AAAA
Query only AAAA
Query only A

fwiw, we have that in openbsd. from resolv.conf(5):

family  Specify which type of Internet protocol family to prefer, if
        a host is reachable using different address families. By
        default IPv4 addresses are queried first, and then IPv6
        addresses. The syntax is:

            family family1 [family2]

        A maximum of two families can be specified, where family can
        be any of:

            inet4   IPv4 queries.
            inet6   IPv6 queries.
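To make the syntax concrete, here is a sketch of a parser for one such line. It is written only from the manual text quoted above; `parse_family_line()` is a hypothetical helper and is not OpenBSD's actual resolver code.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Parse one resolv.conf-style line of the form "family fam1 [fam2]".
 * Stores up to two address families in out[] in the order given and
 * returns how many were parsed, or 0 if the line is not a valid
 * family directive. */
static int parse_family_line(const char *line, int out[2])
{
    char kw[16], f1[16], f2[16], extra[2];
    const char *names[2];
    int i, n;

    /* n counts the words read: keyword, one or two families, overflow. */
    n = sscanf(line, "%15s %15s %15s %1s", kw, f1, f2, extra);
    if (n < 2 || n > 3 || strcmp(kw, "family") != 0)
        return 0;

    names[0] = f1;
    names[1] = f2;
    for (i = 0; i < n - 1; i++) {
        if (strcmp(names[i], "inet4") == 0)
            out[i] = AF_INET;
        else if (strcmp(names[i], "inet6") == 0)
            out[i] = AF_INET6;
        else
            return 0;             /* unknown family name */
    }
    return n - 1;
}
```

So `family inet6 inet4` yields AF_INET6 then AF_INET, while `family inet4` alone yields a single entry, which would correspond to the "Query only A" setting discussed earlier in the thread.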

--
Henning Brauer, ***@bsws.de, ***@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting

Hajimu UMEMOTO

2009-12-04 18:03:45 UTC

Hi,

At Sat, 05 Dec 2009 02:56:01 +0900, ume wrote:

ume> Hi,

On Fri, 4 Dec 2009 17:26:54 +0000

tron> I'm sorry, but I don't understand how this would help in this case. The
tron> problem was that the AAAA query resulted in a timeout because of the
tron> broken DNS server of the router. The FreeBSD solution would still be
tron> affected by that timeout because it does send an AAAA query after all.

ume> Ah, I might not understand the problem correctly. You are correct.
ume> It helps for SERVFAIL against AAAA query in some case. But, it
ume> doesn't help for the timeout against AAAA query.
ume> The KAME's getaddrinfo(3) has the workaround for this issue. Query A
ume> 1st then query AAAA. And, if got A, query AAAA with shorten timeout.
ume> AFAIK, Windows Vista does this manner as well.

Oops. s/SERVFAIL/NXDOMAIN/

Sincerely,

--
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
***@mahoroba.org ume@{,jp.}FreeBSD.org
http://www.imasy.org/~ume/

Joerg Sonnenberger

2009-12-04 15:46:12 UTC

Post by Matthias Scheler

As for the problems, are they the result of a host getting a non-working
v6 address on a broken subnet?

Frankly, that sounds more like a "use a proper DNS server"...

Joerg

der Mouse

2009-12-04 19:33:38 UTC

"* 0.238% of users have useful IPv6 connectivity (and prefer IPv6)
* 0.09% of users have broken IPv6 connectivity [...]"
Approximately 2 or 3 users per thousand prefer IPv6, and
approximately 38% of these IPv6 users have broken connectivity,

Um...what reason do you have to think ("of these") that the 0.09% is a
subset of the 0.238%? Indeed, given the contrast between "useful" and
"broken", I'd read those as being disjoint, absent indication to the
contrary. (I'd be interested in the percentage of users who have
useful v6 but prefer v4, though I suspect it's difficult to determine.)
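The two readings give noticeably different numbers. The quoted "38%" is what results from dividing one figure by the other; under the disjoint reading, and assuming (this is not stated in the quoted figures) that the two groups together make up everyone who attempts IPv6, the broken share is closer to 27%:

```latex
\text{subset reading:}\quad \frac{0.09}{0.238} \approx 0.378 \quad (\approx 38\%)
\qquad
\text{disjoint reading:}\quad \frac{0.09}{0.238 + 0.09} = \frac{0.09}{0.328} \approx 0.274 \quad (\approx 27\%)
```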

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

der Mouse

2009-12-04 19:44:02 UTC

IMO, all resolving activity should be done through mdnsd, [...]

IMO, requiring a daemon to be running for name resolution to work is
broken.

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Thor Lancelot Simon

2009-12-04 19:49:46 UTC

Maybe we should ask for AAAA and A in parallel. It would also be nice
if all of this was transparent to libc. IMO, all resolving activity
should be done through mdnsd, and we can make policy tweaks there.

I think this is a very, very bad idea, because it is so inherently fragile.
To my knowledge, the first Unix system to take this approach was Irix.
I know of a number of sites which actually switched away from Irix in the
ensuing years because, of course, any problem with the name service caching
daemon brought the system quickly into an unusable state -- and there were
plenty of problems.

Similarly, I've seen nameservice cache daemon problems, at one point or
another, cause major issues with every version of OS X I've used (which is
all of them).

This is a valuable feature but in my opinion it should not be the default
and it should certainly not be the only way libc can do it.

Thor

Chuck Swiger

2009-12-04 21:51:40 UTC

Post by der Mouse
Um...what reason do you have to think ("of these") that the 0.09% is a subset of the 0.238%?

Google discusses their testing methodology earlier in the PDF. Evidently they use signed JSON/AJAX background queries against two hosts, ipv4.ipv6-exp.l.google.com and dualstack.ipv6-exp.l.google.com, to obtain data about IPv6 support; both hostnames have A records; only the latter has AAAA records.

Post by der Mouse
Indeed, given the contrast between "useful" and "broken", I'd read those as being disjoint, absent indication to the
contrary. (I'd be interested in the percentage of users who have useful v6 but prefer v4, though I suspect it's difficult to determine.)

Post by Hajimu UMEMOTO
The KAME's getaddrinfo(3) has the workaround for this issue. Query A
1st then query AAAA. And, if got A, query AAAA with shorten timeout.
AFAIK, Windows Vista does this manner as well.

...is also pretty close to what MacOSX does in Libinfo's _mdns_query_mDNSResponder(), which issues both A and AAAA queries in parallel for the AF_UNSPEC case:

// Timeout Logic
// The kevent(2) API timeout parameter is used to enforce the total
// timeout of the DNS query. Each iteration recalculates the relative
// timeout based on the desired end time (total timeout from origin).
//
// In order to workaround some DNS configurations that do not return
// responses for AAAA queries, parallel queries modify the total
// timeout upon receipt of the first response. The new total timeout is
// set to an effective value of 2N where N is the time taken to receive
// the A response (the original total timeout is preserved if 2N would
// have exceeded it). However, since mDNSResponder caches values, a
// minimum value of 50ms for N is enforced in order to give some time
// for the receipt of a AAAA response.
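The arithmetic in that comment can be sketched as a small function. This is written from the quoted comment only, not from the Libinfo source; the function name and the choice of milliseconds as the unit are assumptions made for the example.

```c
/* Recompute the total query timeout once the first response has arrived,
 * following the rule in the quoted comment: the new total is 2N, where N
 * is the time the first response took, N is never taken to be less than
 * 50 ms, and the original total is kept if 2N would exceed it.
 * Both arguments and the result are in milliseconds, measured from the
 * start of the query. */
static long adjusted_total_timeout_ms(long orig_total_ms, long first_reply_ms)
{
    const long min_n_ms = 50;
    long n = first_reply_ms < min_n_ms ? min_n_ms : first_reply_ms;
    long doubled = 2 * n;

    return doubled < orig_total_ms ? doubled : orig_total_ms;
}
```

For example, with a 30 second total timeout and an A response after 200 ms, the AAAA query gets until 400 ms after the start rather than the full 30 seconds. A cached A response arriving after 10 ms still leaves the AAAA query 100 ms in total, because of the 50 ms floor on N.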

I don't have a strong opinion about whether something like that should be in libc or use an external DNS caching daemon (mdnsd, lookupd, nscd, etc). I do recall infrequent but drastic problems with DNS caching (only under Solaris and not IRIX), and there seem to be plenty of people wanting to use NetBSD for an embedded platform where running an external DNS cache daemon isn't really what you want.

Regards,

--
-Chuck
