Discussion:
getaddrinfo(3) on numerical addresses
Edgar Fuß
2017-10-24 13:19:29 UTC
I've noticed[*] that on NetBSD, getaddrinfo(3) does a resolver lookup even
if presented with a numerical address. Is this on purpose? Would there be a
drawback if it first tried to inet_pton() the address?
On Linux, it seems to avoid the resolver lookup.

* The problem was net/nagios-plugins' check_ping stalling during a resolver
malfunction (despite the addresses given numerically). It turned out that
check_ping, in order to find out whether it needs to call a syntactically
different ping6 command, checks whether the -H argument is IPv6. It does
this eventually by calling getaddrinfo() with an AF_INET6 hint, resulting
in resolver lookups, which failed.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Joerg Sonnenberger
2017-10-24 13:43:21 UTC
Post by Edgar Fuß
I've noticed[*] that on NetBSD, getaddrinfo(3) does a resolver lookup even
if presented a numerical address. Is this on purpose? Would it have a
drawback if it would first try to inet_pton() the address?
On Linux, it seems to avoid the resolver lookup.
That's what AI_NUMERICHOST is for? It's not clear whether it should or
should not parse the behavior, see comments in the code.

Joerg

Edgar Fuß
2017-10-24 15:17:19 UTC
Post by Joerg Sonnenberger
That's what AI_NUMERICHOST is for?
No (or so I think).
I would like to save the resolver lookup if we know beforehand we don't need it.
Post by Joerg Sonnenberger
It's not clear whether it should or should not parse the behavior
I can't parse that sentence.

What's wrong with the following argument?
1a. domain name components consist of letters, digits and hyphens
1b. purely numeric components may be allowed, but are crazy
2. syntactically correct numerical IPv6 addresses contain at least one colon
3. syntactically correct numerical IPv4 addresses contain digits and dots
4. So a numeric IPv4 address can't be a numeric IPv6 address, and vice
versa, and neither of them can be a FQDN.

First, try to inet_pton(AF_INET) (or equivalent) the argument. If that succeeds
-- with an AF_INET6 hint, fail
-- with an AF_INET hint or no hint, return the result
Next, try to inet_pton(AF_INET6) the argument. If that succeeds
-- with an AF_INET hint, fail
-- with an AF_INET6 hint or no hint, return the result
If both inet_pton()s fail, continue as normal.
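
The pre-check proposed above can be sketched at the application level roughly as follows. This is only an illustration: numeric_family is a hypothetical helper name (it is neither a libc function nor part of check_ping), and the sketch relies only on inet_pton() returning 1 when the string parses in the given family.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a libc function): classify a host string
 * without touching the resolver.  Returns AF_INET or AF_INET6 if the
 * string parses as a numeric address, AF_UNSPEC otherwise.
 */
static int
numeric_family(const char *host)
{
	struct in_addr a4;
	struct in6_addr a6;

	assert(host != NULL);

	/* inet_pton() returns 1 only for a syntactically valid address. */
	if (inet_pton(AF_INET, host, &a4) == 1)
		return AF_INET;
	if (inet_pton(AF_INET6, host, &a6) == 1)
		return AF_INET6;
	return AF_UNSPEC;	/* not numeric: the caller continues as normal */
}
```

With this, numeric_family("1.2.3.4") yields AF_INET and numeric_family("2001:db8::1234") yields AF_INET6; anything else falls through to the normal lookup path.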

Robert Elz
2017-10-24 15:57:42 UTC
Date: Tue, 24 Oct 2017 17:17:19 +0200
From: Edgar Fuß <***@math.uni-bonn.de>
Message-ID: <***@trav.math.uni-bonn.de>

| What's wrong with the following argument?
| 1a. domain name components consist of letters, digits and hyphens

That's not true, they can be almost anything (specifically including
dots and \0's) though many of the possibilities are not really practical.

| 1b. purely numeric components may be allowed, but are crazy

Have you never seen 1.2.3.4.in-addr.arpa ??

They certainly are allowed, and are not at all crazy. Not even in
other domains.

What's more, the resolver has no idea why a name is being looked up,
just that it is.

| 2. syntactically correct numerical IPv6 addresses contain at least one colon

That's correct. At least 2 really I think.

| 3. syntactically correct numerical IPv4 addresses contain digits and dots

Which is exactly what a domain name can contain. The only reason that
IP addresses (in textual format) are not currently valid domain names, is
that there are currently no top level domains with all numeric labels.
But who knows what ICANN will create next week?

| 4. So a numeric IPv4 address can't be a numeric IPv6 address,

That's obvious, they're different lengths aside from anything else.

| and vice versa, and neither of them can be a FQDN.

v4 addresses exactly meet the criteria for a FQDN.

kre


Joerg Sonnenberger
2017-10-24 16:25:15 UTC
Post by Robert Elz
| 3. syntactically correct numerical IPv4 addresses contain digits and dots
Which is exactly what a domain name can contain. The only reason that
IP addresses (in textual format) are not currently valid domain names, is
that there are currently no top level domains with all numeric labels.
But who knows what ICANN will create next week?
At least inet_pton parses numerical IPv4 addresses without any dots.
Add the implicit search domain from /etc/resolv.conf and you can get
names for gai that contain purely digits and are still ambiguous as to
whether they should be resolved via DNS or not.

Joerg

Edgar Fuß
2017-10-24 17:38:40 UTC
Post by Joerg Sonnenberger
At least inet_pton parses numerical IPv4 addresses without any dots.
Despite the man page claiming so, it actually doesn't (on 6.1):

#include <sys/socket.h>
#include <arpa/inet.h>
#include <stdio.h>

int main(void) {
    struct in_addr addr;

    /* inet_pton() returns 1 on success and 0 if the string does not parse */
    printf("%d\n", inet_pton(AF_INET, "1", &addr));
    return 0;
}

outputs "0" for me (which means failure).

Edgar Fuß
2017-10-24 18:41:05 UTC
EF> domain name components consist of letters, digits and hyphens
KRE> That's not true, they can be almost anything (specifically including
KRE> dots and \0's) though many of the possibilities are not really practical.
Obviously, a component can't contain a dot and any argument to getaddrinfo()
can't contain a NUL.
Other than that, there indeed seems to be little consensus about what's allowed.
RFC 1035 says "start with a letter, end with a letter or digit, and have as
interior characters only letters, digits, and hyphen." which seems to be what
BIND enforces since 4.9.4 (unless you fiddle with check-names).

EF> purely numeric components may be allowed, but are crazy
KRE> Have you never seen 1.2.3.4.in-addr.arpa ??
OK, got me.

KRE> But who knows what ICANN will create next week?
I guess as soon as they invent numerical TLDs, it's time to start ignoring
them.


So back to my original question whether the observed behaviour was on purpose, the answer seems to be affirmative.

Robert Elz
2017-10-25 08:10:35 UTC
Date: Tue, 24 Oct 2017 20:41:05 +0200
From: Edgar Fuß <***@math.uni-bonn.de>
Message-ID: <***@trav.math.uni-bonn.de>

| Obviously, a component can't contain a dot

It isn't obvious at all, as it can.

The syntax to allow it in text representations isn't all that clear
(or even specified - and mostly nothing is implemented at all) but the
DNS itself is a 100% binary protocol (aside from the a==A nonsense) and
has no problem at all with any of the 256 values for each byte in any label.

| and any argument to getaddrinfo() can't contain a NUL.

not a '\0' itself in the textual form, but it could permit 'xxx\0xxx'
(literally that using shell quoting syntax, not C, in C it would be
more like "xxx\\0xxx") and convert the \ followed by 0 into a \0
byte (and similarly convert \. into an embedded '.').

That we (and just about everyone else) don't bother to do this is a
limitation of the implementation, not of the DNS.

| Other than that, there indeed seems to be little consensus about what's
| allowed.

No, there's actually very good consensus.

| RFC 1035 says "start with a letter, end with a letter or digit, and have as
| interior characters only letters, digits, and hyphen."

If you go back and read that carefully, you'll see that is a guideline for
names (and makes them compatible with what could go in HOSTS.TXT which was
important when 1035 was written, and also avoids problems when dealing
with applications originally written when HOSTS.TXT was the lookup method.)

See RFC2181 for a more detailed explanation of this issue (and others.)

| which seems to be what
| BIND enforces since 4.9.4 (unless you fiddle with check-names).

What bind enforces for the local zone - which is fine. Local admins
can subject their domain to whatever naming rules they like, in fact,
check names is really too inflexible, for example, an admin might want
to prohibit what (s)he regards as inappropriate names, like scheisse or
merde, and configure check-names to reject those as well (except the
implementation doesn't go that far). Their domain, their choice.

When first introduced, bind defaulted check-names on for all uses.
Vixie and I had "words" about that, and bind was changed to only
have it default on for primary zones.

| KRE> But who knows what ICANN will create next week?
| I guess as soon as they invent numerical TLDs, it's time to start ignoring
| them.

Unfortunately we don't get that choice. It isn't likely that it will
ever happen, but we do not write code to deal with "I hope it continues
like ..." - we assume the worst, and perhaps optimise for the more likely
cases.

kre


Joerg Sonnenberger
2017-10-24 17:10:22 UTC
Post by Edgar Fuß
Post by Joerg Sonnenberger
That's what AI_NUMERICHOST is for?
No (or so I think).
I would like to save the resolver lookup if we know beforehand we don't need it.
But that's the point of AI_NUMERICHOST. You don't know in advance if it
is necessary or not.
Post by Edgar Fuß
Post by Joerg Sonnenberger
It's not clear whether it should or should not parse the behavior
I can't parse that sentence.
The gai implementation talks about the DNS fallback a bit and why it may
or may not be a bug.

Joerg

神明達哉
2017-10-24 17:00:44 UTC
At Tue, 24 Oct 2017 15:19:29 +0200,
Post by Edgar Fuß
I've noticed[*] that on NetBSD, getaddrinfo(3) does a resolver lookup even
if presented a numerical address. Is this on purpose? Would it have a
drawback if it would first try to inet_pton() the address?
On Linux, it seems to avoid the resolver lookup.
* The problem was net/nagios-plugins' check_ping to stall during a resolver
malfunction (despite the addresses given numerically). It turned out that
check_ping, in order to find out whether it needs to call a syntactically
different ping6 command, checks whether the -H argument is IPv6. It does
this eventually by calling getaddrinfo() with an AF_INET6 hint, resulting
in resolver lookups, which failed.
I suspect there's some misunderstanding. To be sure that we're talking
about the same thing, could you run a small experiment:

- compile the pasted code below and name the executable, say, 'gai'
- run the program as follows:
% RES_OPTIONS=debug ./gai 2001:db8::1234
- also try this:
% RES_OPTIONS=debug ./gai www.netbsd.org

From the above description, it seems to me that you're saying you'll see
verbose output in both cases. But from my reading of the code, and from
local experiments with several BSD-variant implementations, including a
quite old version (6.1) of NetBSD, the first case should exit very quietly,
since it doesn't invoke name resolution using DNS (or, for that matter, any
other name resolution protocol).

--
JINMEI, Tatuya

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>

int
main(int argc, char **argv) {
    struct addrinfo hints, *res;
    int error;

    if (argc < 2)
        return 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET6;

    /* Look up the argument as an IPv6 address or name. */
    error = getaddrinfo(argv[1], NULL, &hints, &res);
    if (error != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(error));
        return 1;
    }
    printf("getaddrinfo succeeded\n");
    freeaddrinfo(res);
    return 0;
}

Taylor R Campbell
2017-10-25 01:57:31 UTC
Date: Tue, 24 Oct 2017 10:00:44 -0700
I suspect there's some misunderstanding. To be sure that we talk
- compile the pasted code below and name the executable, say, 'gai'
% RES_OPTIONS=debug ./gai 2001:db8::1234
% RES_OPTIONS=debug ./gai www.netbsd.org
FYI, we have a program in base that will do this now, called
getaddrinfo(1). The following should do the same thing:

getaddrinfo -f inet6 2001:db8::1234

Edgar Fuß
2017-10-24 17:32:07 UTC
Post by 神明達哉
I suspect there's some misunderstanding.
Yes. It's about numerical IPv4 addresses being looked up with an AF_INET6 hint.

check_ping tries to figure out whether it needs to call ping6 (suppose
we're on a system where ping6 is different from ping).
Suppose that check_ping is invoked without -4/-6 and with -H 1.2.3.4.
Now check_ping tries to figure out whether the -H argument is/resolves to
an IPv6 address. To do so, it checks whether getaddrinfo() on the -H argument
(which is 1.2.3.4) with an AF_INET6 hint succeeds. But this triggers a DNS
lookup.
Given this is a monitoring system, whose job it is to detect server failures,
marking random servers/switches as dead because the resolver is going mad and
check_ping on their numerical IPv4 address therefore times out is not
particularly useful.

I guess the point is what you expect getaddrinfo on 1.2.3.4 with an AF_INET6
hint to do. Well, you could have a search domain of numerical.org and
1.2.3.4.numerical.org could have an AAAA record. How likely is that?

神明達哉
2017-10-24 17:51:48 UTC
At Tue, 24 Oct 2017 19:32:07 +0200,
Post by Edgar Fuß
Post by 神明達哉
I suspect there's some misunderstanding.
Yes. It's about numerical IPv4 addresses being looked up with an AF_INET6 hint.
check_ping tries to figure out whether it needs to call ping6 (suppose
we're on a system where ping6 is different from ping).
Suppose that check_ping is invoked without -4/-6 and with -H 1.2.3.4.
Now check_ping tries to figure out whether the -H argument is/resolves to
an IPv6 address. To do so, it checks whether getaddrinfo() on the -H argument
(which is 1.2.3.4) with an AF_INET6 hint succeeds. But this triggers a DNS
lookup.
Ah, okay.
Post by Edgar Fuß
Given this is a monitoring system, whose job it is to detect server failures,
marking random servers/switches as dead while the resolver is going mad and
so check_ping on their numerical IPv4 times out is not particularly useful.
I guess the point is what you expect getaddrinfo on 1.2.3.4 with an AF_INET6
hint to do. Well, you could have a search domain of numerical.org and
1.2.3.4.numerical.org have an AAAA record. How likely is that?
I suspect different people could have different opinions on this
point. Some people might say 1.2.3.4.numerical.org is unlikely enough
that it can be ignored. Some people might just disagree. Others
might even think it's not about likelihood but about flexibility
(i.e., unless that's an invalid form, the library should be able to
work with it).

So, especially for a portable application, I would say it should be
handled at the application level: as already pointed out, setting
AI_NUMERICHOST should prevent the unintended name resolution in the
above example (I confirmed it with my test code). I believe that will
work that way for almost all getaddrinfo() implementations, whatever
policy their developers have on what should happen for 1.2.3.4 with
AF_INET6 but without AI_NUMERICHOST.
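
As a rough illustration of the AI_NUMERICHOST approach, a check of the kind check_ping wants might look like the sketch below. The helper name is_numeric_host is hypothetical; the only assumption is the standard getaddrinfo() behaviour that AI_NUMERICHOST makes a non-numeric (or wrong-family) string fail instead of being resolved.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <assert.h>

/*
 * Hypothetical helper: returns 1 if host is a numeric address of the
 * given family (AF_INET, AF_INET6, or AF_UNSPEC for either), else 0.
 * AI_NUMERICHOST tells getaddrinfo() not to attempt name resolution.
 */
static int
is_numeric_host(const char *host, int family)
{
	struct addrinfo hints, *res;

	assert(host != NULL);

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = family;
	hints.ai_flags = AI_NUMERICHOST;

	if (getaddrinfo(host, NULL, &hints, &res) != 0)
		return 0;	/* not numeric, or numeric in another family */
	freeaddrinfo(res);
	return 1;
}
```

Here is_numeric_host("1.2.3.4", AF_INET6) fails without consulting the resolver, which is the case that stalled check_ping. The price is that host names are rejected as well, so this alone does not cover an application that must also accept names.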

--
JINMEI, Tatuya

Edgar Fuß
2017-10-24 18:56:49 UTC
Post by 神明達哉
So, especially for a portable application, I would say it should be
handled at the application level: as already pointed out, setting
AI_NUMERICHOST should prevent the unintended name resolution in the
above example (I confirmed it with my test code). I believe that will
work for almost all variants of getaddrinfo() implementations that
way, whatever policy its developer has on what should happen for
1.2.3.4 with AF_INET6 but not AI_NUMERICHOST.
I guess that setting AI_NUMERICHOST will make getaddrinfo() fail if the argument is non-numeric.
So if the desired application behaviour (i.e. what I consider useful in the case of check_ping) is:
1. If it's parsable as a numeric IPv4 address, treat it as v4, no DNS lookup
2. If it's parsable as a numeric IPv6 address, treat it as v6, no DNS lookup
3. Else, do a DNS lookup (where one can argue what to do if the name has both
A and AAAA records)
then no single invocation of getaddrinfo() will do, right?
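
For an application that does not want to depend on how a particular getaddrinfo() treats numeric strings, the three steps above could be written as two calls. This is a sketch under assumptions, not what check_ping does: lookup_numeric_first is a hypothetical name, and the fallback simply retries on any failure of the numeric-only pass.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <assert.h>

/*
 * Hypothetical helper: try the argument as a numeric IPv4/IPv6 address
 * first and consult the resolver only if that fails.  Returns 0 on
 * success (caller must freeaddrinfo(*res)) or a getaddrinfo() error code.
 */
static int
lookup_numeric_first(const char *host, struct addrinfo **res)
{
	struct addrinfo hints;
	int error;

	assert(host != NULL && res != NULL);

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_flags = AI_NUMERICHOST;

	/* Pass 1: numeric IPv4 or IPv6 only, no name resolution. */
	error = getaddrinfo(host, NULL, &hints, res);
	if (error == 0)
		return 0;

	/* Pass 2: not numeric, so allow the resolver to be consulted. */
	hints.ai_flags = 0;
	return getaddrinfo(host, NULL, &hints, res);
}
```

On success the caller can inspect (*res)->ai_family to decide between ping and ping6; what to do when a name has both A and AAAA records remains a policy question.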

Edgar Fuß
2017-10-25 10:54:25 UTC
Post by 神明達哉
If you specify AF_UNSPEC for hints.ai_family and not AI_NUMERICHOST
for ai_flags, it should work that way for BSD-variants' implementation
of getaddrinfo().
Ah yes, of course. Stupid me. Thanks.

Edgar Fuß
2017-10-25 12:07:03 UTC
EF> Ah yes, of course. Stupid me. Thanks.
So what I've learned (thanks!) from this discussion:

Calling getaddrinfo() with a hint of AF_INET/AF_INET6 means "if you try hard,
can you make this an IPv4/IPv6 address" (e.g., look up 1.2.3.4.numerical.org
or ::1.i-like-colons.org).
If you want "does this look like an IPv4/IPv6 address" instead, call
getaddrinfo() without a hint and examine res->ai_family.

Which makes sense.
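
A minimal sketch of that second variant, assuming nothing beyond the standard getaddrinfo() interface (host_family is a hypothetical helper name). For a numeric argument the result's family reflects the address syntax; for a host name the resolver is still consulted, and the family of the first result depends on the implementation's ordering.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <assert.h>

/*
 * Hypothetical helper: look the host up without a family hint and
 * report the family of the first result, or AF_UNSPEC on failure.
 */
static int
host_family(const char *host)
{
	struct addrinfo hints, *res;
	int family;

	assert(host != NULL);

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;	/* no family hint, no AI_NUMERICHOST */

	if (getaddrinfo(host, NULL, &hints, &res) != 0)
		return AF_UNSPEC;
	family = res->ai_family;	/* AF_INET or AF_INET6 */
	freeaddrinfo(res);
	return family;
}
```

An application such as check_ping could then choose ping6 only when host_family() reports AF_INET6.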

神明達哉
2017-10-24 20:07:51 UTC
At Tue, 24 Oct 2017 20:56:49 +0200,
Post by Edgar Fuß
Post by 神明達哉
So, especially for a portable application, I would say it should be
handled at the application level: as already pointed out, setting
AI_NUMERICHOST should prevent the unintended name resolution in the
above example (I confirmed it with my test code). I believe that will
work for almost all variants of getaddrinfo() implementations that
way, whatever policy its developer has on what should happen for
1.2.3.4 with AF_INET6 but not AI_NUMERICHOST.
I guess that setting AI_NUMERICHOST will make getaddrinfo() fail if the argument is non-numeric.
1. If it's parsable as a numeric IPv4 address, treat it as v4, no DNS lookup
2. If it's parsable as a numeric IPv6 address, treat it as v6, no DNS lookup
3. Else, do a DNS lookup (where one can argue what to do if the name has both
A and AAAA records)
then no single invocation of getaddrinfo() will do, right?
If you specify AF_UNSPEC for hints.ai_family and not AI_NUMERICHOST
for ai_flags, it should work that way for BSD-variants' implementation
of getaddrinfo(). Try it with my test code just by commenting out
this line:
hints.ai_family = AF_INET6;

--
JINMEI, Tatuya

John Nemeth
2017-10-24 18:27:02 UTC
On Oct 24, 7:32pm, Edgar Fuß wrote:
}
} > I suspect there's some misunderstanding.
} Yes. It's about numerical IPv4 addresses being looked up with an AF_INET6 hint.
}
} check_ping tries to figure out whether it needs to call ping6 (suppose
} we're on a system where ping6 is different from ping).
} Suppose that check_ping is invoked without -4/-6 and with -H 1.2.3.4.
} Now check_ping tries to figure out whether the -H argument is/resolves to
} an IPv6 address. To do so, it checks whether getaddrinfo() on the -H argument
} (which is 1.2.3.4) with an AF_INET6 hint succeeds. But this triggers a DNS
} lookup.
} Given this is a monitoring system, whose job it is to detect server failures,
} marking random servers/switches as dead while the resolver is going mad and
} so check_ping on their numerical IPv4 times out is not particularly useful.

This totally sounds like a bug in the monitoring system, not
a libc bug. If it is depending on esoteric implementation-defined
behaviour, then it is buggy and needs to find another way to do
what it wants.

} I guess the point is what you expect getaddrinfo on 1.2.3.4 with an AF_INET6
} hint to do. Well, you could have a search domain of numerical.org and
} 1.2.3.4.numerical.org have an AAAA record. How likely is that?

Irrelevant how likely it is. If it is legal then getaddrinfo()
needs to handle it.

}-- End of excerpt from Edgar Fuß

Valery Ushakov
2017-10-24 20:58:37 UTC
Post by Edgar Fuß
Post by 神明達哉
I suspect there's some misunderstanding.
Yes. It's about numerical IPv4 addresses being looked up with an AF_INET6 hint.
check_ping tries to figure out whether it needs to call ping6 (suppose
we're on a system where ping6 is different from ping).
Suppose that check_ping is invoked without -4/-6 and with -H 1.2.3.4.
Now check_ping tries to figure out whether the -H argument is/resolves to
an IPv6 address. To do so, it checks whether getaddrinfo() on the -H argument
(which is 1.2.3.4) with an AF_INET6 hint succeeds. But this triggers a DNS
lookup.
Given this is a monitoring system, whose job it is to detect server failures,
marking random servers/switches as dead while the resolver is going mad and
so check_ping on their numerical IPv4 times out is not particularly useful.
BTW, isn't that exactly the argument that this program must use
AI_NUMERICHOST which guarantees that no name resolution will be
attempted?

-uwe


Edgar Fuß
2017-10-25 12:00:19 UTC
EF> Given this is a monitoring system, whose job it is to detect server failures,
EF> marking random servers/switches as dead while the resolver is going mad and
EF> so check_ping on their numerical IPv4 times out is not particularly useful.
VU> BTW, isn't that exactly the argument that this program must use
VU> AI_NUMERICHOST which guarantees that no name resolution will be
VU> attempted?
While it's debatable how much sense it makes to use check_ping with a
non-numerical address, as-is, it does accept them. I can't upstream a patch
breaking this.
