Best Practice: Reverse DNS Zones
The general rule with Xsan, since version 1.1 or so, is this: DNS isn't necessary, but if you have some you had better make sure it is perfect. A few of us are beginning to suspect that DNS is, in fact, required, although in a very obscure way. It will take me several paragraphs to explain why, but let me get to the bottom line first:
Your DNS servers should include a zone for reverse lookups of your metadata (private) IP range. Ask your DNS administrator to create a reverse zone for this range, with SOA and NS records. PTR records aren't needed.
Read more to find out why.
I'm going to start with an overview of DNS. Skip down for the better stuff.
- A review of DNS
- Reverse DNS?
- Reverse lookups on private IP ranges
- What's this all to do with Xsan?
- The bottom line, once again
- How to add private records to your DNS server
- And what's this got to do with iTunes?
A review of DNS
The Domain Name System is a system that translates names into IP numbers. Most people (a few of you excepted) find it easier to remember names (like "www.adic.com") than numbers (like "22.214.171.124"). An essential part of every network, a DNS server is expected to quickly and fully resolve names given to it, or at least to quickly return a negative response. This is accomplished via a tremendous distributed database that spreads all over the Internet, mostly hierarchically, from the 13 root name servers, through the servers for each top-level domain, down to individual servers for each organization or smaller unit.
A query to your DNS server goes like this (type "dig +trace www.adic.com" to follow along):
First, the DNS servers need to know the addresses of the root DNS servers. These are stored in a local file on the server.
. 399215 IN NS G.ROOT-SERVERS.NET.
. 399215 IN NS H.ROOT-SERVERS.NET.
. 399215 IN NS I.ROOT-SERVERS.NET.
. 399215 IN NS J.ROOT-SERVERS.NET.
. 399215 IN NS K.ROOT-SERVERS.NET.
. 399215 IN NS L.ROOT-SERVERS.NET.
. 399215 IN NS M.ROOT-SERVERS.NET.
. 399215 IN NS A.ROOT-SERVERS.NET.
. 399215 IN NS B.ROOT-SERVERS.NET.
. 399215 IN NS C.ROOT-SERVERS.NET.
. 399215 IN NS D.ROOT-SERVERS.NET.
. 399215 IN NS E.ROOT-SERVERS.NET.
. 399215 IN NS F.ROOT-SERVERS.NET.
;; Received 388 bytes from 192.168.1.2#53(192.168.1.2) in 15 ms
The first dot (".") represents the DNS root. "NS" means the following name is the authoritative name server responsible for that (root) domain.
Next, the address www.adic.com is broken apart in reverse order. The DNS server looks to see who is responsible for names ending in ".com", by asking a random server from the above list:
com. 172800 IN NS L.GTLD-SERVERS.NET.
com. 172800 IN NS M.GTLD-SERVERS.NET.
com. 172800 IN NS A.GTLD-SERVERS.NET.
com. 172800 IN NS B.GTLD-SERVERS.NET.
com. 172800 IN NS C.GTLD-SERVERS.NET.
com. 172800 IN NS D.GTLD-SERVERS.NET.
com. 172800 IN NS E.GTLD-SERVERS.NET.
com. 172800 IN NS F.GTLD-SERVERS.NET.
com. 172800 IN NS G.GTLD-SERVERS.NET.
com. 172800 IN NS H.GTLD-SERVERS.NET.
com. 172800 IN NS I.GTLD-SERVERS.NET.
com. 172800 IN NS J.GTLD-SERVERS.NET.
com. 172800 IN NS K.GTLD-SERVERS.NET.
;; Received 502 bytes from 126.96.36.199#53(G.ROOT-SERVERS.NET) in 81 ms
Next, we ask to find out the servers responsible for "adic.com":
adic.com. 172800 IN NS auth10.ns.wcom.com.
adic.com. 172800 IN NS auth20.ns.wcom.com.
adic.com. 172800 IN NS ns01hq.adic.com.
;; Received 149 bytes from 188.8.131.52#53(L.GTLD-SERVERS.NET) in 4312 ms
Finally, we ask one of these for the address ("A") record for "www.adic.com":
www.adic.com. 21600 IN A 184.108.40.206
;; Received 204 bytes from 220.127.116.11#53(auth10.ns.wcom.com) in 204 ms
All this needs to happen with every new name that your Mac encounters. Intelligently, both DNS servers and hosts cache the results, so repeated queries don't go through so much trouble.
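If it helps to see the walk spelled out, here is a minimal Python sketch of the order in which the zones are visited. The function name delegation_chain is my own invention for illustration, and it only splits the name; it makes no DNS queries at all.

```python
def delegation_chain(name: str) -> list[str]:
    """Return the zones a resolver walks through, from the root down to the full name."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    chain = ["."]  # every lookup starts at the root
    # Add one more label from the right-hand side at each step.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(delegation_chain("www.adic.com"))
# ['.', 'com.', 'adic.com.', 'www.adic.com.']
```

Each entry corresponds to one hop in the "dig +trace" output: the root, then ".com", then "adic.com", and finally the A record itself.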
Reverse DNS?
DNS also handles queries in reverse, from an address to a name. The method for this is either brilliant or an ugly hack, depending on your state of mind.
So say you want to reverse lookup 18.104.22.168. The address is inverted as 22.214.171.124, then suffixed with the reverse domain, in-addr.arpa. (.arpa is a special top-level domain.) So the lookup is done on the unwieldy address 126.96.36.199.in-addr.arpa. (use "dig +trace 188.8.131.52.in-addr.arpa." or, more simply, "dig +trace -x 184.108.40.206".)
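The inversion is easy to get wrong by hand, so here is a small Python sketch of it. The address 192.0.2.25 is a documentation-range placeholder I picked for illustration, not one of the addresses in this article, and reverse_name is a hypothetical helper. The standard library's ipaddress module already does the same thing through its reverse_pointer attribute, which is printed alongside for comparison.

```python
import ipaddress


def reverse_name(address: str) -> str:
    """Build the in-addr.arpa name for an IPv4 address by hand."""
    ip = ipaddress.IPv4Address(address)  # raises ValueError on bad input
    octets = str(ip).split(".")
    # Invert the octets, then add the reverse domain.
    return ".".join(reversed(octets)) + ".in-addr.arpa."


example = "192.0.2.25"  # placeholder address, an assumption for this sketch
print(reverse_name(example))
# The standard library gives the same name, minus the trailing dot.
print(ipaddress.ip_address(example).reverse_pointer + ".")
```

Both lines print 25.2.0.192.in-addr.arpa., which is the name that "dig -x" would query.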
Again, the query begins with root servers:
. 397910 IN NS A.ROOT-SERVERS.NET.
. 397910 IN NS B.ROOT-SERVERS.NET.
. 397910 IN NS C.ROOT-SERVERS.NET.
. 397910 IN NS D.ROOT-SERVERS.NET.
. 397910 IN NS E.ROOT-SERVERS.NET.
. 397910 IN NS F.ROOT-SERVERS.NET.
. 397910 IN NS G.ROOT-SERVERS.NET.
. 397910 IN NS H.ROOT-SERVERS.NET.
. 397910 IN NS I.ROOT-SERVERS.NET.
. 397910 IN NS J.ROOT-SERVERS.NET.
. 397910 IN NS K.ROOT-SERVERS.NET.
. 397910 IN NS L.ROOT-SERVERS.NET.
. 397910 IN NS M.ROOT-SERVERS.NET.
;; Received 420 bytes from 192.168.1.2#53(192.168.1.2) in 2 ms
It continues with the herby servers responsible for reverse lookups on 63.x.x.x:
63.in-addr.arpa. 86400 IN NS chia.ARIN.NET.
63.in-addr.arpa. 86400 IN NS dill.ARIN.NET.
63.in-addr.arpa. 86400 IN NS BASIL.ARIN.NET.
63.in-addr.arpa. 86400 IN NS henna.ARIN.NET.
63.in-addr.arpa. 86400 IN NS indigo.ARIN.NET.
63.in-addr.arpa. 86400 IN NS epazote.ARIN.NET.
63.in-addr.arpa. 86400 IN NS figwort.ARIN.NET.
;; Received 195 bytes from 220.127.116.11#53(A.ROOT-SERVERS.NET) in 76 ms
Then with the second octet 63.81.x.x:
81.63.in-addr.arpa. 86400 IN NS AUTH03.NS.UU.NET.
81.63.in-addr.arpa. 86400 IN NS AUTH00.NS.UU.NET.
;; Received 95 bytes from 18.104.22.168#53(chia.ARIN.NET) in 4206 ms
Then with the third octet, 63.81.117.x:
117.81.63.in-addr.arpa. 21600 IN NS ns01hq.adic.com.
117.81.63.in-addr.arpa. 21600 IN NS auth02.ns.uu.net.
117.81.63.in-addr.arpa. 21600 IN NS auth60.ns.uu.net.
;; Received 124 bytes from 22.214.171.124#53(AUTH03.NS.UU.NET) in 74 ms
And finally with the record we want:
126.96.36.199.in-addr.arpa. 21600 IN PTR www.adic.com.
;; Received 558 bytes from 188.8.131.52#53(ns01hq.adic.com) in 146 ms
So the servers ns01hq.adic.com, auth02.ns.uu.net, and auth60.ns.uu.net handle reverse lookups for ADIC. Not surprisingly, these are the same servers that handle the forward lookups.
Reverse lookups on private IP ranges
Three IP ranges are reserved for non-routable intranets, and are therefore commonly used for the private metadata network on Xsans:
- 10.0.0.0 - 10.255.255.255
- 172.16.0.0 - 172.31.255.255
- 192.168.0.0 - 192.168.255.255
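To check which of the three ranges above a given metadata address falls into, something like this Python sketch should do. The CIDR prefixes are just the three ranges rewritten in network notation (for example, 172.16.0.0 - 172.31.255.255 is 172.16.0.0/12), and private_range_for is a name I made up for the example.

```python
import ipaddress

# The three private ranges listed above, in CIDR notation.
RFC1918_NETWORKS = [
    ipaddress.IPv4Network("10.0.0.0/8"),
    ipaddress.IPv4Network("172.16.0.0/12"),
    ipaddress.IPv4Network("192.168.0.0/16"),
]


def private_range_for(address: str):
    """Return the private network containing the address, or None if it is public."""
    ip = ipaddress.IPv4Address(address)
    for network in RFC1918_NETWORKS:
        if ip in network:
            return network
    return None


print(private_range_for("10.1.1.1"))
print(private_range_for("172.32.0.1"))
```

The first call prints 10.0.0.0/8. The second prints None, because 172.32.0.1 sits just past the end of the 172.16.0.0 - 172.31.255.255 range.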
So who is responsible for reverse lookups on these ranges?
baa:~ aaron$ dig +trace -x 10.1.1.1
; <<>> DiG 9.2.2 <<>> +trace -x 10.1.1.1
10.in-addr.arpa. 86400 IN NS BLACKHOLE-1.IANA.ORG.
10.in-addr.arpa. 86400 IN NS BLACKHOLE-2.IANA.ORG.
;; Received 99 bytes from 184.108.40.206#53(G.ROOT-SERVERS.NET) in 63 ms
10.in-addr.arpa. 604800 IN SOA prisoner.iana.org. hostmaster.root-servers.org. 2002040800 1800 900 604800 604800
;; Received 116 bytes from 220.127.116.11#53(BLACKHOLE-1.IANA.ORG) in 103 ms
What about for all 10.x.x.x addresses?
baa:~ aaron$ dig +short ns -x 10
blackhole-2.iana.org.
blackhole-1.iana.org.
And the 192.168.x.x range?
baa:~ aaron$ dig +short ns -x 192.168
blackhole-1.iana.org.
blackhole-2.iana.org.
And even 172.16.x.x?
baa:~ aaron$ dig +short ns -x 172.16
blackhole-2.iana.org.
blackhole-1.iana.org.
What's that? Two servers handle the reverse DNS lookups for all possible private network ranges? $30 home routers are installed in just about every home in the U.S., at least, and almost all of them use an address in the 192.168.x.x range. And every time one of them decides to ask, "what's the name of my peer that just sent that request," the query is routed to one of those two servers.
I found an FAQ on the blackhole servers, and this interesting tidbit:
Q5: How busy are the blackhole servers?
A5: While rates vary, the blackhole servers generally answer thousands of queries per second. In the past couple of years the number of queries to the blackhole servers has increased dramatically. It is believed that the large majority of those queries occur because of "leakage" from intranets that are using the RFC 1918 private addresses. This can happen if the private intranet is internally using services that automatically do reverse queries, and the local DNS resolver needs to go outside the intranet to resolve these names. For well-configured intranets, this shouldn't happen. Users of private address space should have their local DNS configured to provide responses to inverse lookups in the private address space.
I added the emphasis at the end. Sure enough, trying out some queries yesterday, I got this response (slightly truncated):
aaron-g5:~ aaron$ dig +trace -x 192.168.1.1
; <<>> DiG 9.2.2 <<>> +trace -x 192.168.1.1
168.192.in-addr.arpa. 86400 IN NS blackhole-1.iana.org.
168.192.in-addr.arpa. 86400 IN NS blackhole-2.iana.org.
;; Received 102 bytes from 18.104.22.168#53(indigo.ARIN.NET) in 26 ms
;; connection timed out; no servers could be reached
Maybe the servers were too busy to handle my request.
What's this all to do with Xsan?
I'd say close to 100% of the Xsans that use private metadata networks use one of the three ranges listed above. And many, probably close to most, of these SANs never bothered with DNS on the private network. I don't mean you need DNS servers on your private network; these probably wouldn't be used anyway. I mean adding an appropriate reverse zone (ending with ".in-addr.arpa") on your public network's existing DNS servers.
Now I can tell you for sure that the Xsan client and the MDCs are going to attempt reverse DNS lookups on the private network IPs. I don't know why -- maybe for logging, maybe for security, or maybe it is a bug. If the Xsan client gets a valid PTR response, great! If it gets a negative response, great! But if it gets no response, if there is a timeout, or if the PTR is incorrect, your SAN won't start.
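To see which of those outcomes a particular machine actually gets, a rough Python sketch like the following can sort a reverse lookup into "answer", "negative", "timeout" or "error". Several things here are my assumptions, not anything Xsan documents: the function name, the 2-second timeout, the use of the system resolver through socket.gethostbyaddr, and the reading of resolver error codes 1 and 4 as negative responses. It cannot tell you whether a PTR is incorrect, only that one came back.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Resolver error codes treated as "the server answered, and the answer was no".
# Assumption: the usual h_errno values HOST_NOT_FOUND (1) and NO_DATA (4).
NEGATIVE_CODES = {1, 4}


def classify_reverse_lookup(address, timeout=2.0, lookup=socket.gethostbyaddr):
    """Classify a reverse lookup as 'answer', 'negative', 'timeout' or 'error'."""
    # Run the lookup in a worker thread so we can give up after `timeout` seconds.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(lookup, address)
    try:
        hostname = future.result(timeout=timeout)[0]
        return ("answer", hostname)
    except FutureTimeout:
        return ("timeout", None)
    except socket.herror as exc:
        if exc.args and exc.args[0] in NEGATIVE_CODES:
            return ("negative", None)
        return ("error", None)
    except OSError:
        return ("error", None)
    finally:
        # Do not block on a lookup that is still hanging.
        pool.shutdown(wait=False)
```

Calling classify_reverse_lookup("10.1.1.1") on an MDC or client is the interesting case: "answer" and "negative" are the two results described above as fine, while "timeout" is the one to worry about. The lookup parameter exists only so the function can be exercised with a stand-in resolver.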
Put these two facts together, and you come to the uncomfortable but logical conclusion that nearly every Xsan in the world relies on the responsiveness of two obscure servers on the Internet, blackhole-1.iana.org and blackhole-2.iana.org.
The bottom line, once again
Get your SAN out of the blackhole! As the Blackhole FAQ states:
Users of private address space should have their local DNS configured to provide responses to inverse lookups in the private address space.
The person who set up your DNS should be able to do this with no trouble. You won't need actual records for the hosts on the private range, just an NS (nameserver) record and an SOA (Start of Authority) record. The DNS server will then send quick negative responses to any queries, without forwarding requests to the Blackhole IANA servers.
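For a rough idea of what such an empty zone contains, here is a Python sketch that prints one in the common BIND-style zone file format. Everything specific in it is an assumption for illustration: the server name ns1.example.com., the contact hostmaster.example.com., the serial number and the timer values. It also only handles ranges that end on an octet boundary, and it does not show how to tell your DNS server to load the zone.

```python
import ipaddress


def reverse_zone_name(cidr: str) -> str:
    """Return the in-addr.arpa zone for a /8, /16 or /24 network."""
    network = ipaddress.IPv4Network(cidr)
    if network.prefixlen not in (8, 16, 24):
        # Ranges such as 172.16.0.0/12 span several reverse zones.
        raise ValueError("only /8, /16 and /24 map onto a single reverse zone")
    keep = network.prefixlen // 8
    octets = str(network.network_address).split(".")[:keep]
    return ".".join(reversed(octets)) + ".in-addr.arpa."


def empty_reverse_zone(cidr, nameserver, contact, serial=1):
    """Build zone text holding only an SOA and an NS record (no PTRs)."""
    zone = reverse_zone_name(cidr)
    lines = [
        f"$ORIGIN {zone}",
        "$TTL 3600",
        # Fields: serial, refresh, retry, expire, negative-caching TTL.
        f"@ IN SOA {nameserver} {contact} {serial} 3600 900 604800 3600",
        f"@ IN NS {nameserver}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical names; substitute your own DNS server and contact.
print(empty_reverse_zone("10.0.0.0/8", "ns1.example.com.", "hostmaster.example.com."))
```

For a metadata network in 10.x.x.x this produces a zone named 10.in-addr.arpa. with two records and nothing else, which is enough for the server to answer "no such name" on its own.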
How to add private records to your DNS server
By no means should you set up a new DNS server in an existing network environment without a lot of careful planning. There are standard options used in corporate environments that are not available in OS X Server Admin's GUI. Leave off those options, and you can easily screw the people you are trying to help.
If you are already using Mac OS X Server as a DNS server, then you are the one who needs to add the zone. There's no way in Server Admin to add an empty zone, but if you add forward records for a host or two on your private LAN, Server Admin will create the reverse zone.
I'd recommend adding DNS records for MDCs. You probably already have records that point to the public IP addresses of your MDCs. When adding records for the private IP addresses, make sure the names you use are different than the names that resolve to the public addresses. I recommend something like "mdc1-private.company.com".
Two common client problems: All your SAN MDCs and clients will need the IP address of this server in their Network preference pane, under the primary (public) interface. And in the Network preference pane, never mix DNS servers you control with those you do not control. Every server listed there must return identical information for each query, or else you'll get intermittent incorrect responses.
And what's this got to do with iTunes?
Well, nothing. But it's beginning to be clear that the iTunes "issue" isn't so clear. It may have nothing to do with iTunes, either.
So what's happening? Perhaps the blackhole servers were experiencing problems last week. I looked for two days but found no reports like this. Or maybe something in the Mac OS changed to do reverse lookups more frequently.
My personal suspicion is that this issue has been responsible for many of the reports we've heard of in the last couple of weeks. The symptoms certainly sound indicative: unplug the public network (or remove DNS) and the SAN starts. The DNS fix was just the trick for me on a Labor Day 11pm "SAN Down" call. The corporate Internet was down, so the Blackhole servers weren't accessible.
I look forward to the comments, especially the "I tried everything you said but still have the same problem" ones. Best of luck!