o cupertino, where art thou?

Sure, the Mac is exciting. I have nightmares about living in a world where laptops have no backlit keyboards and large shiny screens, sleek user interfaces and some other nice details. I must admit these guys do things the right way. But then…

For those with ADD: Why, oh why does Leopard insist on storing entries in its DNS cache with TTL=3600 when the DNS proxy is sending them with TTL=0?

For everyone else: I’m testing DSL routers. One of the “nice” features to have in such a device is something that stands between the DNS clients (computers connected to the LAN ports) and the DNS server (usually provided by the network). This is usually called a DNS proxy or DNS relay and will (hopefully, in decent implementations), if the DSL connection is down, answer every single request with an internal/private/reserved IP address. If the user is using a browser, the www.whatever.com DNS query returns that IP address instead of the real one and the HTTP request lands on the router itself, which will politely say “your connection is down, go get a towel or something”. Additionally, the router will set the DNS answer’s time-to-live to something really low (zero, preferably, so the “fake” DNS entry doesn’t get stored in local DNS caches).
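To make the relay behaviour above concrete, here is a minimal Python sketch of how such a “line is down” answer could be put together. It is not taken from any router firmware: the address 192.168.1.1 and the helper names `build_query`, `fake_response` and `answer_ttl` are made up for illustration, and it only handles a single A-record question.

```python
import struct

ROUTER_IP = "192.168.1.1"  # assumption: the router's LAN address


def build_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (illustration only)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _question_end(packet: bytes) -> int:
    """Return the offset just past the first question section."""
    pos = 12  # the DNS header is always 12 bytes
    while packet[pos] != 0:
        pos += 1 + packet[pos]  # skip one length-prefixed label
    return pos + 5  # root label (1 byte) + QTYPE (2) + QCLASS (2)


def fake_response(query: bytes, ip: str = ROUTER_IP, ttl: int = 0) -> bytes:
    """Answer any A query with `ip` and the given TTL (hypothetical relay)."""
    question = query[12:_question_end(query)]
    flags = 0x8180  # response, recursion desired and available, no error
    header = query[:2] + struct.pack("!HHHHH", flags, 1, 1, 0, 0)
    rdata = bytes(int(octet) for octet in ip.split("."))
    # 0xC00C is a compression pointer back to the name at offset 12
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, ttl, len(rdata)) + rdata
    return header + question + answer


def answer_ttl(response: bytes) -> int:
    """Read the TTL of the first answer (assumes a pointer-compressed name)."""
    pos = _question_end(response)
    _name, _rtype, _rclass, ttl, _rdlen = struct.unpack(
        "!HHHIH", response[pos:pos + 12]
    )
    return ttl


if __name__ == "__main__":
    reply = fake_response(build_query("www.whatever.com"))
    print("TTL in fake answer:", answer_ttl(reply))
```

The TTL is the 32-bit field right after the record class, which is the value a resolver is supposed to look at before deciding whether to cache the answer.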

So far so good.

Eventually, DSL connections go up. Sometimes they don’t (but that’s not my problem).

Should the DSL connection go up again, the router will probably fetch a working DNS server from the network and start resolving DNS correctly. End of story.

***Except*** there seems to be something essentially wrong with Leopard (I’m hearing people grinning and whispering “eh, don’t get me started…”).

A quick search on Google returns a few hundred hits on people whining about Leopard’s DNS implementation and the fact that it’ll take a few dscacheutil -flushcache iterations a day to clean the cache of DNS lookup failures (due to unresponsive DNS servers, for instance). Those error entries get marked with a big YES in the “Neg” column of dscacheutil -cachedump -entries. And they’ll probably come up with some way of dealing with that clutter. Not the issue here.

There’s another issue I won’t bother to go into right now, but it is slightly related. Most entries in the cache show up as having TTL=3600 when the DNS server is sending values far above that (12 or 24 hours, just to name a few – I’ve got the packet captures to show it). The sentence “Why would someone want to store a DNS cache entry for so long” makes some sense to me, but it would be nice if the operating system just did as it’s being told.

Which brings us to “the” problem here (my problem, at least – but JP mentioned a similar behavior around the time I first noticed the issue on my equipment): Leopard is storing entries with a TTL of 3600 (seconds) for entries that were sent with TTL=0. The DNS relay is saying “don’t cache this entry” but Leopard insists on doing so – and for an hour or something. dscacheutil -flushcache will fix it, but this is just plain annoying.
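For reference, this is roughly what “doing as it’s being told” looks like in a cache. The Python sketch below is not Leopard’s or Tiger’s code; the class name `TtlCache` and its methods are invented for illustration. It follows the RFC 1035 reading of a zero TTL: use the record for the transaction in progress, don’t cache it.

```python
import time


class TtlCache:
    """Hypothetical resolver cache that honors the TTL sent by the server."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so tests don't have to sleep
        self._entries = {}    # name -> (address, expiry timestamp)

    def store(self, name, address, ttl):
        if ttl <= 0:
            # TTL=0: use once, never cache; also drop any stale entry
            self._entries.pop(name, None)
            return
        self._entries[name] = (address, self._clock() + ttl)

    def lookup(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[name]  # expired: forget it and ask again
            return None
        return address


if __name__ == "__main__":
    cache = TtlCache()
    cache.store("google.com", "192.168.1.1", ttl=0)  # the router's fake answer
    print(cache.lookup("google.com"))                # nothing was cached
```

With a cache like this, the router’s fake answer is gone as soon as it has been used, so the first lookup after the DSL line comes back up goes to the real DNS server.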

Here’s the DNS response from the DSL modem on Wireshark:
[Image: Wireshark capture]

Windows’ ipconfig /displaydns listing (showing no trace of the google.com entry):
[Image: windows doing something right (for a change)]

Last but not least, the Leopard dscacheutil -cachedump -entries listing with a glorious TTL=3600:
[Image: it just works (not)]

No sign of similar problems on the Ubuntu installation I tried. Pedro confirmed to me minutes ago that Tiger behaves slightly better than Leopard and honors the TTL=0, keeping these records away from the cache (as it should).

Now could someone do something about this? I just did. Filed bug #5711166 and became a Mac Geek, according to Rui. Why do I have a feeling I’ll regret this? ;)

(Update: this issue has been fixed in the Mac OS X 10.5.3 Server update. Not bad.)

4 Responses to “o cupertino, where art thou?” »»

  1. Comment by JP Antunes | 01/29/08 at 10:48 am

    Great work Bruno.

    But still, I think there’s something more to it. I ran an experiment where I spent a whole day using a Linux laptop in the same networking environment where I experience (several) disconnections with Leopard and I didn’t have a single problem.

    Could you give me some pointers as to how I should go about testing/proving my theory?

  2. Comment by bruno | 01/30/08 at 11:48 pm

    I’m not sure about what you want to test on your scenario. Drop me an email (it’s on the “about the author” page) and we’ll discuss it and (if possible) summarize here any possible conclusions.

  3. Comment by Finlay | 02/17/08 at 6:32 pm

    If you’re really bothered you could even read the source: http://www.opensource.apple.com/darwinsource/10.5.2/DirectoryService-514.4

  4. Comment by Pedro Melo | 02/17/08 at 6:33 pm

    Yes, Leopard is retarded. It should respect TTL=0.

    But DNS proxies that reply with their own address when they cannot reach the server, or when the DSL line is down, that’s just evil!

    And with a mac, not needed. The message that Safari displays (and the nice button, Network Troubleshooting) is much better than that stupid page the ADSL router puts up.

    I wish we could disable that behavior. I haven’t found the proper switch yet, but I haven’t lost hope.

    Best regards,
