Should a DNS A record timeout result in permanent email failure?

George Bailey asked:

Occasionally an email does not get through when sent to a recipient who uses the outlook.com system for their email server. Their MX record resolves fine, but their A record (something like example-com.mail.protection.outlook.com) times out.

We are using sendmail here, and I’m not an expert. I inherited the configuration and don’t know much about the settings. One thing I do know is that it hasn’t been edited for years, and there’s been no indication of a problem.

I don’t know if it is intentional, but dig example-com.mail.protection.outlook.com times out after 15 seconds, and then later digs succeed.

We are using our own BIND DNS server for caching which also hasn’t been reconfigured for at least that long.

It appears that our sendmail system gives up after getting host not found for example-com.mail.protection.outlook.com. Is it appropriate for this permanent failure to occur? Should it instead be changed to temporary failure? What is the standard? Is outlook.com wrong or our sendmail?

Edit

For your reference, here are the relevant log entries from maillog, with sensitive info masked: example.com represents the recipient’s server, and example.net represents our own domain.

Jun 16 09:28:28 myhostname sendmail[8613]: [ID 801593 mail.info] s5GDSOZ4008613: from=websusr, size=16975, class=0, nrcpts=2, msgid=<201406161328.s5GDSOZ4008613@myhostname.example.net>, relay=websusr@localhost
Jun 16 09:28:28 myhostname sendmail[8617]: [ID 801593 mail.info] s5GDSSIP008617: from=<websusr@myhostname.example.net>, size=17222, class=0, nrcpts=2, msgid=<201406161328.s5GDSOZ4008613@myhostname.example.net>, proto=ESMTP, daemon=MTA-v4, relay=localhost [127.0.0.1]
Jun 16 09:28:28 myhostname sendmail[8613]: [ID 801593 mail.info] s5GDSOZ4008613: to="John Doe" <john@example.com>, ctladdr=websusr (60001/60001), delay=00:00:04, xdelay=00:00:00, mailer=relay, pri=76975, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (s5GDSSIP008617 Message accepted for delivery)
Jun 16 09:32:09 myhostname sendmail[8618]: [ID 801593 mail.info] s5GDSSIP008617: to=<john@example.com>, ctladdr=<websusr@myhostname.example.net> (60001/60001), delay=00:03:41, xdelay=00:03:40, mailer=esmtp, pri=77440, relay=example-com.mail.eo.outlook.com., dsn=5.1.2, stat=Host unknown (Name server: example-com.mail.eo.outlook.com.: host not found)
Jun 16 09:32:09 myhostname sendmail[8618]: [ID 801593 mail.info] s5GDSSIP008617: s5GDW9IP008618: DSN: Host unknown (Name server: example-com.mail.eo.outlook.com.: host not found)

Here are also the outputs of dig as of just now. The problem is not occurring at the moment, but they let you see the MX record.

>dig example.com mx

; <<>> DiG 9.3.2 <<>> example.com mx
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1448
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;example.com.                 IN      MX

;; ANSWER SECTION:
example.com.          3461    IN      MX      0 example-com.mail.protection.outlook.com.
example.com.          3461    IN      MX      10 example-com.mail.eo.outlook.com.

;; AUTHORITY SECTION:
example.com.          86261   IN      NS      ns1.example.org.
example.com.          86261   IN      NS      ns2.example.org.

;; Query time: 0 msec
;; SERVER: 10.0.0.109#53(10.0.0.109)
;; WHEN: Thu Jul  3 09:32:08 2014
;; MSG SIZE  rcvd: 215

>dig example-com.mail.protection.outlook.com

; <<>> DiG 9.3.2 <<>> example-com.mail.protection.outlook.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1734
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 2, ADDITIONAL: 6

;; QUESTION SECTION:
;example-com.mail.protection.outlook.com. IN A

;; ANSWER SECTION:
example-com.mail.protection.outlook.com. 10 IN A 207.46.163.170
example-com.mail.protection.outlook.com. 10 IN A 207.46.163.215
example-com.mail.protection.outlook.com. 10 IN A 207.46.163.138

;; AUTHORITY SECTION:
mail.protection.outlook.com. 1800 IN    NS      ns1-proddns.glbdns.o365filtering.com.
mail.protection.outlook.com. 1800 IN    NS      ns2-proddns.glbdns.o365filtering.com.

;; ADDITIONAL SECTION:
ns1-proddns.glbdns.o365filtering.com. 30 IN A   207.46.100.42
ns1-proddns.glbdns.o365filtering.com. 30 IN A   207.46.163.143
ns1-proddns.glbdns.o365filtering.com. 30 IN A   207.46.163.176
ns2-proddns.glbdns.o365filtering.com. 30 IN A   207.46.163.176
ns2-proddns.glbdns.o365filtering.com. 30 IN A   207.46.100.42
ns2-proddns.glbdns.o365filtering.com. 30 IN A   207.46.163.143

;; Query time: 464 msec
;; SERVER: 10.0.0.109#53(10.0.0.109)
;; WHEN: Thu Jul  3 09:33:30 2014
;; MSG SIZE  rcvd: 276

>dig example-com.mail.eo.outlook.com

; <<>> DiG 9.3.2 <<>> example-com.mail.eo.outlook.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 940
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 2, ADDITIONAL: 12

;; QUESTION SECTION:
;example-com.mail.eo.outlook.com. IN  A

;; ANSWER SECTION:
example-com.mail.eo.outlook.com. 10 IN A      207.46.163.138
example-com.mail.eo.outlook.com. 10 IN A      207.46.163.170
example-com.mail.eo.outlook.com. 10 IN A      207.46.163.247

;; AUTHORITY SECTION:
mail.eo.outlook.com.    5450    IN      NS      ns1-prodeodns.glbdns.o365filtering.com.
mail.eo.outlook.com.    5450    IN      NS      ns2-prodeodns.glbdns.o365filtering.com.

;; ADDITIONAL SECTION:
ns1-prodeodns.glbdns.o365filtering.com. 30 IN A 157.55.234.42
ns1-prodeodns.glbdns.o365filtering.com. 30 IN A 157.56.112.42
ns1-prodeodns.glbdns.o365filtering.com. 30 IN A 207.46.100.42
ns1-prodeodns.glbdns.o365filtering.com. 30 IN A 207.46.163.143
ns1-prodeodns.glbdns.o365filtering.com. 30 IN A 207.46.163.176
ns1-prodeodns.glbdns.o365filtering.com. 30 IN A 65.55.169.42
ns2-prodeodns.glbdns.o365filtering.com. 30 IN A 65.55.169.42
ns2-prodeodns.glbdns.o365filtering.com. 30 IN A 157.55.234.42
ns2-prodeodns.glbdns.o365filtering.com. 30 IN A 157.56.112.42
ns2-prodeodns.glbdns.o365filtering.com. 30 IN A 207.46.100.42
ns2-prodeodns.glbdns.o365filtering.com. 30 IN A 207.46.163.143
ns2-prodeodns.glbdns.o365filtering.com. 30 IN A 207.46.163.176

;; Query time: 248 msec
;; SERVER: 10.0.0.109#53(10.0.0.109)
;; WHEN: Thu Jul  3 09:33:45 2014
;; MSG SIZE  rcvd: 368

>

My answer:


If DNS resolution simply times out and no response ever comes back from the DNS server, or the response is SERVFAIL, then the message is supposed to be queued and tried again later.

If DNS resolution returns NXDOMAIN (the name doesn’t exist), then the message is supposed to be returned to the sender immediately.

See RFC 5321, section 5.1:

The lookup first attempts to locate an MX record associated with the name. If a CNAME record is found, the resulting name is processed as if it were the initial name. If a non-existent domain error is returned, this situation MUST be reported as an error. If a temporary error is returned, the message MUST be queued and retried later (see Section 4.5.4.1).
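
As a rough illustration of that rule (this is not sendmail's actual code), the decision can be sketched in Python. The names Action and action_for_dns_result are made up for this example, and treating any other response code as temporary is my own assumption, since the passage above only names the two cases.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"  # lookup succeeded, go on and attempt delivery
    QUEUE = "queue"      # temporary failure, keep the message and retry later
    BOUNCE = "bounce"    # permanent failure, report an error to the sender


def action_for_dns_result(rcode, timed_out=False):
    """Map a DNS lookup outcome to what the mail server should do.

    rcode is the response status as dig prints it ("NOERROR", "NXDOMAIN",
    "SERVFAIL", ...). timed_out means no response arrived at all.
    """
    if timed_out or rcode == "SERVFAIL":
        return Action.QUEUE
    if rcode == "NXDOMAIN":
        return Action.BOUNCE
    if rcode == "NOERROR":
        # Simplification: assumes the answer section actually held records.
        return Action.DELIVER
    # Assumption: anything else (for example REFUSED) is handled as
    # temporary, which is the conservative choice.
    return Action.QUEUE
```

With this sketch, a 15 second timeout like the one described in the question would map to Action.QUEUE, not to a bounce.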

In your case you should be looking at why your DNS server appears to be failing intermittently. You should also check sendmail’s logs to find out exactly what it saw when it tried to do the DNS resolution.
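
One way to catch an intermittent failure in the act is to repeat the lookup and count the outcomes. The following is a minimal sketch under some assumptions: it goes through the system resolver with socket.getaddrinfo, which is not necessarily the same path sendmail takes, and the function names classify_lookup and probe are hypothetical helpers for this example.

```python
import socket
import time
from collections import Counter


def classify_lookup(hostname, resolve=socket.getaddrinfo):
    """Resolve hostname once and label the outcome.

    `resolve` is injectable so the function can be tried without
    touching the network.
    """
    try:
        resolve(hostname, 25)
    except socket.gaierror as exc:
        if exc.errno == socket.EAI_AGAIN:
            # The resolver reported a temporary failure (e.g. a timeout).
            return "temporary"
        if exc.errno == socket.EAI_NONAME:
            # The resolver reported that the name is not known.
            return "not_found"
        return "other_error"
    return "ok"


def probe(hostname, attempts=5, delay=0.0, resolve=socket.getaddrinfo):
    """Repeat the lookup `attempts` times and count each outcome."""
    counts = Counter()
    for i in range(attempts):
        counts[classify_lookup(hostname, resolve)] += 1
        if i < attempts - 1:
            time.sleep(delay)  # pause between attempts, not after the last
    return counts
```

For example, probe("example-com.mail.eo.outlook.com", attempts=20, delay=15) would spread the lookups over a few minutes. The A records in the dig output above have a TTL of 10 seconds, so a delay longer than that should keep the caching server from answering every attempt from cache.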


View the full question and answer on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.