Trouble Shooting Wiki

Troubleshooting Postfix

From TroubleshootingWiki

Jump to: navigation, search
Postfix
Official Page
Project Documentation
Download
Source Book
Linux Email: Set up and Run a Small Office Email Server
Linux Email: Set up and Run a Small Office Email Server
ISBN 978-1-904811-37-4
Publisher Packt Publishing
Author(s) Patrick Ben Koetter, Carl Taylor, Magnus Back, Ralf Hildebrandt, David Rusenko, Alistair McDonald

I realize having a whole tutorial devoted to troubleshooting Postfix might look bad. Is Postfix really so hard to get right? No, it is not; quite the opposite in fact. Postfix provides many tools to simplify problem solving and these tools need a section of their own. While implementing new features in your Postfix mail system, do it step by step. The more insecure you are in what you are doing, the smaller steps you should take. If you run into problems, you will discover them early and it will be easier to figure out what went wrong. This is especially true when implementing complex lookup tables using MySQL databases. To reiterate what was said before: if you are even slightly uncomfortable with complex lookup tables, never introduce a new feature and a complex lookup table configuration at the same time. If something breaks, you will have much more trouble figuring out where to start.

When trying out new configurations, it does not hurt to be on the defensive side until the configuration is fully tested. By setting the following feature all permanent errors will be turned into temporary errors:

soft_bounce = yes

This means that the transmission of any messages rejected by your server will be retried, and that Postfix will retry sending any messages that get rejected by a remote server. With this setting in effect, closely monitor the logs and look for rejections that do not seem normal. Do not forget to turn this feature off when your configuration is tested!

Contents

[edit] Reading and Interpreting the Log Files

One key element in troubleshooting Postfix problems is being able to read and interpret the log messages that Postfix produces. Because they are plain text files with one log message per line, they do not require any special programs for inspection. We have looked at the logs a few times before, but this section will explain the messages and give examples of both successful mail deliveries and failures. When reading the examples, refer to the figure in the Postfix Architecture: An Overview section and note how the order of the log entries closely follows the path of the mail through Postfix.

The topic of understanding Postfix's logging is also discussed in Kyle Dent's article Troubleshooting with Postfix Logs (http://www.onlamp.com/pub/a/onlamp/2004/01/22/postfix.html).

[edit] Message Queue ID

An important property of each message, received and processed, is the queue ID. The queue ID is a hexadecimal number of varying lengths that identifies a message. Log messages that have a message context will also log the queue ID. This makes it easy for you to find all log messages that pertain to a message if you have the queue ID (the path to the log file needs to be adjusted for your system):

$ grep 92AFD4302 /var/log/maillog

The queue ID is assigned when the cleanup daemon creates a queue file in one of the Postfix queue directories. The queue file remains in the system until all recipients have been delivered to or the message expires, after which the qmgr daemon removes the queue file. In recent releases of Postfix this removal event is logged, as we will see in the examples.

Sometimes you will find that there is no queue ID but instead the word NOQUEUE in the log, as in this example that we have seen before:

 Dec 31 16:39:31 jeeves postfix/smtpd[28478]: NOQUEUE: reject_warning: RCPT from unknown[222.101.15.127]: 450 Client
 host rejected: cannot find your hostname, [222.101.15.127]; from=<jdoe@example.com> to=<me@example.com> proto=SMTP 
 helo=<222.101.15.127>

The reason is that this message has not yet been given a queue file and thus has not been assigned a queue ID. The queue file is created by the cleanup daemon when the first recipient has been accepted. This is a performance optimization.

Do not confuse the queue ID with the message ID. The latter is contained in the Message-ID header of each message and is normally added by the mail client before the message is handed over to Postfix. If no such header field is present, Postfix' cleanup daemon will add one for you. The cleanup daemon will always log the message ID of received messages:

 Jan 5 23:49:13 jeeves postfix/cleanup[12547]: 92AFD4302:
 message-id=<20041214021903.243BE2D4CF@mail.example.com>

The Message-ID header contains the hostname of the computer and typically the current date and time, and it will be unique for each message. Do not fall in the trap of thinking that the queue IDs also are unique. Queue IDs can and will be reused for different messages, theoretically as often as every second (but that would have to be on an incredibly busy system).

[edit] Example: SMTP Submission, Local Delivery

Let us start by looking at two examples of successful mail transactions. The first one shows a message being received by SMTP and delivered to a local mailbox, and the second example will show a locally submitted message that is delivered to a foreign mailbox via SMTP.

Our first example shows what the logs contain after a message has been received via SMTP and delivered to a local user.

 Jan 5 23:49:13 jeeves postfix/smtpd[12546]:
 connect from mail.example.com[1.2.3.4]

The smtpd daemon has received a connection from a client.

 Jan 5 23:49:13 jeeves postfix/smtpd[12546]: 92AFD4302:
 client=mail.example.com[1.2.3.4]

Postfix has now accepted the first recipient of this message and requested a queue file from the cleanup daemon. This is the first log message for this message that contains the queue ID:

 Jan 5 23:49:13 jeeves postfix/cleanup[12547]: 92AFD4302:
 message-id=<20041214021903.243BE2D4CF@mail.example.com>

The cleanup daemon has received the whole message from the smtpd daemon and logs the message ID:

 Jan 5 23:49:13 jeeves postfix/smtpd[12546]:
 disconnect from mail.example.com[1.2.3.4]

The client disconnected from the SMTP server:

 Jan 5 23:49:13 jeeves postfix/qmgr[22431]: 92AFD4302:
 from=<joe@example.com>, size=4258, nrcpt=1 (queue active)

The message has entered the active queue and is thus eligible for delivery (unless the queue is congested, delivery will start more or less immediately). The queue manager logs the sender address, the message size in bytes, and the total number of recipients. The reported size will be slightly larger than the actual number of bytes in the message and the size of the message when stored on disk. This is because the reported size is the total size of the message content records in the queue file, and this gives a little overhead:

 Jan 5 23:49:14 jeeves postfix/local[12653]: 3C21A4305:
 to=<jack@example.net>, orig_to=<postmaster@example.net>,
 relay=local, delay=1, status=sent (delivered to maildir)

The local delivery agent successfully delivered the message to the maildir of the local user 'jack'. The message was originally addressed to postmaster@example.net, but some address rewriting mechanism (typically a local or virtual alias) rewrote the recipient address. Finally, the message was delivered about one second after it was received.

Note that this message is logged when the delivery is completed. If the delivery agent invokes another program during the delivery and that program logs messages if its own, these will end up in the log before this delivery completion message.

Each recipient delivered to will emit a log message:

Jan 5 23:49:26 jeeves postfix/qmgr[22431]: 92AFD4302: removed

All recipients have been delivered to so the queue file can be removed.

[edit] Example: Local Submission, SMTP Delivery

Our next example is somewhat the opposite if the previous example. Here, a message is submitted via the sendmail command is delivered to another host via SMTP:

 Jan 6 01:41:29 jeeves postfix/pickup[12659]:
 CBA844305: uid=100 from=<jack>

The submitted message has been taken care of by the pickup daemon. The message was submitted by the user having user ID 100, and the sender was the unqualified address, jack:

 Jan 6 01:41:30 jeeves postfix/cleanup[13190]: CBA844305:
 message-id=<20050106004129.CBA844305@example.net>

Again, the message has been read by the cleanup daemon and the message ID is logged:

 Jan 6 01:41:30 jeeves postfix/qmgr[12661]: CBA844305:
 from=<jack@example.net>, size=1309, nrcpt=1 (queue active)

Note how the previously unqualified sender address has now been rewritten to a fully qualified address, probably because the myorigin parameter is equal to example.net.

 Jan 6 01:41:31 jeeves postfix/smtp[13214]: CBA844305:
 to=<joe@example.com>, relay=mail.example.com[1.2.3.4],
 delay=2, status=sent (250 Ok: queued as DD8F02787)

The message was successfully delivered to the recipient joe@example.com via the mail.example.com SMTP relay. When accepting the message, the remote server said:

250 Ok: queued as DD8F02787

So now we know the queue ID that our message got at the other end. This information may be useful if we need to contact the postmaster at example.com regarding this message:

 Jan 6 01:41:31 jeeves postfix/qmgr[12661]: CBA844305: removed

 Delivery completed, queue file removed.

You are probably starting to get a grip on the general format of the logs emitted for a message, so the next example will only show log fragments.

[edit] Example: Connection Problems upon SMTP Delivery

The following example shows what happens when multiple hosts are set up in DNS to receive messages for a domain but some of the hosts are temporarily unreachable causing Postfix to try a few of them before the delivery can be made. We will only look at the logs of the delivery agent:

 Jan 2 14:19:46 poseidon postfix/smtp[998]: connect to
 mx4.hotmail.com[65.54.190.230]: Connection timed out (port 25)
 Jan 2 14:20:16 poseidon postfix/smtp[998]: connect to
 mx1.hotmail.com[64.4.50.50]: Connection timed out (port 25)
 Jan 2 14:20:46 poseidon postfix/smtp[998]: connect to
 mx3.hotmail.com[64.4.50.179]: Connection timed out (port 25)
 Jan 2 14:20:47 poseidon postfix/smtp[998]: 940C132ECE:
 to=<postmaster@hotmail.com>, relay=mx4.hotmail.com[65.54.167.230],
 delay=92, status=sent (250 <20050102131914.B7C4B32ECF@example.com>
 Queued mail for delivery)

Clearly, three of the receiving mail hosts for hotmail.com were unreachable when Postfix attempted the delivery. Notice how the connection attempts are evenly spread out at 30-second intervals. This is not a coincidence; the default value of the smtp_connect_timeout parameter, which controls how long Postfix will wait for a connection, is indeed 30 seconds. These three 30-second timeouts also explain why the delivery delay logged by the last message is 92 seconds. Also do notice that the acceptance message that Hotmail gives us does not contain any queue ID but instead the message ID.

[edit] Getting More Detailed Log Messages

In most cases, Postfix's default logging is enough to resolve a problem but sometimes more details are needed. For those rare cases, you can ask Postfix's daemon processes to log more detailed messages by making sure they are given at least one -v startup option. This is done by editing master.cf and adding -v to the end of the line for the daemon from which you want to get more detailed logging. For example, to get verbose logging from the SMTP server, smtpd, change the line:

smtp inet n - n - - smtpd

to this:

smtp inet n - n - - smtpd -v

Depending on your configuration, the first line may look slightly different, but the important part is what is in the last column, the name of the daemon. In the case of the SMTP server, busy servers may produce insane amounts of logging with this setting. If such is the case, the debug_peer_list parameter can come in handy.

This parameter accepts one or more hostnames or network addresses for which the level of logging will be increased. This makes sense only in contexts where there is a network peer, such as in the SMTP server and SMTP client.

If you are having problems sending messages to a particular remote server, say mail.example.com, you can set

debug_peer_list = mail.example.com

and watch the increased logging when Postfix connects to that particular host. When using debug_peer_list, there is no reason to touch master.cf.

[edit] Troubleshooting Lookup Tables with Postmap

The postmap command is not only useful for rebuilding indexed lookup tables. You can also use it to query lookup tables in order to check if the lookups work as you expect them to. This is especially useful for regular expression lookup tables and complex lookup tables' types like MySQL, LDAP, and PostgreSQL. Before taking new lookup tables into use in Postfix, you should debug them with postmap first. To perform lookups with postmap, use the -q option:

 $ postmap -q postmaster@example.com mysql:/etc/postfix/mysql-aliases.cf
 jack@example.com
 

You can also examine the exit status of the command to determine whether the lookup succeeded. As always, a zero exit status indicates success. The UNIX shell stores the exit status of the last process in the $? environment variable.

If a lookup does not work as you expect, you can (just as with the Postfix daemons) use one or more -v startup options to increase the verbosity of the messages.

Note that postmap performs 'raw' queries. For example, if you want to know whether the IP address 1.2.3.4 is matched by the following access map line:

1.2.3 REJECT

You cannot test it with the following command:

$ postmap q 1.2.3.4 hash:/etc/postfix/client_access

The postmap command does not know about Postfix's rules for how IP addresses are matched in access map context, and even if it did it has no way of knowing that 1.2.3.4 is an IP address.

[edit] Getting Help

The mailing list for Postfix is a very valuable resource when one is stuck with a Postfix problem. Links to the archives of the list as well as instructions for how to subscribe can found at http://www.postfix.org/lists.html.

Although the people on the list are very helpful, they do expect you to do your homework before requesting help. This means that you should search the list archives to see if your question has been asked before, and most importantly, you should read the documentation first.

When asking a question, do not forget to state the bigger goal you are trying to achieve. This is often forgotten, and the question is just too specific. Not only will an understanding of the bigger picture make it easier to help you, but it will also reveal if the solution method you have chosen is completely wrong. However, do not be too verbose in your description! After all, the people reading the Postfix-users list are humans too, and they do get bored with over-long postings.

Because they are humans, they are also not psychic. Therefore, be sure to provide complete configuration and any log messages that may be relevant to your question. Obtain your configuration by running postconf -n. That command will print the values of all parameters that your have set in your main.cf. Do not post the complete contents of your main.cf, or the output of postconf (without the -n). The contents of the master.cf are rarely needed.

[edit] Additional References

For instructions on Installing Postfix, click here.

[edit] Source

The source of this content is Chapter 2: Setting Up Postfix of [1] by Patrick Ben Koetter, Carl Taylor, Magnus Back, Ralf Hildebrandt, David Rusenko, Alistair McDonald (Packt Publishing, 2005).

Personal tools