Trouble Shooting Wiki

Monitoring Active Directory

Source Book
ISBN 978-1-847191-44-1
Publisher Packt Publishing
Author(s) Florian Rommel

Diagnosing and Troubleshooting Tools

This section covers some of the more advanced uses of three tools that are of great help in diagnosing AD problems or replication problems.

The three tools that almost every person who needs to perform troubleshooting on Windows Server 2003 should be familiar with are NetDiag.exe, DcDiag.exe, and Repadmin.exe.

All three are command-line utilities. They can be run from a Windows XP Professional or Windows Vista workstation, and they are standalone executables that can be copied from one machine to another. A nice feature of these tools is that they connect remotely to the DCs via the Remote Procedure Call (RPC) service, so you do not need to log on to the DCs. The only requirement for most of them is that they run from a machine that is a member of the domain and has access to the network.

DcDiag

DcDiag is the domain controller diagnostic utility. It lets you diagnose a domain controller and check whether everything is OK. If something is wrong, the tests run until one fails, and the tests that fail tell you where to start looking for the cause. Simple as this may sound, the utility executes a very meaningful set of tests even when run without any flags or options. The tests performed are listed in the following tables:

Primary tests Descriptions
Connectivity Tests whether or not the DC is connected to the network
Replications Tests to ensure replications can be started, and are run on time

NCSecDesc Checks that the security descriptors on the naming context heads have appropriate permissions for replication
NetLogons Tests if the DC allows the appropriate logons to initiate replication
Advertising Tests if the DC can advertise itself (DNS)
KnowsOfRoleHolders Tests if the DC knows which servers hold the FSMO roles
RidManager Tests if the DC can contact the RidManager
MachineAccount Tests if the DC has a valid machine account
Services Tests if the necessary services are running on the DC
ObjectsReplicated Tests whether or not the DSA and Machine Account objects have ever been replicated
frssysvol Checks if the sysvol share is listed in the File Replication Service (FRS)
frsevent Tests if FRS errors have been generated
kccevent Tests if KCC errors have been generated
systemlog Tests if there are system errors
VerifyReferences Tests if AD FRS records are intact and correct for the replication infrastructure


DNS Partition tests Descriptions
Forest DNS tests
CrossRefValidation Checks if the replication cross-references are intact in the forest DNS zone
CheckSDRefDom Checks if the Security descriptors for the forest are intact
Domain DNS tests
CrossRefValidation Checks if the replication cross-references are intact in the domain DNS zone
CheckSDRefDom Checks if the Security descriptors for the domain are intact


Schema, Configuration, and Enterprise tests Descriptions
Schema tests
CrossRefValidation Checks if the replication cross-references are intact in the schema itself
CheckSDRefDom Checks if the Security descriptors within the schema are intact
Configuration
CrossRefValidation Checks if the replication cross-references are intact in the current forest configuration
CheckSDRefDom Checks if the Security descriptors within the configuration are intact
Partition tests
CrossRefValidation Checks if the replication cross-references are intact in the current application partition
CheckSDRefDom Checks if the Security descriptors for the application partition are intact
Enterprise tests
Intersite Checks if the inter-site replication can be initiated
FSMOCheck Checks that all FSMO roles are assigned and can be contacted

As you can see, DcDiag performs a lot of tests. Bear in mind, however, that most of these run locally on the DC, against the data held by that DC. This means that a test that fails in the DNS, schema, configuration, or partition sets on one DC may well pass on another DC. It is possible that only this particular DC's replica is malfunctioning.
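Because a failure may be local to one DC, it helps to compare results across DCs. The following Python sketch shows one way to do that. It assumes that DcDiag reports each result on a line of the form "......... DC1 passed test Connectivity" (the source does not show DcDiag output, so treat this line format as an assumption and adjust the pattern to your own logs). The function names parse_dcdiag and inconsistent_tests are hypothetical helpers, not part of any tool.

```python
import re
from collections import defaultdict

# Assumed result-line format (not shown in the source text):
#   "......................... DC1 passed test Connectivity"
RESULT_LINE = re.compile(r"\.{3,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def parse_dcdiag(output):
    """Return {test_name: 'passed' | 'failed'} for one DcDiag run."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            _target, status, test = match.groups()
            # Once a test has failed for any target, keep it marked as failed.
            if results.get(test) != "failed":
                results[test] = status
    return results


def inconsistent_tests(results_by_dc):
    """Return {test_name: [DCs where it failed]} for tests that pass elsewhere."""
    failed_on = defaultdict(list)
    passed_on = defaultdict(list)
    for dc, results in results_by_dc.items():
        for test, status in results.items():
            bucket = failed_on if status == "failed" else passed_on
            bucket[test].append(dc)
    return {
        test: sorted(dcs)
        for test, dcs in failed_on.items()
        if passed_on[test]  # at least one other DC passed the same test
    }
```

Feeding the function one parsed result per DC, for example {"DC1": parse_dcdiag(log1), "DC2": parse_dcdiag(log2)}, returns only the tests that fail on some DCs and pass on others, which points to a problem with a particular replica rather than with the directory as a whole.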

There are a few other helpful, additional tests that are not in the default set. These are important to know as well. To run the specific tests, simply type the following at the command prompt:

>dcdiag /test:TESTNAME

Here, TESTNAME is the name of the test. You can see a list of these tests in the following table. The most notable tests are those in the DNS set. As DNS is such an integral part of AD, testing its functionality will rule out, or at least narrow down, several things that could be wrong with it.

Additional tests Descriptions
DNS Tests DNS checks for the entire enterprise; subtests can be checked separately
DNS/DnsBasic Tests the basic DNS functionality such as connecting and looking up records
DNS/DnsForwarders Checks the forwarders and root hints for errors
DNS/DnsDelegation Checks the DNS delegations
DNS/DnsDynamicUpdate Checks if dynamic updates are working
DNS/DnsRecordRegistration Checks if the DNS registration works
DNS/DnsResolveExtName Checks if external names can be resolved
DNS/DnsAll Runs all the subtests
DNS/DnsInternetName:NAME Used together with /DnsResolveExtName to specify the Internet name to resolve
CheckSecurityError Checks for security errors or potential security errors
VerifyReplicas Checks all replicas on all replica servers for consistency
Topology Checks whether the entire topology is fully connected
CutoffServers Checks for servers whose partners are not available, and therefore can't receive replications
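If you run individual tests often, it can be convenient to assemble the command line in a script. The following Python sketch builds a DcDiag argument list for a single test. It assumes the /s:SERVER switch for targeting a specific DC and the convention that DNS subtests are passed as separate switches after /test:DNS; check both against dcdiag /? on your system. The helper names dcdiag_command and run_dcdiag are hypothetical.

```python
import subprocess


def dcdiag_command(test, server=None, dns_subtest=None, internet_name=None):
    """Build the argument list for running one DcDiag test."""
    args = ["dcdiag", f"/test:{test}"]
    if server:
        # Assumption: /s: selects the DC to run the test against.
        args.append(f"/s:{server}")
    if dns_subtest:
        # DNS subtests such as DnsBasic are given as their own switch.
        args.append(f"/{dns_subtest}")
    if internet_name:
        # Only meaningful together with the DnsResolveExtName subtest.
        args.append(f"/DnsInternetName:{internet_name}")
    return args


def run_dcdiag(args):
    """Run DcDiag (must be installed and on the PATH) and return its output."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.stdout
```

For example, dcdiag_command("Topology", server="DC1") produces the equivalent of typing dcdiag /test:Topology /s:DC1 at the command prompt.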

Running DcDiag with the /fix flag will attempt to fix some minor problems it encounters. Some of the DNS-related issues, for example when the DC is not registered in the application partition, could quickly be fixed this way.

NetDiag

NetDiag is another command-line utility that lets you perform tests with a lot of verbose output. It is also included in the Windows Support Tools. While DcDiag allows you to test everything related to the DCs and DNS, NetDiag allows you to test everything related to the network stack of the machine.

NetDiag, by default, just like its cousin DcDiag, runs an extensive set of tests. These tests include checking network connectivity, checking which hotfixes are installed, whether the network card is configured properly and the network speed is configured correctly, what protocols and services are running, domain membership, and many more. A sample output from a working DC (DC1.nailcorp.com) is as follows:

C:\Documents and Settings\Administrator>netdiag

....................................

    Computer Name: DC1
    DNS Host Name: dc1.nailcorp.com
    System info : Microsoft Windows Server 2003 (Build 3790)
    Processor : x86 Family 6 Model 15 Stepping 8, GenuineIntel
    List of installed hotfixes :
        KB921503
        KB924667-v2
        KB925398_WMP64
        KB925902
        KB926122
        KB927891
        KB929123
        KB930178
        KB931784
        KB932168
        KB933360
        KB933729
        KB933854
        KB935839
        KB935840
        KB935966
        KB936021
        KB936357
        KB936782
        KB938127
        KB939653
        KB941202
        KB941568
        KB941569
        KB941644
        KB941672
        KB942615
        KB942763
        KB942840
        KB943460
        KB943485
        KB944653
        Q147222

Netcard queries test . . . . . . . : Passed

Per interface results:
    Adapter : Local Area Connection
        Netcard queries test . . . : Passed

        Host Name. . . . . . . . . : dc1
        IP Address . . . . . . . . : 10.0.0.50
        Subnet Mask. . . . . . . . : 255.255.255.0
        Default Gateway. . . . . . : 10.0.0.2
        Dns Servers. . . . . . . . : 10.0.0.50
        AutoConfiguration results. . . . . . : Passed
        Default gateway test . . . : Passed
        NetBT name test. . . . . . : Passed
        [WARNING] At least one of the <00> 'WorkStation Service', <03> 'Messenger Service', <20> 'WINS' names is missing.

        WINS service test. . . . . : Skipped
            There are no WINS servers configured for this interface.

Global results:
Domain membership test . . . . . . : Passed
NetBT transports test. . . . . . . : Passed
    List of NetBt transports currently configured:
        NetBT_Tcpip_{D8B5C232-8078-485D-8DE0-2F5C8C2FB480}
    1 NetBt transport currently configured.

Autonet address test . . . . . . . : Passed
IP loopback ping test. . . . . . . : Passed

Default gateway test . . . . . . . : Passed
NetBT name test. . . . . . . . . . : Passed
    [WARNING] You don't have a single interface with the <00> 'WorkStation Service', <03> 'Messenger Service', <20> 'WINS' names defined.

Winsock test . . . . . . . . . . . : Passed
DNS test . . . . . . . . . . . . . : Passed
    PASS - All the DNS entries for DC are registered on DNS server '192.168.0.50' and other DCs also have some of the names registered.

Redir and Browser test . . . . . . : Passed
    List of NetBt transports currently bound to the Redir
        NetBT_Tcpip_{D8B5C232-8078-485D-8DE0-2F5C8C2FB480}
    The redir is bound to 1 NetBt transport.
    List of NetBt transports currently bound to the browser
        NetBT_Tcpip_{D8B5C232-8078-485D-8DE0-2F5C8C2FB480}
    The browser is bound to 1 NetBt transport.
DC discovery test. . . . . . . . . : Passed
DC list test . . . . . . . . . . . : Passed
Trust relationship test. . . . . . : Skipped
Kerberos test. . . . . . . . . . . : Passed
LDAP test. . . . . . . . . . . . . : Passed
Bindings test. . . . . . . . . . . : Passed
WAN configuration test . . . . . . : Skipped
    No active remote access connections.
Modem diagnostics test . . . . . . : Passed
IP Security test . . . . . . . . . : Skipped
    Note: run "netsh ipsec dynamic show /?" for more detailed information

The command completed successfully

As you can see from this example output, several tests produced warnings or were skipped. The warnings shown here occur because we do not have certain services running on the DCs. These are not critical services and we do not need them. However, NetDiag tested them anyway because, by default, it executes a standard set of tests that includes them. The Messenger Service, Workstation Service, and WINS names were not registered because they are not needed in our domain structure.

If you do rely on these services, and many companies still run WINS servers, these warnings should catch your attention, because in that case the names should be registered. As no WINS server address was defined, the WINS service test was skipped. We also have no current trust relationship with other domains, and therefore that test was skipped too. If we had one, the trust connection would have been checked to verify that it was working and that the other end could be contacted. Lastly, we have no IPSec configured on our network cards, and no WAN or Remote Access connections, which is why these tests were skipped as well. If you have a RAS connection configured, the test will try to use it and verify its state. This is particularly useful if you have modem or DSL-based backup lines that are used for replication should the LAN between the networks and DCs fail.
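Reading through output like the sample above by eye is tedious, so a small script can pull out only the results that need attention. The following Python sketch parses status lines in the format shown in the sample output (a test name, a run of dots, a colon, and a status). The status value Failed does not appear in the sample and is assumed here; parse_netdiag and needs_attention are hypothetical helper names.

```python
import re

# A status line looks like "DNS test . . . . . . . . . . . . . : Passed".
# "Failed" is assumed as the third status; the sample only shows the other two.
STATUS_LINE = re.compile(
    r"^\s*(?P<name>[^.:]+?)[ .]*:\s*(?P<status>Passed|Failed|Skipped)\s*$"
)


def parse_netdiag(output):
    """Return (results, warnings) from NetDiag console output.

    results is a list of (test_name, status) tuples in output order;
    warnings is a list of the lines that contain a [WARNING] marker.
    """
    results = []
    warnings = []
    for line in output.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            results.append((match.group("name"), match.group("status")))
        elif "[WARNING]" in line:
            warnings.append(line.strip())
    return results, warnings


def needs_attention(results):
    """Return the tests that did not pass (skipped or failed)."""
    return [(name, status) for name, status in results if status != "Passed"]
```

Lines such as "Host Name. . . . . . . . . : dc1" are ignored because their value is not one of the status words. Whether a skipped test matters depends on your environment, as described above, so the list returned by needs_attention is a starting point for review rather than a list of errors.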

Just like DcDiag, NetDiag also has a set of extra switches and tests. The interesting ones are listed in the following table. You can invoke the tests by typing:

>netdiag /OPTION /test:TESTNAME

The NetDiag options that yield the most information are the verbose and debug flags.

Netdiag switches Descriptions
/q This creates a quiet mode and only shows the errors and warnings encountered, if any
/v Creates a verbose output for more information about each test result
/debug Is the most verbose output; NetDiag can take quite a while to complete
/l Creates a log file called netdiag.log in the directory from which NetDiag was executed
/d:NAME NAME represents the domain name, and this option will find a DC in that domain
/fix Like DcDiag's /fix, this option fixes minor problems quickly
/DcAccountEnum Enumerates all of the DC computer accounts within a domain
/test:Name Name here is the name of the single test to run; for a full list, type netdiag /?

If you have connectivity issues, NetDiag will almost certainly find them, and as with DcDiag, it is recommended that you always output everything to a log file, which is easier to read. The output of both utilities scrolls by rather quickly in the command line, and can be difficult to read.
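One way to follow this advice is to capture the output of each run in a timestamped log file. The following Python sketch runs a command and writes everything it prints to a file. The function name run_and_log, the log directory, and the file naming scheme are all hypothetical choices for illustration; NetDiag's own /l switch, listed in the table above, is an alternative when a fixed file name is sufficient.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def run_and_log(command, log_dir, label):
    """Run a command and save its output to <log_dir>/<label>-<timestamp>.log.

    Returns the path of the log file and the command's exit code.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    log_path = log_dir / f"{label}-{stamp}.log"
    # Capture both standard output and standard error as text.
    completed = subprocess.run(command, capture_output=True, text=True)
    log_path.write_text(completed.stdout + completed.stderr)
    return log_path, completed.returncode


# Example usage on a machine where the tools are installed:
#   run_and_log(["netdiag", "/v"], r"C:\Logs", "netdiag")
#   run_and_log(["dcdiag"], r"C:\Logs", "dcdiag")
```

Keeping one file per run also makes it possible to compare today's results with last week's when a problem appears.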

Monitoring with Sonar and Ultrasound

Monitoring your AD is something that needs to be done regularly, and there are many commercial utilities out there that will help you achieve this. However, it might be worth investigating tools that are available for free from Microsoft, and even from some other vendors.

Introducing Sonar

Sonar and Ultrasound are two utilities that allow you to monitor the File Replication Service (FRS), and both utilities are good at detecting problems beforehand, or issues with replication from certain DCs. Sonar can be downloaded from the Microsoft Download Center at http://www.microsoft.com/downloads.

You will need to have the .NET Framework 1.1 installed on the machine where Sonar will run. Also, be aware that .NET Framework 2.0 does not include 1.1, so if you have 2.0 installed, you still need to install 1.1 as well.

Once installed, Sonar will not create a program menu entry, so you will need to search for it. For some reason, it will install itself into the Resource Kit folder (C:\Program Files\Resource Kit\) and it is called Sonar.exe. Once you run it, you will be presented with the following dialog box:

At this point, you can see two buttons. One runs the default query (that is, all of the DCs within your domain), and the Load Query button loads the settings of a specific query or setup that you have saved. In our example, we will view the results, which opens the screen shown in the previous figure. Also note the drop-down for Replica Set. It allows you to monitor DFS replication within your domain, so this tool is not just used to monitor SYSVOL replication.

From the top part, you can easily select a very wide range of Filters via a drop-down list, and the Columns can be used to select the columns to be displayed. This relates to a group of columns, so there are more columns than just the ones selected from the drop-down. To illustrate the extent of information that you can get with this little utility, the following screenshot shows both of the menus expanded.

As you can see, you can use this tool to find out almost any information regarding replication. Once you select the filters and columns that you want, you can click Refresh All and it will fetch that information from all DCs within your domain. You can see the disk usage of the AD database on each DC, and spot any DC that has low disk space, is too slow, is backlogged with AD replications, and so on. This small utility, when used periodically, will help you to keep your AD in good, healthy shape and might help you find trouble spots such as low bandwidth or wrongly configured replication schedules.

Introducing Ultrasound

Although Sonar is a good utility that is small and does its job very well, some organizations either have many FRS points that they want to monitor, or want much more information.

This is where Ultrasound comes in. This utility is also a free download from Microsoft. However, it has much steeper requirements. Namely, it requires a SQL Server as a backend. Even the SQL Server 2000 Desktop Engine, or the free SQL Server 2005 Express Edition, downloadable from the Microsoft Download Center, will serve this purpose, but they require a two-step setup and more resources. Ultrasound also collects data periodically via agents that are deployed using WMI from within the Ultrasound interface. Although the free Desktop Engine has limitations, such as allowing only a few connections, it does provide enough database functionality for Ultrasound. SQL Server 2005 Express Edition works without problems.

If Sonar can be compared to a sonar on a boat, which gives you a lot of information about what's ahead and what's going on around you, then Ultrasound has all of the features of Sonar, plus an additional feature for radar and satellite surveillance. Getting familiar with Ultrasound may take some time. As Ultrasound is a Microsoft utility, it can be downloaded from the Microsoft Download Center.

Once you install the SQL server, or prepare a database on an existing server, you can proceed to installing Ultrasound. You will be asked which server to use and you can just enter the name of the PC where your SQL server is running. After deploying the database structure, which can take a few minutes, the installation will finish, and you will have a new program menu entry, called FRS Monitoring, where Ultrasound is located.

Once you launch Ultrasound for the first time, you will be asked to add an FRS replica to Ultrasound. At this point, you should click Yes and you will be prompted for your domain name and the available FRS replicas. In our case, this is similar to the example shown in the following screenshot. By simply clicking the replica set, and then clicking on Add, you can add it to the list of FRS replicas to be monitored.

Next, click OK. Ultrasound will collect the schema data from the selected replica set, and then ask whether you want to add all of the servers found, add only the highly connected (hub) servers, or add none and select your own. There is also an option to install the WMI collectors, which you should select (shown in the following screenshot).


Once you have selected your approach, a whole world of information will open up. The tool may appear confusing simply because of the volume of information you can gather with it, but the learning curve quickly flattens, and the data that it provides becomes invaluable. After the initial WMI collector deployment is done, you can close the screen. Henceforth you will find that the screen shown in the following screenshot is always displayed when you start Ultrasound:

At first, you are given a health rating, which is generally accurate as only critical errors, or errors that could cause problems, change this rating. You can expand the replica set and see each server's health rating as well. This allows you to quickly identify any critical issues with the DCs.

Details

On the second tab, Details, you will find information about the replications of the servers you have selected. We selected only DC1, DC2, and DC30, and details of the ongoing replications and which DCs have the most inbound and outbound connections are displayed, as shown in the following screenshot. On the top, you can also change the details to be displayed, for example the files contained within this Replica Set that are replicated.

Right-clicking on a server opens up a context menu that either allows you to collect data from a specific server, or opens up the replica set and displays the details of the replica set for the server, depending on the context.

Right-clicking on the inbound or outbound connection windows will allow you to collect data, or see details regarding a specific inbound, outbound, or replica member.

Alert History

The Alert History tab (shown in the following screenshot) contains all of the alerts caused by various actions or errors in the monitoring process, including failed WMI deployments, morphed directories, and other events. This is the power of Ultrasound. The level of detail that each error message contains is surprising.

You simply double-click on an alert and the general view with all its information is displayed. This information contains the usual things, such as the date and time when the event occurred, a description of the problem, and so on. It also allows you to assign the error to a support person, change the status from active to resolved, and specify the urgency of the problem. The general view also has an Advanced tab, where a lot more information regarding the error is shown, such as what the actual error was, which server caused it, and so on. The following screenshot shows both tabs side-by-side:

Summary and Advanced Tabs

The Summary tab provides a full summary of your AD replications. It shows everything from every member, with the domain listed at the top. The domain view shows the number of files that are backlogged, the number of servers that have yellow (that is, unhealthy) connections, the servers that have a high connection count, and active notifications regarding the servers that are selected. All of these are illustrated in the following screenshot.

The Advanced tab extends the Summary tab and all of the other tabs. It allows you to query any information in the Ultrasound database. In the normal view, you can select pre-configured view collections of your replica set in the left-hand pane. There are more views here, such as Failed AD updates, than in any of the previous screens, and you can also easily create custom filters.

To create a custom filter for a view, which you can even configure to email you when a certain event happens, select the view, choose a filter from the Row Filter drop-down (or leave it at NO FILTER), and then click the "..." button. You will be presented with a window that allows you either to change an existing filter, by selecting it and then clicking on Change, or to create a completely new one. In our case, we will edit the AD Collection error filter in the Failed AD updates view. Simply click on the second row with the Error 301 column, and click on Change.

We will change this filter to:

1. Notify us by email if a collection error occurs and

2. Set the health metrics for this filter to critical, as it then raises red flags immediately in the event of an AD collection failure.

This might seem a bit drastic as a collection failure can occur for a number of reasons, but unless these reasons occur a lot in your infrastructure, this should be a good way of identifying anomalies.

First, in the Change window, click on the Alert tab and select Enable notifications. Then, select the Custom notifications option. Finally, click on ADD on the right-hand side of the dialog box, and enter the email address to which you want the notification to be sent. You can only add one email address per notification, so you have to add each email address separately. You can also log an event in addition to receiving an email (as shown in the following screenshot).

To set the health metrics to critical, first click on the Health Metrics tab, and click on Enable health metrics. Then, simply click on ADD, leave Replica Set selected, and select Critical from the bottom drop-down menu (as shown in the following screenshot). Finally, simply click on OK and you will be returned to Ultrasound.

At this point, you could just minimize Ultrasound. The WMI collectors will continuously feed it data, and the AD collection alert will notify you of an AD replication collection failure. If you installed Ultrasound with a standard installation of SQL Server, you can close the program and the WMI collectors will continue to feed data straight into the database. If you have Ultrasound installed with the Desktop Engine, or SQL Server 2005 Express, you should keep the application running continuously. You can, of course, configure many more notifications to make sure that you cover all your bases, and do not have to spend time continually watching Ultrasound.

Ultrasound is a utility that has a somewhat steep learning curve for a short time, but can help you keep a perfectly healthy, replicating AD, when deployed correctly and used well.

Summary

In this tutorial we discussed a few tools and utilities that will help you monitor and diagnose your AD. Although these might not be directly related to disaster recovery, it is always good to have such important information at hand, as it can allow you to find a problem before it becomes too widespread.

Also, small command-line utilities such as DcDiag and NetDiag, together with the whole set of tools in the Resource Kit and the Support Tools, are invaluable to have on the DCs, or at least on an administrative machine where they are available for use at any time. Reading the output of these smaller utilities can be faster than sifting through event logs that also contain a lot of other things. Lastly, having tools such as Ultrasound deployed is useful. But if you have no processes defined for how and how often to monitor them, or for the corrective course of action to take in case of a problem, their value decreases significantly.

Source

The source of this content is Chapter 10: Common Recovery Tools Explained, of Active Directory Disaster Recovery by Florian Rommel (Packt Publishing, 2008).
