Monitoring Active Directory
From TroubleshootingWiki
Diagnosing and Troubleshooting Tools
This section covers some of the more advanced uses of three tools that are a great help in diagnosing AD problems and replication problems.
The three tools that almost every person who needs to perform troubleshooting on Windows Server 2003 should be familiar with are NetDiag.exe, DcDiag.exe, and Repadmin.exe.
All three are command-line utilities. They can be run from a Windows XP Professional or Windows Vista workstation, and they are standalone executables that can be copied from one machine to another. A nice feature of these tools is that they connect remotely to the DCs via the Remote Procedure Call (RPC) service, so you do not need to log on to the DCs. The only requirement for most of them is that they run from a machine that is a member of the domain and has access to the network.
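Because the tools reach the DCs over RPC, a quick reachability check from the administrative machine can save time before running them. The following is a minimal sketch, assuming the RPC endpoint mapper on the DC listens on its well-known TCP port 135; the function name `can_reach` and the host name in the usage line are illustrative only. A successful connection is necessary but not sufficient, since RPC services also use dynamically assigned ports.

```python
import socket

# Well-known TCP port of the RPC endpoint mapper (assumed to be in use here).
RPC_ENDPOINT_MAPPER_PORT = 135


def can_reach(host, port=RPC_ENDPOINT_MAPPER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_reach("dc1.nailcorp.com")` would report whether the sample DC used later in this article accepts connections on port 135.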
DcDiag
DcDiag is the domain controller diagnostic utility. It lets you diagnose a domain controller and check whether everything is working. If something is wrong, one or more tests fail, and the pattern of failures tells you where to start looking for the cause. Simple as that sounds, the utility runs a set of very meaningful tests even without any flags or options. The tests performed are listed in the following tables:
| Primary tests | Descriptions |
|---|---|
| Connectivity | Tests whether or not the DC is connected to the network |
| Replications | Tests to ensure replications can be started, and are run on time |
| NCSecDesc | Checks that the security descriptors on the naming context heads have appropriate permissions for replication |
| NetLogons | Tests if the DC allows the appropriate logons to initiate replication |
| Advertising | Tests if the DC can advertise itself (DNS) |
| KnowsOfRoleHolders | Tests if the DC knows which servers hold the FSMO roles |
| RidManager | Tests if the DC can contact the RidManager |
| Machine Account | Tests if the DC has a valid Machine Account |
| Services | Tests if the necessary services are running on the DC |
| ObjectsReplicated | Tests whether or not the DSA and Machine Account objects have ever been replicated |
| frssysvol | Checks if the sysvol share is listed in the File Replication Service (FRS) |
| frsevent | Tests if FRS errors have been generated |
| kccevent | Tests if KCC errors have been generated |
| systemlog | Tests if there are system errors |
| VerifyReferences | Tests if AD FRS records are intact and correct for the replication infrastructure |
| DNS Partition tests | Descriptions |
|---|---|
| Forest DNS tests | |
| CrossRefValidation | Checks if the replication cross-references are intact in the forest DNS zone |
| CheckSDRefDom | Checks if the Security descriptors for the forest are intact |
| Domain DNS tests | |
| CrossRefValidation | Checks if the replication cross-references are intact in the domain DNS zone |
| CheckSDRefDom | Checks if the Security descriptors for the domain are intact |
| Schema, Configuration, and Enterprise tests | Descriptions |
|---|---|
| Schema tests | |
| CrossRefValidation | Checks if the replication cross-references are intact in the schema itself |
| CheckSDRefDom | Checks if the Security descriptors within the schema are intact |
| Configuration | |
| CrossRefValidation | Checks if the replication cross-references are intact in the current forest configuration |
| CheckSDRefDom | Checks if the Security descriptors within the configuration are intact |
| Partition tests | |
| CrossRefValidation | Checks if the replication cross-references are intact in the current application partition |
| CheckSDRefDom | Checks if the Security descriptors for the application partition are intact |
| Enterprise tests | |
| Intersite | Checks if the inter-site replication can be initiated |
| FSMOCheck | Checks that all FSMO roles are assigned and can be contacted |
As you can see, DcDiag performs a lot of tests. But bear in mind that most of these are run locally on the DC, against the data held by that DC. This means that a DNS, schema, configuration, or partition test that fails on one DC may pass on another. It is possible that only this particular DC's replica is malfunctioning.
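When DcDiag is run against several DCs, comparing which tests failed on which DC makes it easier to tell a single faulty replica from a wider problem. The following is a minimal sketch that extracts pass and fail results from saved DcDiag output. It assumes result lines of the form `......................... DC1 passed test Connectivity`; the sample text is illustrative and was not captured from a real run, and the function names are hypothetical.

```python
import re

# Assumed line shape: "......................... DC1 passed test Connectivity"
RESULT_LINE = re.compile(r"\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def summarize_dcdiag(output):
    """Return {(target, test_name): 'passed' | 'failed'} for each result line."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            target, status, test = match.groups()
            results[(target, test)] = status
    return results


def failed_tests(results):
    """List the (target, test_name) pairs that failed, in sorted order."""
    return sorted(key for key, status in results.items() if status == "failed")


# Illustrative sample only; real output contains many more lines.
SAMPLE = """\
   Testing server: Default-First-Site-Name\\DC1
      Starting test: Connectivity
         ......................... DC1 passed test Connectivity
      Starting test: Replications
         ......................... DC1 failed test Replications
"""

if __name__ == "__main__":
    for target, test in failed_tests(summarize_dcdiag(SAMPLE)):
        print(f"{target}: {test} failed")
```

Running the same summary over the output from two DCs and comparing the failed lists shows at a glance whether a failure is local to one DC.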
There are a few other helpful, additional tests that are not in the default set. These are important to know as well. To run the specific tests, simply type the following at the command prompt:
>dcdiag /test:TESTNAME
Here, TESTNAME is the name of the test. You can see a list of these tests in the following table. The most notable tests are those in the DNS set. As DNS is such an integral part of AD, testing its functionality will rule out, or at least narrow down, many possible causes of a problem.
| Additional tests | Descriptions |
|---|---|
| DNS | Tests DNS checks for the entire enterprise; subtests can be checked separately |
| DNS/DnsBasic | Tests the basic DNS functionality such as connecting and looking up records |
| DNS/DnsForwarders | Checks the forwarders and root hints for errors |
| DNS/DnsDelegation | Checks the DNS delegations |
| DNS/DnsDynamicUpdate | Checks if dynamic updates are working |
| DNS/DnsRecordRegistration | Checks if the DNS registration works |
| DNS/DnsResolveExtName | Checks if external names can be resolved |
| DNS/DnsAll | Runs all the subtests |
| DNS/DnsInternetName: | Can be used with /DnsResolveExtName with a URL to resolve |
| CheckSecurityError | Checks for security errors or potential security errors |
| VerifyReplicas | Checks all replicas on all replica servers for consistency |
| Topology | Checks whether the entire topology is fully connected |
| CutoffServers | Checks for servers whose partners are not available, and therefore can't receive replications |
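If you run individual tests often, it can help to assemble the command line in a script. The following is a minimal sketch that only builds the argument list and does not execute anything. The `/test:` switch comes from the text above; `/s:` (target server), `/v` (verbose), and `/f:` (log file) are assumed from DcDiag's built-in help and should be confirmed with `dcdiag /?` on your version. The function name is hypothetical.

```python
def build_dcdiag_command(test=None, server=None, log_file=None, verbose=False):
    """Build an argument list for dcdiag; nothing is executed here."""
    args = ["dcdiag"]
    if server:
        args.append(f"/s:{server}")    # assumed switch: target a specific DC
    if test:
        args.append(f"/test:{test}")   # run only the named test
    if verbose:
        args.append("/v")              # assumed switch: verbose output
    if log_file:
        args.append(f"/f:{log_file}")  # assumed switch: write results to a file
    return args


if __name__ == "__main__":
    print(" ".join(build_dcdiag_command(test="DNS", server="DC1")))
```

The resulting list can be passed directly to a process runner on a machine where DcDiag is installed.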
Running DcDiag with the /fix flag will attempt to fix some minor problems it encounters. Some of the DNS-related issues, for example when the DC is not registered in the application partition, could quickly be fixed this way.
NetDiag
NetDiag is another command-line utility that lets you perform tests with a lot of verbose output. It is also included in the Windows Support Tools. While DcDiag lets you test everything related to the DCs and DNS, NetDiag lets you test everything related to the network stack of the machine.
NetDiag, by default, just like its cousin DcDiag, runs an extensive set of tests. These tests include checking network connectivity, checking which hotfixes are installed, whether the network card is configured properly and the network speed is configured correctly, what protocols and services are running, domain membership, and many more. A sample output from a working DC (DC1.nailcorp.com) is as follows:
C:\Documents and Settings\Administrator>netdiag
....................................
Computer Name: DC1
DNS Host Name: dc1.nailcorp.com
System info : Microsoft Windows Server 2003 (Build 3790)
Processor : x86 Family 6 Model 15 Stepping 8, GenuineIntel
List of installed hotfixes :
KB921503
KB924667-v2
KB925398_WMP64
KB925902
KB926122
KB927891
KB929123
KB930178
KB931784
KB932168
KB933360
KB933729
KB933854
KB935839
KB935840
KB935966
KB936021
KB936357
KB936782
KB938127
KB939653
KB941202
KB941568
KB941569
KB941644
KB941672
KB942615
KB942763
KB942840
KB943460
KB943485
KB944653
Q147222
Netcard queries test . . . . . . . : Passed
Per interface results:
Adapter : Local Area Connection
Netcard queries test . . . : Passed
Host Name. . . . . . . . . : dc1
IP Address . . . . . . . . : 10.0.0.50
Subnet Mask. . . . . . . . : 255.255.255.0
Default Gateway. . . . . . : 10.0.0.2
Dns Servers. . . . . . . . : 10.0.0.50
AutoConfiguration results. . . . . . : Passed
Default gateway test . . . : Passed
NetBT name test. . . . . . : Passed
[WARNING] At least one of the <00> 'WorkStation Service', <03> 'Messenger Service', <20> 'WINS' names is missing.
WINS service test. . . . . : Skipped
There are no WINS servers configured for this interface.
Global results:
Domain membership test . . . . . . : Passed
NetBT transports test. . . . . . . : Passed
List of NetBt transports currently configured:
NetBT_Tcpip_{D8B5C232-8078-485D-8DE0-2F5C8C2FB480}
1 NetBt transport currently configured.
Autonet address test . . . . . . . : Passed
IP loopback ping test. . . . . . . : Passed
Default gateway test . . . . . . . : Passed
NetBT name test. . . . . . . . . . : Passed
[WARNING] You don't have a single interface with the <00> 'WorkStation Service', <03> 'Messenger Service', <20> 'WINS' names defined.
Winsock test . . . . . . . . . . . : Passed
DNS test . . . . . . . . . . . . . : Passed
PASS - All the DNS entries for DC are registered on DNS server '192.168.0.50' and other DCs also have some of the names registered.
Redir and Browser test . . . . . . : Passed
List of NetBt transports currently bound to the Redir
NetBT_Tcpip_{D8B5C232-8078-485D-8DE0-2F5C8C2FB480}
The redir is bound to 1 NetBt transport.
List of NetBt transports currently bound to the browser
NetBT_Tcpip_{D8B5C232-8078-485D-8DE0-2F5C8C2FB480}
The browser is bound to 1 NetBt transport.
DC discovery test. . . . . . . . . : Passed
DC list test . . . . . . . . . . . : Passed
Trust relationship test. . . . . . : Skipped
Kerberos test. . . . . . . . . . . : Passed
LDAP test. . . . . . . . . . . . . : Passed
Bindings test. . . . . . . . . . . : Passed
WAN configuration test . . . . . . : Skipped
No active remote access connections.
Modem diagnostics test . . . . . . : Passed
IP Security test . . . . . . . . . : Skipped
Note: run "netsh ipsec dynamic show /?" for more detailed information
The command completed successfully
As you can see from this example output, several tests produced warnings or were skipped. The warnings shown here occur because we do not have certain services running on the DCs. These are not critical services and we do not need them. However, NetDiag tested them anyway because, by default, it executes a standard set of tests which includes these. The Messenger Service, Workstation Service, and WINS names were not registered because these services are not needed in our domain structure.
Should you rely on these services, especially as many companies still have WINS servers running, these warnings should catch your attention, because in that case the services should be running. As the WINS server address was not defined, the service test for WINS was skipped. We also have no current trust relationship with other domains, and therefore that test was also skipped. If we had one, the trust connection would have been checked to verify that it was working and that the other end could be contacted. Lastly, we do not have any IPSec configured on our network cards, and no WAN or Remote Access connections, which is why these tests were skipped as well. If you have a RAS connection configured, the test will try to use it and verify its state. This is particularly useful if you have modem or DSL-based backup lines that are used for replication should the LAN fail between the networks and DCs.
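Because the interesting parts of the output are the warnings and the tests that were skipped or failed, a small script can pull them out of a saved run. The following is a minimal sketch that groups test names by result and collects `[WARNING]` lines. It relies on the line shapes visible in the sample output above; the word `Failed` as the failure status is an assumption, since the sample contains no failures, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "WINS service test. . . . . : Skipped".
RESULT = re.compile(r"^\s*(.+?)[\s.]*:\s*(Passed|Failed|Skipped)\s*$")


def triage_netdiag(output):
    """Return ({status: [test names]}, [warning lines]) from NetDiag output."""
    by_status = defaultdict(list)
    warnings = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("[WARNING]"):
            warnings.append(stripped)
            continue
        match = RESULT.match(line)
        if match:
            by_status[match.group(2)].append(match.group(1))
    return dict(by_status), warnings


# Excerpt taken from the sample output shown earlier in this section.
SAMPLE = """\
    IP Address . . . . . . . . : 10.0.0.50
    Default gateway test . . . : Passed
    NetBT name test. . . . . . : Passed
    [WARNING] At least one of the <00> 'WorkStation Service', <03> 'Messenger Service', <20> 'WINS' names is missing.
    WINS service test. . . . . : Skipped
Trust relationship test. . . . . . : Skipped
Kerberos test. . . . . . . . . . . : Passed
"""

if __name__ == "__main__":
    statuses, warnings = triage_netdiag(SAMPLE)
    print("Skipped:", statuses.get("Skipped", []))
    print("Warnings:", len(warnings))
```

Informational lines such as the IP address are ignored, because their value is not one of the three result words.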
Just like DcDiag, NetDiag also has a set of extra switches and tests. The interesting ones are listed in the following table. You can invoke a test by typing:
>netdiag /OPTION /test:TESTNAME
The options that produce the most information are the verbose and debug flags.
| Netdiag switches | Descriptions |
|---|---|
| /q | This creates a quiet mode and only shows the errors and warnings encountered, if any |
| /v | Creates a verbose output for more information about each test result |
| /debug | Is the most verbose output; NetDiag can take quite a while to complete |
| /l | Creates a log file called netdiag.log in the same directory as executed |
| /d:NAME | NAME represents the domain name, and this option will find a DC in that domain |
| /fix | Like DcDiag's /fix, this option fixes minor problems quickly |
| /DcAccountEnum | Enumerates all of the DC computer accounts within a domain |
| /test:Name | Name here is the name of the single test to run; for a full list, type netdiag /? |
If you have connectivity issues, NetDiag will almost certainly find them, and as with DcDiag, it is recommended that you always output everything to a log file, which is easier to read. The output of both utilities scrolls by rather quickly in the command line, and can be difficult to read.
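One way to follow this advice from a script is to capture the output of a run and write it to a file. The following is a minimal sketch using Python's standard `subprocess` module; the function name `run_and_log` and the log file names are hypothetical.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run command, save its combined output to log_path, return the exit code."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error output in the same log
        text=True,
    )
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(completed.stdout)
    return completed.returncode
```

On an administrative machine where the tools are installed, a call such as `run_and_log(["netdiag", "/v"], "netdiag-verbose.log")` would save the verbose output for later reading. Any other command list works the same way, for example `[sys.executable, "--version"]` to try the function without the tools.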
Monitoring with Sonar and Ultrasound
Monitoring your AD is something that needs to be done regularly, and there are many commercial utilities out there that will help you achieve this. However, it might be worth investigating tools that are available for free from Microsoft, and even from some other vendors.
Introducing Sonar
Sonar and Ultrasound are two utilities that allow you to monitor the File Replication Service (FRS), and both utilities are good at detecting problems beforehand, or issues with replication from certain DCs. Sonar can be downloaded from the Microsoft Download Center at http://www.microsoft.com/downloads.
You will need to have the .NET Framework 1.1 installed on the machine where Sonar will run. Also, be aware that .NET Framework 2.0 does not include 1.1, so if you only have 2.0 installed, you need to install 1.1 as well.
Once installed, Sonar will not create a program menu entry, so you will need to search for it. For some reason, it will install itself into the Resource Kit folder (C:\Program Files\Resource Kit\) and it is called Sonar.exe. Once you run it, you will be presented with the following dialog box:
At this point, you can see two buttons, which can be used either for default querying (that is, all of the DCs within your domain) or for loading the settings with the Load Query button, if you have a specific query or setup saved. In our example, we will view the results and you will see the screen that you have seen in the previous figure. Also note the drop-down for Replica Set. This allows you to monitor DFS replications within your domain. So this tool is not just used to monitor the SYSVOL replications.
From the top part, you can easily select a very wide range of Filters via a drop-down list, and the Columns can be used to select the columns to be displayed. This relates to a group of columns, so there are more columns than just the ones selected from the drop-down. To illustrate the extent of information that you can get with this little utility, the following screenshot shows both of the menus expanded.
As you can see, you can use this tool to find out any information regarding the replication. Once you select the filters and columns that you want, you can click Refresh All and it will fetch that information from all DCs within your domain. You can see the disk usage of the AD database on all different DCs, including any DC that has low disk space, is too slow, is backlogged with AD replications, and so on. This small utility, when used periodically, will help you keep your AD in good, healthy shape and might help you find trouble spots such as low bandwidth or wrongly configured replication schedules.
Introducing Ultrasound
Although Sonar is a good utility that is small and does its job very well, some organizations either have many FRS points that they want to monitor, or want much more information.
This is where Ultrasound comes in. This utility is also a free download from Microsoft. However, it has much steeper requirements: it requires a SQL Server as a back end. Even the SQL Server 2000 Desktop Engine, or the free SQL Server 2005 Express Edition, downloadable from the Microsoft Download Center, will serve this purpose, but they require a two-step setup and more resources. Ultrasound also collects data periodically via agents that are deployed using WMI from within the Ultrasound interface. Although the free Desktop Engine has limitations, such as allowing only a few connections, it provides enough database functionality for Ultrasound. SQL Server 2005 Express Edition works without problems.
If Sonar can be compared to a sonar on a boat, which gives you a lot of information about what's ahead and what's going on around you, then Ultrasound has all of the features of Sonar, plus an additional feature for radar and satellite surveillance. Getting familiar with Ultrasound may take some time. As Ultrasound is a Microsoft utility, it can be downloaded from the Microsoft Download Center.
Once you install the SQL server, or prepare a database on an existing server, you can proceed to installing Ultrasound. You will be asked which server to use and you can just enter the name of the PC where your SQL server is running. After deploying the database structure, which can take a few minutes, the installation will finish, and you will have a new program menu entry, called FRS Monitoring, where Ultrasound is located.
Once you launch Ultrasound for the first time, you will be asked to add an FRS replica to Ultrasound. At this point, you should click Yes and you will be prompted for your domain name and the available FRS replicas. In our case, this is similar to the example shown in the following screenshot. By simply clicking the replica set, and then clicking on Add, you can add it to the list of FRS replicas to be monitored.
Next, click OK. Ultrasound will collect the schema data from the selected replica set and then ask whether to add all servers found, add only the highly connected (hub) servers, or add none so that you can select your own. There is also an option to install the WMI collectors, which you should do (shown in the following screenshot).
Once you have selected your approach, a whole world of information will open up. The tool may appear confusing simply because of the volume of information you can gather with it, but the learning curve quickly flattens, and the data that it provides becomes invaluable. After the initial WMI collector deployment is done, you can close the screen. Henceforth you will find that the screen shown in the following screenshot is always displayed when you start Ultrasound:
At first, you are given a health rating, which is generally accurate as only critical errors, or errors that could cause problems, change this rating. You can expand the replica set and see each server's health rating as well. This allows you to quickly identify any critical issues with the DCs.
Details
On the second tab, Details, you will find information about the replications of the servers you have selected. We selected only DC1, DC2, and DC30, and details of the ongoing replications and which DCs have the most inbound and outbound connections are displayed, as shown in the following screenshot. On the top, you can also change the details to be displayed, for example the files contained within this Replica Set that are replicated.
Right-clicking on a server opens up a context menu that either allows you to collect data from a specific server, or opens up the replica set and displays the details of the replica set for the server, depending on the context.
Right-clicking on the inbound or outbound connection windows will allow you to collect data, or see details regarding a specific inbound, outbound, or replica member.
Alert History
The Alert History tab (shown in the following screenshot) contains all of the alerts caused by various actions or errors in the monitoring process, including failed WMI deployments, morphed directories, and other events. This is the power of Ultrasound. The amount of detail each error message contains is surprising.
You simply double-click on an alert and the general view with all its information is displayed. This information contains the usual things, such as the date and time when the event occurred, a description of the problem, and so on. It also allows you to assign the error to a support person, change the status from active to resolved, and specify the urgency of the problem. The general view also has an Advanced tab, where a lot more information regarding the error is shown, such as what the actual error was and which server caused it. The following screenshot shows both tabs side-by-side:
Summary and Advanced Tabs
The Summary tab provides a full summary of your AD replications. It shows everything from every member, with the domain listed at the top. The domain view shows the number of files that are backlogged, the number of servers that have yellow (that is, unhealthy) connections, the servers that have a high connection count, and active notifications regarding the servers that are selected. All of these are illustrated in the following screenshot.
The Advanced tab extends the Summary tab, and all of the other ones. It allows you to query any information in the Ultrasound database. On the normal view, you can select pre-configured general view collections of your replica set, in the left hand pane. There are more views, such as Failed AD updates, than in any of the previous screens, although it is possible to easily create custom filters.
To create a custom filter for a view, which you can even configure to email you in case of a certain event happening, simply select the view and click on the Row Filter drop-down selection, and then click the "…" button, or leave it at NO FILTER and click the "..." button. You will be presented with a window that allows you to either change a filter by selecting it and then clicking on change, or to create a completely new one. In our case, we will edit the AD Collection error filter in the Failed AD updates view. Simply click on the second row with the Error 301 column, and click on Change.
We will change this filter to:
1. Notify us by email if a collection error occurs and
2. Set the health metrics for this filter to critical, as it then raises red flags immediately in the event of an AD collection failure.
This might seem a bit drastic as a collection failure can occur for a number of reasons, but unless these reasons occur a lot in your infrastructure, this should be a good way of identifying anomalies.
First, in the Change window, click on the Alert tab and select Enable notifications. Then, select the Custom notifications option. Finally, click on ADD on the right-hand side of the dialog box, and enter the email address to which you want the notification to be sent. You can add only one email address per notification, so you have to add each email address separately. However, you can also log an event in addition to receiving an email (as shown in the following screenshot).
To set the health metrics to critical, first click on the Health Metrics tab, and click on Enable health metrics. Then, simply click on ADD, leave Replica Set selected, and select Critical from the bottom drop-down menu (as shown in the following screenshot). Finally, simply click on OK and you will be returned to Ultrasound.
At this point, you could just minimize Ultrasound. The WMI collectors will continuously feed it data, and the AD collection alert will notify you of an AD replication collection failure. If you installed Ultrasound with a standard installation of SQL Server, you can close the program and the WMI collectors will continue to feed data straight into the database. If you have Ultrasound installed with a Desktop Engine, or SQL Server 2005 Express, you should keep the application running continuously. You can, of course, configure many more notifications to make sure that you cover all your bases and do not have to spend time continually watching Ultrasound.
Ultrasound is a utility that has a somewhat steep learning curve for a short time, but can help you keep a perfectly healthy, replicating AD, when deployed correctly and used well.
Summary
In this tutorial we discussed a few tools and utilities that will help you monitor and diagnose your AD. Although these might not be directly related to disaster recovery, it is always good to have such important information at hand, as it can allow you to find a problem before it becomes too widespread.
Also, small command-line utilities such as DcDiag and NetDiag, together with the whole set of tools in the Resource Kit and the Support Tools, are invaluable to have on the DCs, or at least on an administrative machine where they are available for use at any time. Reading the output of these smaller utilities can be faster than sifting through event logs that also contain a lot of other things. Lastly, having tools such as Ultrasound deployed is useful. But if you have no processes defined for how and how often to monitor them, or for the corrective course of action to take in case of a problem, their value decreases significantly.
Source
The source of this content is Chapter 10: Common Recovery Tools Explained of Active Directory Disaster Recovery by Florian Rommel (Packt Publishing, 2008).

