
Getting a CMD prompt as SYSTEM in Windows Vista and Windows Server 2008

Ned here again. In the course of using Windows, it is occasionally useful to be someone besides… you. Maybe you need to be an Administrator temporarily in order to fix a problem. Or maybe you need to be a different user as only they seem to have...(read more)

Forced Demotion of a Windows Server 2008 Core Domain Controller

Ned here again. Today's post is short and sweet, but when you need this one you will need it fast and we don't have this publically documented anywhere on TechNet (yet). Since Windows 2000 SP4, it has been possible to forcibly demote Domain Controllers...(read more)

Directory Services and more, from Madrid

Ned here again. I recently spent a week with Microsoft Support Engineers from all over the world, and bumped into a colleague that works in MS Spain, out of Madrid. She mentioned that they had a Spanish-language blog focused on Directory Services, networking...(read more)

DFSR SYSVOL Migration FAQ: Useful trivia that may save your follicles

Hi, Ned here again. Today I'm going to go through some well-hidden information on DFSR SYSVOL migration; hopefully this sticks in your brain and someday allows you to enjoy your weekend rather than spending it fighting issues. As you already know,...(read more)

Addendum: Making the DelegConfig website work on IIS 7

Hi All, Rob here again. I thought I would take the time today and expand upon the Kerberos Delegation website blog to show how you can use the web site on IIS 7. Actually, Ned beat me up pretty badly for not showing how to set the site up on IIS 7 [ I...(read more)

Headache Prevention: Install Hotfix 953317 to Prevent DNS Records from Disappearing from Secondary DNS Zones on Windows Server 2008 SP1

Craig here. We’ve had some nasty cases related to this bug, so it seemed prudent to do our best to increase the awareness of this issue. In a nutshell, the DNS Server service in Windows Server 2008 has a bug that can result in a large number of DNS records...(read more)

Changes in Functionality from 2008 to 2008 R2 (mostly)

Ned here again. We're all snowed in down in Charlotte today, but that doesn't stop the blogging. We've published a new TechNet guide to many of the changes between Windows Server 2008 and Windows Server 2008 R2; it's definitely worth a look and has good...(read more)

DS Restore Mode Password Maintenance

Ned here again. There comes a day in nearly every administrator’s life where they will need to boot a domain controller into DS Restore Mode. Whether it’s to perform an authoritative restore or fix database issues, you will need the local...(read more)

Netmon, MPS, RODC's, and that new OS you might have heard about

Ned here. A few big pieces of news, in case you've been having a busy week:

  • Netmon 3.3 has been released. You can download from here. Read more about the new features (such as autoscroll, frame commenting, experts, WWAN support, and more) right here.
  • MPS Reports. They're back. They work on Vista and 2008, as well as XP and 2003. You don't need a support case to use them. Grab here. Hallelujah.
  • RODC's in DMZ's. Whitepaper on deploying AD Read-Only Domain Controllers into perimeter networks. Jump to it.

And the mack daddy...

  • Windows 7 and Windows Server 2008 R2 Release Candidate. On schedule for an April 30th MSDN/TechNet release, and a May 5th public release. Wait, you don't believe me, your faithful Beta engineer? Pfffft, believe it.

- Ned 'Talking Head' Pyle


ADMT 3.1 and Windows Server 2008 R2

Hello All,

UPDATE June 19 2010 - stop reading and go here:

http://blogs.technet.com/b/askds/archive/2010/06/19/admt-3-2-released.aspx

=====

There’s a known issue with installing Active Directory Migration Tool (ADMT) v3.1 onto a Windows Server 2008 R2 computer that I want to bring to everyone’s attention. At this time it has been acknowledged that version 3.1 (which requires Windows Server 2008) returns the following error when you attempt to install it onto R2:

"ADMT must be installed on Windows Server 2008"

This issue also occurs with Windows 2008 machines that previously had ADMT installed and were then upgraded to Windows 2008 R2. ADMT will no longer function correctly and returns the same error detailed above. Microsoft is aware of the issue and is diligently working on a resolution. Please stay tuned for further details and updates.

I’d also like to take this opportunity to ask that you send me any future feature suggestions and requests for the tool, as I’ve been asked to present the results of the “voice of the customer”. The ADMT development group would like to hear from our customers on how we could make the product better. Please feel free to post comments or email your recommendations and suggestions on what you’d like to see in a later release of ADMT.

Happy migrating!

-Jason Fournerat


ADMT, RODC’s, and Error 800704f1

Hello all, Jason here again. With this blog post, I just wanted to bring an ADMT issue to the masses’ attention, as I’ve experienced it multiple times within just the last couple of months.

There’s an issue when attempting to migrate computer account objects into a Windows 2008 domain that has been prepared for a Read-Only Domain Controller with the ‘ADPrep /RODCPrep’ command. To confirm whether the command has been run, look for the following object in the ADSIEdit snap-in on the targeted 2008 domain:

CN=ActiveDirectoryRodcUpdate,CN=ForestUpdates,CN=Configuration,DC=<DomainName>,DC=com

Note: If the command has been run, the value of the ‘Revision’ attribute will be set to ‘2’.
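If you prefer the command line over ADSIEdit, a quick check might look like this (a sketch; substitute your real domain components for DC=contoso,DC=com):

dsquery * "CN=ActiveDirectoryRodcUpdate,CN=ForestUpdates,CN=Configuration,DC=contoso,DC=com" -scope base -attr revision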

This is what is specifically witnessed within the ADMT log file:

ERR3:7075 Failed to change domain affiliation, hr=800704f1

The system detected a possible attempt to compromise security. Please ensure that you can contact the server that authenticated you.

When this error is generated, it is due to the following hotfix NOT being installed onto the client machine that you are migrating into the Windows 2008 domain:

944043 Description of the Windows Server 2008 read-only domain controller compatibility pack for Windows Server 2003 clients and for Windows XP clients and for Windows Vista
http://support.microsoft.com/default.aspx?scid=kb;EN-US;944043

After installing the hotfix and rebooting the client machine(s), re-running ADMT for the computer object migration will succeed.

- Jason “J4” Fournerat


Group Policy Slow Link Detection using Windows Vista and later

Mike here again. Many Group Policy features rely on a well connected network for their success. However, not every connection is perfect or ideal; some connections are slow. The Group Policy infrastructure has always provided functionality to detect slow links. However, the means by which Group Policy determines this are different between operating systems prior to Windows Server 2008 and Windows Vista.

Before Windows Server 2008 and Vista

Windows Server 2003, Windows XP, and Windows 2000 Group Policy use the ICMP protocol to determine a slow link between the Group Policy client and the domain controller. This process is documented in Microsoft Knowledgebase article 227260: How a slow link is detected for processing user profiles and Group Policy (http://support.microsoft.com/default.aspx?scid=kb;EN-US;227260).

The Group Policy infrastructure performs a series of paired ICMP pings from the Group Policy client to the domain controller. The first ping contains a zero-byte payload while the second contains a 2048-byte payload. The results from both pings are computed and voila, we have the bandwidth estimation.
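As a rough illustration of the arithmetic (this is not the exact Group Policy algorithm, and the DC name is a placeholder), you can approximate the same estimate yourself in PowerShell:

# Illustration of the pre-Vista ICMP estimate described in KB 227260.
$dc = 'dc01.contoso.com'                                             # hypothetical DC name
$t0  = (Test-Connection $dc -Count 1 -BufferSize 32).ResponseTime    # near-zero payload
$t2k = (Test-Connection $dc -Count 1 -BufferSize 2048).ResponseTime  # large payload
# The extra ~2048 bytes travel out and back, so roughly 4096 additional bytes
# moved in ($t2k - $t0) milliseconds. On a fast LAN the delta can be zero,
# which is one reason these estimates were unreliable.
if ($t2k -gt $t0) { 'Estimated bandwidth: {0:N0} bps' -f ((4096 * 8) / (($t2k - $t0) / 1000)) }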

Many "not-so-nice" applications use ICMP maliciously. This new found use increased ICMP’s popularity forced IT professional to take precautions. These precautions included blocking ICMP. The solution to block ICMP provided relief from the susceptibility of malicious ICMP packets, but broke Group Policy. Workarounds were created (Microsoft Knowledgebase article 816045 Group Policies may not apply because of network ICMP policies); But the update did not remove the ICMP dependency.

The Windows Server 2008 and Vista era

Windows 7 and Windows Vista to the rescue! These newer operating systems implement a slow link detection mechanism that DOES NOT use ICMP -- but we already knew this. The question we will answer is: how does the new Group Policy slow link detection work?

The easy answer to how the new slow link detection works is Network Location Awareness (NLA). This networking layer service and programming interface allows applications, like Group Policy, to solicit networking information from the network adapters in a computer, rather than implementing their own methods and algorithms. NLA accomplishes this by monitoring the existing traffic of a specific network interface. This provides two important benefits: 1) it does not require any additional network traffic to accomplish its bandwidth estimate -- no network overhead, and 2) it does not use ICMP.

Group Policy using NLA

The question commonly asked is how Group Policy slow link detection implements NLA. The actual algorithms used by NLA are not as important as what Group Policy does during its request to NLA for bandwidth estimation.

Locate a domain controller

A Group Policy client requires communication with a domain controller to successfully apply Group Policy, so the Group Policy service must discover a domain controller. The service accomplishes this by using the DCLocator service. Windows clients typically have already discovered a domain controller prior to Group Policy application; DCLocator caches this information and makes it available to other applications and services. The Group Policy service makes three attempts to contact a domain controller, with the first attempt using the domain controller information stored in the cache. The latter two attempts force DCLocator to rediscover domain controller information. Retrieving cached domain controller information does not traverse the network, but forceful rediscovery does. Domain controller information includes the IP address of the domain controller, which the Group Policy service uses to begin bandwidth estimation.
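You can watch the cached-versus-forced behavior yourself with nltest (the domain name is a placeholder):

nltest /dsgetdc:contoso.com          (returns the cached domain controller information)
nltest /dsgetdc:contoso.com /force   (forces DCLocator rediscovery over the network)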

During bandwidth estimation

The Group Policy service begins bandwidth estimation after it successfully locates a domain controller. Domain controller location includes the IP address of the domain controller. The Group Policy service performs the following actions during bandwidth estimation.

NOTE: All actions listed in this section generate network traffic from the client to the domain controller unless otherwise noted. I've included a few actions that do not generate network traffic because their results could be accomplished using methods that generate network traffic. These actions are added for clarity.

Authentication

The first action performed during bandwidth estimation is an authenticated LDAP connect and bind to the domain controller returned during the DCLocator process. This connection is made under the user's security context and uses Kerberos for authentication. The connection does not support NTLM; therefore, this authentication sequence must succeed using Kerberos for Group Policy processing to continue. Once successful, the Group Policy service closes the LDAP connection.

NOTE: The user's security context is relative to the type of Group Policy processing. The security context for computer Group Policy processing is the computer. The security context for the user is the current user for the current session.

The Group Policy service makes an authenticated LDAP connection as the computer when user policy processing is configured in loopback-replace mode.

Determine network name

The Group Policy service then determines the network name. The service accomplishes this by using IPHelper APIs to determine the best network interface through which to communicate with the IP address of the domain controller. The action also uses Winsock APIs; however, it does not create any network traffic. Additionally, the domain controller and network name are saved in the client computer's registry for future use.

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Group Policy\History is where the service stores these values. The value names are DCName and NetworkName.

NOTE: The NetworkName registry value is used by the Windows firewall to determine if it should load the domain firewall profile.
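To see what the service recorded on a given client, you can query the history key directly:

reg query "HKLM\Software\Microsoft\Windows\CurrentVersion\Group Policy\History" /v DCName
reg query "HKLM\Software\Microsoft\Windows\CurrentVersion\Group Policy\History" /v NetworkName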

Site query

Group Policy processing must know the site to which the computer belongs. To accomplish this, the Group Policy service uses the Netlogon service. Client site discovery is an RPC call from the client computer to a domain controller. The client netlogon service internally caches the computer's site name. The time-to-live (TTL) for the site name cache is five minutes. However, TTL expiry is on demand. This means the client only checks the TTL during client discovery. This check is implemented by Netlogon (not the Group Policy service). If the cached name is older than five minutes from when the name was last retrieved from the domain controller, then the Netlogon service makes an RPC call to the domain controller to discover the computer site. This explains why you may not see the RPC call during Group Policy processing. However, the opportunity for network traffic exists.
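To view the site name the client's Netlogon service currently has cached, you can run:

nltest /dsgetsite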

Determine scope of management

The following Group Policy actions vary based on Group Policy processing mode. Computer Group Policy processing only uses normal Group Policy processing. However, user Group Policy processing can use normal, loopback-merge, and loopback-replace modes.

Normal mode

Normal mode is the most common Group Policy processing mode. Conceptually it works the same for users and computers; the most significant difference is the distinguished name used by the Group Policy service.

Building the OU and domain list

The Group Policy service uses the distinguished name of the computer or user to determine the list of OUs and the domain it must search for Group Policy objects. The Group Policy service builds this list by analyzing the distinguished name from left to right. The service scans the name looking for each instance of OU= in the name and copies each matching distinguished name to a list, which it uses later. The Group Policy service continues to scan the distinguished name for OUs until it encounters the first instance of DC=. At this point, the Group Policy service has found the domain name, which completes the list. This action does not generate any network traffic. (A small script sketch of this parsing follows the example below.)

Example: Here is the list from a given distinguished name

Distinguished Name:
            cn=user,OU=marketing,OU=hq,DC=na,DC=contoso,DC=com
List:
            OU=marketing,OU=hq,DC=na,DC=contoso,DC=com
            OU=hq,DC=na,DC=contoso,DC=com
            DC=na,DC=contoso,DC=com
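Here is a small PowerShell sketch of that parsing logic (illustrative only; it splits naively on commas and does not handle escaped commas inside names):

$dn = 'cn=user,OU=marketing,OU=hq,DC=na,DC=contoso,DC=com'
$parts = $dn -split ','
$list = @()
for ($i = 0; $i -lt $parts.Count; $i++) {
    if ($parts[$i] -like 'OU=*') {
        $list += ($parts[$i..($parts.Count - 1)] -join ',')  # an OU scope
    }
    elseif ($parts[$i] -like 'DC=*') {
        $list += ($parts[$i..($parts.Count - 1)] -join ',')  # the domain scope
        break                                                # stop at the first DC=
    }
}
$list   # outputs the same three scopes shown in the example above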

Evaluate scope of management

The Group Policy service uses the list of OUs to determine the Group Policy objects linked to each scope of management and the options associated with each link. The service determines linked Group Policy objects by using a single LDAP query to the domain controller discovered earlier.

LDAP requests have four main components: base, scope, filter, and attributes. The base specifies the location within the directory where the search should begin, usually represented as a distinguished name. The scope determines how far the search should traverse into the directory, starting from the base; the options are base, one-level, and subtree. The base option limits the search to the base object itself. The one-level option returns objects from one level below the base, not including the base. The subtree option returns objects from the base and all levels below the base. The filter provides a way to control which objects the search should return (see MSDN for more information on LDAP search filter syntax). The attributes setting is a list of attributes the search should return for the objects that match the filter.

The service builds the LDAP request with the following arguments:

BaseDN:  domain
Scope: Sub Tree
Filter: (|(distinguishedname=OU=xxx)( more OUs)(ends domainNC DC=))
Attributes: gpLink, gpOptions, ntSecurityDescriptor

Example: Scope of management LDAP search
       BaseDN: DC=na,DC=contoso,DC=com
       Scope: SubTree
       Filter: (|(distinguishedName=OU=marketing,OU=hq,DC=na,DC=contoso,DC=com)
               (distinguishedName=OU=hq,DC=na,DC=contoso,DC=com)
               (distinguishedName=DC=na,DC=contoso,DC=com))
       Attributes: gPLink, gPOptions, nTSecurityDescriptor

Determining the scope of normal Group Policy processing mode occurs in the security context of the applying security principal: the computer performs the LDAP query for computer processing, and the user performs the LDAP query for user processing. Merge and Replace are user-only processing modes, which occur under the security context of the user.

Replace user-processing performs an LDAP query using the computer’s distinguished name. Each component of the distinguished name is inserted into the filter portion of the LDAP query. The LDAP query filter parameter ends with the distinguished name of the domain (which is assembled using the parts of the computer’s distinguished name).

Merge user-processing performs two LDAP queries. The first LDAP query uses the distinguished name of the user object. The second query uses the distinguished name of the computer object. The Group Policy links returned from both queries are merged into one list: the Group Policy service adds the Group Policy links returned from the computer query to the end of the list of Group Policy links returned from the user query. Concatenating the computer list to the end of the user list results in the Group Policy links being listed in the order they apply.

Determine the link status

The Group Policy service is ready to determine the status of the link between the client computer and the domain controller. The service asks NLA to report the estimated bandwidth it measured while the earlier Group Policy actions occurred. The Group Policy service compares the value returned by NLA to the GroupPolicyMinTransferRate named value stored in HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon (the preference key) or HKEY_LOCAL_MACHINE\Software\Policies\Microsoft\Windows\System (the policy key). The default minimum transfer rate for a Group Policy slow link is 500 Kbps. The link between the domain controller and the client is slow if the estimated bandwidth returned by NLA is lower than the value stored in the registry. The policy value has precedence over the preference value if both values appear in the registry.

After successfully determining the link state (fast or slow, with no errors), the Group Policy service writes the slow link status into the Group Policy history, which is stored in the registry. The named value is IsSlowLink and is located at HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Group Policy\History. This value is a REG_DWORD that is interpreted as a Boolean value, with a non-zero value equaling false and a zero value equaling true. If the Group Policy service encounters an error, it reads the last recorded value from the history key and uses that true or false value for the slow link status.
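To inspect the recorded result on a client (per the history location described above), a quick check is:

reg query "HKLM\Software\Microsoft\Windows\CurrentVersion\Group Policy\History" /v IsSlowLink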

Conclusion

Group Policy slow link detection has matured since the days of using ICMP for slow link detection. Today, Windows 7 and Windows Vista’s Group Policy services use NLA to sample TCP communication between the client and the domain controller, without sending additional network traffic.

- Mike “Huuuh, whaaaa?” Stephens


Auditing Password and Account Lockout Policy on Windows Server 2008 and R2

Ned here again. Let’s talk about auditing your domain for changes made to Password and Account Lockout policies. Frankly, it’s a real pain in the neck to figure out Password and Account Lockout auditing and there are legacy architectural decisions behind how this all works, so I’ll make sure to cover all the bases. This also includes auditing your Fine Grain Password policies (FGPP), for you bleeding-edge types.

Understanding how these policies work

We use Password and Account Lockout policies to control domain authentication. Password policies set requirements for things like password length, complexity, and maximum age. Account Lockout policies control lockout threshold and duration, and are very popular with The Devil.

There are two types of Password and Account Lockout policies in a domain:

  • Domain-wide – Introduced in Windows NT and set in Active Directory through domain security policy.
  • Fine Grained – Introduced in Windows Server 2008 and set in AD through manual means like ADSIEDIT or AD PowerShell. It configures settings on a user or group-membership basis, and there can be as many as you like.

Domain-based policy, while being set through security policy, is actually written to attributes on the root of the domain. ADSIEdit shows this object using the distinguished name of the domain name. This odd location results from providing NT 4.0 compatibility. Since NT 4.0 could not apply group policy, we had to store these values somewhere and answer requests about the settings in an NT fashion.
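If you want to read the raw values straight off the domain head, a base-scoped query works (a sketch; the attribute names shown are the underlying AD attributes behind the policy settings, and the DN is a placeholder):

dsquery * "DC=contoso,DC=com" -scope base -attr minPwdLength pwdHistoryLength lockoutThreshold lockoutDuration maxPwdAge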


On the other hand, Fine Grained policies write to their own location. Windows stores each policy as a leaf object.


When you edit your built-in Default Domain password policy, you are actually editing:

\\contoso.com\sysvol\contoso.com\Policies\{31B2F340-016D-11D2-945F-00C04FB984F9}\MACHINE\Microsoft\Windows NT\SecEdit\GptTmpl.inf

All your settings are in this format:

[System Access]
MinimumPasswordAge = 0
MaximumPasswordAge = 60
MinimumPasswordLength = 8
PasswordComplexity = 1
PasswordHistorySize = 4
LockoutBadCount = 50
ResetLockoutCount = 30
LockoutDuration = 30
RequireLogonToChangePassword = 0
ForceLogoffWhenHourExpire = 0
ClearTextPassword = 0
LSAAnonymousNameLookup = 0

When a DC applies this security policy during the five-minute group policy refresh, the DC stamps these settings on the domainDNS object. And voila, you have your policies in place. But think about that – the DC stamps these settings in place when applying computer policy. Who do you think will be listed as the user in your audit event logs? That’s right – the DC itself. And that’s where this blog post comes in. :-)

Auditing Domain-Wide Policy

There are three main things you need to do to see domain-wide password and account lockout setting changes, but they differ slightly by OS:

1. Put an auditing entry on the “Policies” container. Enabling auditing for EVERYONE on the “CN=Policies,CN=System,DC=<your domain>” container causes auditing to track all writes, deletes, and permission modifications. The audit event shows the user modifying group policy in general. Obviously, this is useful for more than just password policy changes – “Hey, who set this policy to push a Domo-Kun wallpaper out to all the computers?”


2. Enable subcategory auditing for:

    a. “Authentication Policy Change” (if using Windows Server 2008 R2 DC’s).

    b. “Other Account Management Events” (if using Windows Server 2008 DC’s).

3. Enable subcategory auditing for “Directory Service Changes”.

    Note: In Windows Server 2008 R2, granular subcategory auditing is available through GPMC.


In Windows Server 2008, you need to use the script provided in KB921469.
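On either OS you can also flip the subcategories from an elevated command prompt with auditpol.exe (a sketch; enable the subcategory matching your DC version per step 2 above):

auditpol /set /subcategory:"Directory Service Changes" /success:enable
auditpol /set /subcategory:"Authentication Policy Change" /success:enable
auditpol /set /subcategory:"Other Account Management Events" /success:enable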

After enabling auditing, Windows then generates security audit events for anyone editing domain-wide security policy for passwords and account lockouts:

1. An event 5136 will be written that shows the versionNumber attribute of the policy being raised:

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          10/24/2009 3:04:17 PM
Event ID:      5136
Task Category: Directory Service Changes
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      2008r2-f-01.contoso.com
Description:
A directory service object was modified.
Subject:
    Security ID:        CONTOSO\Administrator
    Account Name:        Administrator
    Account Domain:        CONTOSO

    Logon ID:        0x1e936

Directory Service:
    Name:    contoso.com
    Type:    Active Directory Domain Services
Object:
    DN:    CN={31B2F340-016D-11D2-945F-00C04FB984F9},CN=POLICIES,CN=SYSTEM,DC=CONTOSO,DC=COM
    GUID:    CN={31B2F340-016D-11D2-945F-00C04FB984F9},CN=Policies,CN=System,DC=contoso,DC=com
    Class:    groupPolicyContainer
Attribute:
    LDAP Display Name:    versionNumber
    Syntax (OID):    2.5.5.9
    Value:    121


Note: The event shows the name of the user that modified the policy – every policy edit raises the version number. Now we know that someone changed the policy and that we should go look at it.

2. Windows writes a follow-up event (event id 4739) for each type of change – lockout policy or password policy. For example:

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          10/24/2009 3:01:28 PM
Event ID:      4739
Task Category: Authentication Policy Change
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      2008r2-f-01.contoso.com
Description:
Domain Policy was changed.

Change Type:        Lockout Policy modified

Subject:
    Security ID:        SYSTEM
    Account Name:        2008R2-F-01$
    Account Domain:        CONTOSO
    Logon ID:        0x3e7

Domain:
    Domain Name:        CONTOSO
    Domain ID:        CONTOSO\

Changed Attributes:
    Min. Password Age:    -
    Max. Password Age:    -
    Force Logoff:        -
    Lockout Threshold:    500
    Lockout Observation Window:   
    Lockout Duration:   
    Password Properties:   
    Min. Password Length:   
    Password History Length:   
    Machine Account Quota:   
    Mixed Domain Mode:   
    Domain Behavior Version:   
    OEM Information:    -

====

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          10/24/2009 3:04:23 PM
Event ID:      4739
Task Category: Authentication Policy Change
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      2008r2-f-01.contoso.com
Description:
Domain Policy was changed.

Change Type:        Password Policy modified

Subject:
    Security ID:        SYSTEM
    Account Name:        2008R2-F-01$
    Account Domain:        CONTOSO
    Logon ID:        0x3e7

Domain:
    Domain Name:        CONTOSO
    Domain ID:        CONTOSO\

Changed Attributes:
    Min. Password Age:    -
    Max. Password Age:    -
    Force Logoff:        -
    Lockout Threshold:    -
    Lockout Observation Window:    -
    Lockout Duration:    -
    Password Properties:    -
    Min. Password Length:    5
    Password History Length:    -
    Machine Account Quota:    -
    Mixed Domain Mode:    -
    Domain Behavior Version:    -
    OEM Information:    -

Notice the account name is the DC itself. This event, while useful, needs to be correlated with the 5136 event to see what changed. And even then, these events can sometimes be difficult to understand – what is a “password property” after all? (It’s complexity being turned on or off.) You should probably use these events as a notification to go examine the actual policies in GPMC.

You’re probably asking yourself why I didn’t just audit the actual domain root object and skip using the “Authentication Policy Change” and “Other Account Management Events”. This is another of the vagaries of security policy auditing – it doesn’t work. Simply auditing the “DC=domain,DC=com” object does not return any information about password or lockout changes. Go figure.

Auditing Fine-Grained Policy

Auditing FGPP is simpler and the data is easier to read. FGPP does not involve intermediate security policy settings; creating and modifying these policies directly edits the objects in Active Directory, using PowerShell, LDP, LDIFDE, or ADSIEDIT. This means there is no layer between you and the work being done on your behalf. Also, your audit events are clean and self-evident.

1. Put an auditing entry on the “Password Settings Container” container. Enabling auditing for EVERYONE on the “CN=Password Settings Container,CN=System,DC=<your domain>” object causes Windows to track all users who write, delete, and modify permissions on any FGPPs.


2. Enable subcategory auditing for “Directory Service Changes” (see previous section for steps).

After enabling auditing, Windows generates a security audit event for anyone editing FGPPs for each change made. Also, the audit event includes the new value and the value prior to the change:

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          10/24/2009 4:20:54 PM
Event ID:      5136
Task Category: Directory Service Changes
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      2008r2-f-01.contoso.com
Description:
A directory service object was modified.
Subject:
    Security ID:        CONTOSO\RobGreene
    Account Name:        RobGreene
    Account Domain:        CONTOSO

    Logon ID:        0x1e936

Directory Service:
    Name:    contoso.com
    Type:    Active Directory Domain Services
Object:
    DN:    CN=VIP DomainUsersPSO,CN=Password Settings Container,CN=System,DC=contoso,DC=com
    GUID:    CN=VIP DomainUsersPSO,CN=Password Settings Container,CN=System,DC=contoso,DC=com
    Class:    msDS-PasswordSettings
Attribute:
    LDAP Display Name:    msDS-PasswordComplexityEnabled
    Syntax (OID):    2.5.5.8
    Value:    TRUE
Operation:
    Type:    Value Deleted
    Correlation ID:    {6afa8930-85cd-44d9-828b-9cc3c1b5a8b9}
    Application Correlation ID:    -

===

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          10/24/2009 4:20:54 PM
Event ID:      5136
Task Category: Directory Service Changes
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      2008r2-f-01.contoso.com
Description:
A directory service object was modified.
Subject:
    Security ID:        CONTOSO\RobGreene
    Account Name:        RobGreene
    Account Domain:        CONTOSO

    Logon ID:        0x1e936

Directory Service:
    Name:    contoso.com
    Type:    Active Directory Domain Services
Object:
    DN:    CN=VIP DomainUsersPSO,CN=Password Settings Container,CN=System,DC=contoso,DC=com
    GUID:    CN=VIP DomainUsersPSO,CN=Password Settings Container,CN=System,DC=contoso,DC=com
    Class:    msDS-PasswordSettings
Attribute:
    LDAP Display Name:    msDS-PasswordComplexityEnabled
    Syntax (OID):    2.5.5.8
    Value:    FALSE
Operation:
    Type:    Value Added
    Correlation ID:    {6afa8930-85cd-44d9-828b-9cc3c1b5a8b9}
    Application Correlation ID:    -

Here I can see the user RobGreene logged on and changed the password complexity requirements from TRUE to FALSE. I knew it! Rob Greene, always breaking my stuff…

See Edie, I told you I’d write a blog post on this. :-)

- Ned “the chiropractor” Pyle


Clustered Certification Authority maintenance tasks

Hi all, Rob Greene here again. I thought I would share with you how to do some common tasks with a Windows Server 2008 clustered Certification Authority (CA). When the CA is clustered, there are definitely different steps that need to be taken when you:

  • Make a change to the behavior of the CA by using certutil.exe with -setreg or -delreg switches.
  • Modify the registry values in the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\CertSvc hive.
  • Renew the CA’s certificate.

In the past before the Certification Authority service (CertSvc) was supported in a cluster, you could make these changes and then stop and start the CertSvc service without a problem. This is still the case when the Certification Authority has not been clustered.

However, when you have the Certification Authority configured as a cluster you must avoid starting and stopping the service outside of the Cluster Administrator snap-in (Cluadmin.msc). The reason is that the Cluster Service not only keeps track of the service state for CertSvc, it is also responsible for making sure that the registry key location HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\CertSvc is saved to the quorum disk when it notices a change to the registry location. This is noted in the CA Cluster whitepaper also, which is required reading for anyone clustering CA’s.

Changing the behavior of the Certification Authority

If you need to make a change to the behavior of the Certification Authority with CertUtil.exe or direct registry modification, you must always follow the steps below:

1. Log on to the active clustered Certification Authority node. If you are unsure which node currently owns the resource, do the following:

    a. Launch the failover Cluster Management MMC.

    b. Select the Certification Authority resource, and in the right hand pane you will see the “Current Owner” (See figure 1).


Figure 1 - Current Owner of the Certification Authority resource

2. Use the certutil.exe -setreg command (recommended), or modify the registry directly.

3. Launch the Failover Cluster Management snap-in.

4. Take the Certification Authority resource (Service) offline and then bring it back online. We have to take the resource offline and back online since the CertSvc service will not read any registry key changes without being restarted, and as I stated above when the CA is clustered you should refrain from stopping and starting the CertSvc service directly.

    a. Right click on the Certification Authority resource in the tree view pane.

    b. Select either “Bring this service or application online” or “Take this service or application offline” (See figure 2).


Figure 2 - Taking the resource offline / online
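If you prefer a command line, the same offline/online cycle can be scripted with cluster.exe (a sketch; substitute the resource name shown in Failover Cluster Management for "Certification Authority"):

cluster resource "Certification Authority" /offline
cluster resource "Certification Authority" /online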

Renewing the subordinate Certification Authority certificate

This section discusses the steps that need to be done when renewing a subordinate CA certificate. A Root certification authority shouldn’t be clustered and instead should be configured as an offline root.

Verify the request file location

When the CA certificate is renewed, the CertSvc service is stopped to generate the certificate request file. The request file location and name are dictated by the following registry value:

HKEY_Local_Machine\System\CurrentControlSet\Services\CertSvc\Configuration\<CA Name>\RequestFileName

Before renewing the CA you will want to make sure that the registry value points to a valid file system path. If it does not, either the renewal will fail silently, or you might get the error “The system cannot find the file specified” when you attempt to renew the CA. If you have to change this value, do the following on the active CA node:

1. Run certutil -setreg CA\RequestFileName “<File path and name>”. For example:

  certutil -setreg CA\RequestFileName "c:\contoso_subCA1.req"

2. Take the resource offline and back online (See Figure 2 above).

Renewing the Certification Authority certificate

As noted earlier, if the CertSvc service is stopped or started outside of the Failover Cluster Management snap-in the cluster system is not aware of any changes that are done to the registry. Here is a high level process of what happens when a CA is renewed so that you can understand why the below steps are necessary on a clustered CA:

1. CertSvc service is stopped to generate the certificate request file. It reads the RequestFileName registry value to determine where and what the file name should be for the request file.

2. CertSvc service is started once the request file has been generated.

3. CertSvc service is stopped again to install the issued certificate from the CA.

4. The CACertHash registry value is updated to include the new CA certificate hash.

NOTE: NEVER DELETE OR MODIFY this registry value unless directed by Microsoft support. Modifying this registry key can cause the CA not to function properly or, in some cases, to not even start!

Here are the actual steps to renew the CA on a cluster.

1. Open the Failover Cluster Management snap-in.

2. “Pause” the inactive Certification Authority node. If you are unsure about which server is the active node see Figure 1.

    a. Select the computer node in the Failover Cluster Management snap-in.

    b. Right-click on it and select “Pause”.


Figure 3 - Pausing the inactive node

3. Once the inactive node is paused you can renew the CA’s certificate. Please review the following TechNet article to help with the process of actually getting the subordinate CA certificate renewed.

4. Once the CA’s certificate has been renewed by the root CA and the new certificate has been installed on the subordinate CA, you will need to take the Certification Authority resource offline and then back online within the Failover Cluster Management snap-in.

    a. Right click on the Certification Authority resource in the tree view pane.

    b. Select either “Bring this service or application online” or “Take this service or application offline” (See figure 2 above).

5. Open the Certification Authority snap-in, and target the clustered network resource name.

6. Right click on the Certification Authority name and select Properties.

7. If you renewed with a new key pair, you should see several certificates listed as shown in figure 4.


Figure 4 - Certification Authority properties.

8. Once you have verified that the Certification Authority is using the renewed CA certificate you can “Resume” the node that was paused in step 2.

Since the Certification Authority service is configured as a generic service, the above processes must be adhered to when managing a clustered CA. If changes are made outside of the Cluster service’s knowledge, the nodes will never be in sync and clustering will fail.

- Rob “Raaaahhb” Greene


Tuning replication performance in DFSR (especially on Win2008 R2)

Hi all, Ned here again. There are a number of ways that DFSR can be tuned for better performance. This article will go through these configurations and explain the caveats. Even if you cannot deploy Windows Server 2008 R2 for the absolute best performance, you can at least remove common bottlenecks from your older environments. If you are really serious about performance in higher node count DFSR environments though, Win2008 R2’s 3rd generation DFSR is the answer.

If you’ve been following DFSR for the past few years, you already know about some improvements that were made to performance and scalability starting in Windows Server 2008:

Windows Server 2003 R2               Windows Server 2008
Multiple RPC calls                   RPC Async Pipes (when replicating with other servers running Windows Server 2008)
Synchronous inputs/outputs (I/Os)    Asynchronous I/Os
Buffered I/Os                        Unbuffered I/Os
Normal Priority I/Os                 Low Priority I/Os (this reduces the load on the system as a result of replication)
4 concurrent file downloads          16 concurrent file downloads
But there’s more you can do, especially in 2008 R2.

Registry tuning

All registry values are REG_DWORD (and in the explanations below, are always in decimal). All registry tuning for DFSR in Win2008 and Win2008 R2 is made here:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFSR\Parameters\Settings

A restart of the DFSR service is required for the settings to take effect, but a reboot is not required. The list below is not complete, but instead covers the important values for performance. Do not assume that setting a value to the max will make it faster; some settings have a practical limitation before other bottlenecks make higher values irrelevant.

Important Note: None of these registry settings apply to Windows Server 2003 R2.

AsyncIoMaxBufferSizeBytes
Default value: 2097152
Possible values: 1048576, 2097152, 4194304, 8388608
Tested high performance value: 8388608
Set on: All DFSR nodes

RpcFileBufferSize
Default value: 262144
Possible values: 262144, 524288
Tested high performance value: 524288
Set on: All DFSR nodes

StagingThreadCount
Default value: 6
(Win2008 R2 only; cannot be changed on Win2008)
Possible values: 4-16
Tested high performance value: 8
Set on: All DFSR nodes. Setting to 16 may generate too much disk IO to be useful.

TotalCreditsMaxCount
Default value: 1024
Possible values: 256-4096
Tested high performance value: 4096
Set on: All DFSR nodes that are generally inbound replicating (so hubs if doing data collection, branches if doing data distribution, all servers if using no specific replication flow)

UpdateWorkerThreadCount
Default value: 16
Possible values (Win2008): 4-32
Possible values (Win2008 R2): 4-63*
Tested high performance value: 32

Set on: All DFSR nodes that are generally inbound replicating (so hubs if doing data collection, branches if doing data distribution, all servers if using no specific replication flow). The number being raised here is only valuable when replicating in from more servers than the value. That is, if replicating in from 32 servers, set it to 32; if replicating in from 45 servers, set it to 45.

*Important note: The actual top limit is 64. We have found that under certain circumstances though, setting to 64 can cause a deadlock that prevents DFSR replication altogether. If you exceed the maximum tested value of 32, set to 63 or lower. Do not set to 64 ever. The 32 max limit is recommended because we tested it carefully, and higher values were not rigorously tested. If you set this value to 64, periodically replication will stop working, the dfsrdiag replstate command hangs and does not return results, and the dfsrdiag backlog command hangs and does not return results.
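Pulling the list together, applying the tested high-performance values might look like this from an elevated PowerShell prompt (a sketch; validate each value against the guidance above, and remember that only the service needs restarting):

$key = 'HKLM\SYSTEM\CurrentControlSet\Services\DFSR\Parameters\Settings'
reg add $key /v AsyncIoMaxBufferSizeBytes /t REG_DWORD /d 8388608 /f
reg add $key /v RpcFileBufferSize /t REG_DWORD /d 524288 /f
reg add $key /v StagingThreadCount /t REG_DWORD /d 8 /f        # Win2008 R2 only
reg add $key /v TotalCreditsMaxCount /t REG_DWORD /d 4096 /f
reg add $key /v UpdateWorkerThreadCount /t REG_DWORD /d 32 /f
Restart-Service DFSR                                           # service restart, not a reboot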


When using all the above registry tuning on Windows Server 2008 R2, testing revealed that initial sync replication time was sometimes twice as fast compared to no registry settings in place. This was using 32 servers replicating a "data collection" topology to a single hub over thirty-two non-LAN networks, with 32 RG's containing unique branch office data. The slower the network, the greater the relative improvement, on average:

Test   Spokes   Hubs   Topology   GB/node   Unique   RG   Tuned   Network   Time to sync
C1     32       1      Collect    1         Yes      32   N       1Gbps     0:57:27
C2     32       1      Collect    1         Yes      32   Y       1Gbps     0:53:09
C3     32       1      Collect    1         Yes      32   N       1.5Mbps   3:31:36
C4     32       1      Collect    1         Yes      32   Y       1.5Mbps   2:24:09
C5     32       1      Collect    1         Yes      32   N       512Kbps   10:56:42
C6     32       1      Collect    1         Yes      32   Y       512Kbps   5:57:09
C7     32       1      Collect    1         Yes      32   N       256Kbps   21:43:02
C8     32       1      Collect    1         Yes      32   Y       256Kbps   10:46:46

On Windows Server 2008 the same registry values showed considerably less performance improvement; this is partly due to additional service improvements made to DFSR in Win2008 R2, especially around the Credit Manager. Just like your phone, “3G” DFSR is going to work better than older models…

Note: do not use this table to predict replication times. It is designed to show behavior trends only!

Topology tuning

Even if you are not using Windows Server 2008 R2 there are plenty of other factors to fast replication. Some of these I’ve talked about before, some are new. All are important:

  • Minimize mixing of Win2003 and Win2008/Win2008 R2 - Windows Server 2008 introduced significant DFSR changes for RPC, inbound and outbound threading, and other aspects. However, if a Win2008 server is partnered with a Win2003 server for DFSR, most of those improvements are disabled for backwards compatibility. An ideal environment is 100% Windows Server 2008 R2, but a Win2008-only environment is still a huge improvement. Windows Server 2003 should be phased out of use as quickly as possible, as it has numerous "1G" design issues that were improved on with experience in later OS's. Windows Server 2008 R2 credit manager and update worker improvements are most efficient when all operating systems are homogeneous. If you are replacing Win2003 servers with a newer OS, do the hub servers first, as the increased number of files will provide some benefits even when talking to 2003 spokes.
  • Consider multiple hubs - If using a large number of branch servers in a hub-and-spoke topology, adding “subsidiary hub” servers will help reduce load on the main hubs.

    So for example, this configuration would cause more bottlenecking:

[Diagram: all spoke servers replicating directly to a single hub]

And this configuration would cause less bottlenecking:

[Diagram: spokes divided among subsidiary hubs that replicate to the main hub]

  • Increase staging quota - The larger the replicated folder staging quotas are for each server, the less often files must be restaged when replicating inbound changes. In a perfect world, staging quota would be configured to match the size of the data being replicated. Since this is typically impossible, it should be made as large as reasonably possible. On Win2008 and Win2008 R2 it must always be configured to be at least as large as the combined size of the N largest files, where N = UpdateWorkerThreadCount + 16. Why 16? Because that is the number of outbound files that could be replicated simultaneously.

This means that by default on Win2008/Win2008 R2, quota must be as large as the 32 largest files. If UpdateWorkerThreadCount is increased to 32, it must be as large as the 48 largest files (32+16). If it is any smaller, staging can become blocked when all 32 files are being replicated inbound and 16 outbound, preventing further replication until that queue is cleared. Frequent 4202 and 4204 staging events are indications of an inappropriately configured staging quota, especially if you are no longer in the initial sync phase of setting up DFSR for the first time.

Source : DFSR
Category : None
Event ID : 4202
Type : Warning
Description :
The DFS Replication service has detected that the staging space in use for the replicated folder at local path c:\foo is above the high watermark. The service will attempt to delete the oldest staging files. Performance may be affected.   

Source : DFSR
Category : None
Event ID : 4204
Type : Information
Description :
The DFS Replication service has successfully deleted old staging files for the replicated folder at local path c:\foo. The staging space is now below the high watermark.

If you get 4206 staging events you have really not correctly sized your staging, as you are now blocking replication behind large files.

Event Type: Warning
Event Source: DFSR
Event Category: None
Event ID: 4206
Date: 4/4/2009
Time: 3:57:21 PM
User: N/A
Computer: SRV
Description:
The DFS Replication service failed to clean up old staging files for the replicated folder at local path c:\foo. The service might fail to replicate some large files and the replicated folder might get out of sync. The service will automatically retry staging space cleanup in 1 minutes. The service may start cleanup earlier if it detects some staging files have been unlocked.

If still using Win2003 R2, staging quota would need to be as large as the 9 largest files. And if using read-only replication on Windows Server 2008 R2, as large as the 16 largest files (or the count specified in UpdateWorkerThreadCount) – after all, a read-only replicated folder has no outbound replication.

So to recap the staging quota minimum recommendations:

- Windows Server 2003 R2: 9 largest files
- Windows Server 2008: 32 largest files (default registry)
- Windows Server 2008 R2: 32 largest files (default registry)
- Windows Server 2008 R2 Read-Only: 16 largest files

If you want to find the 32 largest files in a replicated folder, here’s a sample PowerShell command:

Get-ChildItem <replicatedfolderpath> -Recurse | Sort-Object Length -Descending | Select-Object -First 32 | ft Name,Length -Wrap -Auto
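A companion sketch, if you also want to estimate a minimum staging quota from those files (48 shown here, i.e. an UpdateWorkerThreadCount of 32 plus 16 outbound):

$bytes = (Get-ChildItem <replicatedfolderpath> -Recurse |
    Sort-Object Length -Descending |
    Select-Object -First 48 |
    Measure-Object Length -Sum).Sum
'Minimum staging quota: {0:N0} MB' -f ($bytes / 1MB)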

  • Consider read-only - Deploy Windows Server 2008 R2 read-only replication when possible. If users are not supposed to change data, mark those replicated folders as read-only. A read-only server cannot originate data, and will prevent unwanted replication or change orders from occurring outbound to other servers. Unwanted changes generate load and lead to data overwrites – to fix those, you will need to replicate data back out from backups, consuming time and replication resources.
  • Latest QFE and SP - Always run the latest service pack for that OS, and the latest DFSR.EXE/DFSRS.EXE for that OS. There are also updates for NTFS and other components that DFSR relies on. Hotfixes have been released that remove performance bugs or make DFSR more reliable; a more reliable DFSR is naturally faster too. These are documented in KB968429 and KB958802 but the articles aren’t always perfectly up to date, so here’s a trick: If you want to find the latest DFSR service updates, use these three searches and look for the highest KB number in the results:

Win2008 R2: http://www.bing.com/search?q=%22windows+server+2008+r2%22+%22dfsrs.exe%22+kbqfe+site%3Asupport.microsoft.com&go=&form=QBRE

Win2008: http://www.bing.com/search?q=%22windows+server+2008%22+%22dfsrs.exe%22+kbqfe+site%3Asupport.microsoft.com&form=QBRE&qs=n

Win2003 R2: http://www.bing.com/search?q=%22windows+server+2003+r2%22+%22dfsr.exe%22+kbqfe+site%3Asupport.microsoft.com&form=QBRE&qs=n

Remember, Win2003 mainstream support ends July 13, 2010. That’s the end of non-security updates for that OS.

People ask me all the time why I take such a hard line on DFSR hotfixes. I ask in return “Why don’t you take such a hard line?” These fixes cost us a fortune, we’re not writing them for our health. And that goes for all other components too, not just DFSR. It’s an issue intrinsic to all software. DFSR is not less reliable than many other Windows components – after all, NTFS is considered an extremely reliable file system but that hasn’t stopped it from having 168 hotfixes in its lifetime; DFSR just has a passionate group of Support Engineers and developers here at MS that want you to have the best experience.

  • Turn off RDC on fast connections with mostly smaller files - Later testing (not reflected in the charts shown earlier) showed 3-4 times faster replication when using LAN-speed networks (i.e. 1Gbps or faster) on Win2008 R2. This is because it was faster to send files in their entirety than to send deltas, when the files were smaller and more dynamic and the network was incredibly fast. The performance improvements were roughly twice as fast on Win2008 non-R2. This should absolutely not be done on WAN networks under 100 Mbit though, as it will likely have a very negative effect.
  • Consider and test anti-virus exclusions – Most anti-virus software has no concept of the data types that make up DFSR’s working files and database. Additionally, those file types are not executables and are therefore very unlikely to contain a useful malicious payload. If you are seeing slow performance within DFSR, test the following anti-virus file exclusions; if DFSR performs considerably better, contact your AV vendor for an updated version of their software and an explanation around the performance gap.

<drive>:\system volume information\DFSR\

   $db_normal$
   FileIDTable_* 
   SimilarityTable_*

<drive>:\system volume information\DFSR\database_<guid>\

   $db_dirty$
   Dfsr.db
   Fsr.chk
   *.log
   Fsr*.jrs
   Tmp.edb

<drive>:\system volume information\DFSR\config\

   *.xml

<drive>:\<replicated folder>\dfsrprivate\staging\*

   *.frx

This should be validated carefully; many anti-virus products allow exclusions to be set but then do not actually abide by them. For maximum performance, you would exclude scanning any replicated files at all, but this is obviously unfeasible for most customers.

  • Pre-seed the data when setting up a new replicated folder - Pre-seeding - often referred to as "pre-staging" - data on servers can lead to huge performance gains during initial sync. This is especially useful when creating new branch office servers; if they are being built in the home office, they can be quickly pre-seeded with data and then sent out to the field for replication of the change delta. See the following article for pre-seeding recommendations.

Going back to those same tests I showed earlier with 32 spokes replicating back to a single hub, note the average performance behavior when the data was perfectly pre-seeded:

Test   Spokes   Hubs   Topology   GB/node   Unique   RG   Tuned   Staging   Net       Time to sync
C9     32       1      Collect    1         Yes      32   Y       4GB       1Gbps     0:49:21
C11    32       1      Collect    1         Yes      32   Y       4GB       512Kbps   0:46:34
C12    32       1      Collect    1         Yes      32   Y       4GB       256Kbps   0:46:08
C13    32       1      Collect    1         Yes      32   Y       4GB       64Kbps    0:48:29

Even the 64Kbps frame relay connection was nearly as fast as the LAN! This is because no files had to be sent, only file hashes.

Note: do not use this table to predict replication times. It is designed to show behavior trends only.

  • Go native Windows Server 2008 R2 – Not to beat a dead horse but the highest performance gains - including registry tuning and the greatly improved Credit Manager code - will be realized by using Windows Server 2008 R2. Win2003 R2 was first generation DFSR, Win2008 was second generation, and Win2008 R2 is third generation; if you are serious about performance you must get to 2008 R2.

Hardware tuning

  • Use a 64-bit OS with as much RAM as possible on hubs - DFSR can become bound by RAM availability on busy hub servers, especially when using the registry performance values above. There is absolutely no reason to run a 32-bit file server in this day and age, and with the coming of Windows Server 2008 R2, it’s no longer possible. For spoke servers that tend to have far less load, you can cut more corners of course; the ten-user sales team in Hicksville doesn’t need 16GB of RAM in their file server.

As a side note, customers periodically open cases to report “memory leaks” in DFSR. What we discuss is that DFSR intentionally caches as much RAM as it can get its hands on – really though, it’s the ESE (Jet) database doing this. So the more idle the other processes on a DFSR server are, the more memory the DFSR process will be able to gobble up. You can see the same behavior in LSASS’s database on DC’s.

  • Use the fastest disk subsystem you can afford on hubs - Much of DFSR will be disk bound - especially in staging and RDC operations - so high disk throughput will dramatically lower bottlenecks; this is especially true on hub servers. As always, a disk queue length greater than 2 in PerfMon is an indication of an over-used or under-powered disk subsystem. Talk to your hardware vendors about the performance and cost differences of SATA, SCSI, and FC. Don’t forget about reliability too – I have a job here for life thanks to all the customers that use the least expensive, off-brand, no warranty, low parity, practically consumer-grade iSCSI products they can find. You get what you pay for, and ultimately your users do not care about anything but their data. The OS is just a thing to make applications access files so that the business can make money. Someday the Linux desktop folks will figure this out and get some applications; then we may actually be in trouble here.

If using iSCSI, make sure you have redundant network paths to the disks, using multiple switches and NIC’s. We have had quite a few cases lately of no fault tolerance iSCSI configs that would go down for hours in the middle of DFSR updating the database and transaction logs, and the results were obviously not pretty.

  • Use reliable networks - They don't necessarily have to be fast, but they do need to stay up. Many DFSR performance issues are caused by using old network card drivers, using malfunctioning "Scalable Network" (TCP offload, RSS, etc.) settings, or using defective WANs. Network card vendors release frequent driver updates to increase performance and resolve problems; just like Windows service packs, the drivers should be installed to improve reliability and performance. Companies often deploy cost saving WAN solutions (with VPN tunnels, frame relay circuits, etc.) that in the end cost the company more in lost productivity than they ever saved in monthly expense. DFSR - like all RPC applications - is sensitive to constant network instability.
  • Review our performance tuning guides – for much more detail on squeezing performance out of your hardware, including network, storage, and the rest.

And that’s it.

- Ned “fork” Pyle


Son of SPA: AD Data Collector Sets in Win2008 and beyond

Hello, David Everett here again. This time I’m going to cover configuration and management of Active Directory Diagnostics Data Collector Sets. Data Collector Sets are the next generation of a utility called Server Performance Advisor (SPA).

Prior to Windows Server 2008, troubleshooting Active Directory performance issues often required the installation of SPA. SPA is helpful because its Active Directory data set collects performance data and generates XML-based diagnostic reports that make analyzing AD performance issues easier, identifying the IP addresses of the highest-volume callers and the types of network traffic placing the most load on the CPU. A screen shot of SPA is shown here with the Active Directory data set selected.

[Screenshot: SPA with the Active Directory data set selected]

Those who came to rely upon this tool will be happy to know its functionality has been built into Windows Server 2008 and Windows Server 2008 R2.

This performance feature is located in the Server Manager snap-in under the Diagnostics node. When the Active Directory Domain Services role is installed, the Active Directory Diagnostics data collector set is automatically created under System, as shown here. It can also be accessed by running “Perfmon” from the RUN command.

[Screenshot: the Active Directory Diagnostics data collector set under System in Server Manager]

Like SPA, the Active Directory Diagnostics data collector set runs for a default of 5 minutes. This duration cannot be modified for the built-in collector; however, the collection can be stopped manually by clicking the Stop button or from the command line. If you need a data collector set to run for a shorter or longer period and manually stopping the collection is not desirable, see How to Create a User Defined Data Collection Set below. Like SPA, the data is stored under %systemdrive%\perflogs, only now it is under the \ADDS folder; each collection run creates a new subfolder named YYYYMMDD-#### where YYYY = year, MM = month, DD = day, and #### starts at 0001.

Once the data collection completes the report is generated on the fly and is ready for review under the Reports node.

Just as SPA could be managed from the command line with spacmd.exe, data collector sets can also be managed from the command line.

How to gather Active Directory Diagnostics from the command line

  • To START a collection of data from the command line issue this command from an elevated command prompt:

logman start “system\Active Directory Diagnostics” -ets

  • To STOP the collection of data before the default 5 minutes, issue this command:

logman stop “system\Active Directory Diagnostics” -ets

NOTE: To gather data from remote systems just add “-s servername” to the commands above like this:

logman -s servername start “system\Active Directory Diagnostics” -ets

logman -s servername stop “system\Active Directory Diagnostics” -ets

This command will also work if the target is Server Core. If you cannot connect using Server Manager, you can view the report by connecting from another computer to the C$ admin share and opening the report.html file under \\servername\C$\PerfLogs\ADDS\YYYYMMDD-000#.

See LaNae’s blog post on How to Enable Remote Administration of Server Core via MMC using NETSH to open the necessary firewall ports.

In the event you need a Data Collection set run for a shorter or longer period of time, or if some other default setting is not to your liking you can create a User Defined Data Collector Set using the Active Directory Diagnostics collector set as a template.

NOTE: Increasing the duration that a data collection set runs will require more time for the data to be converted and could increase load on CPU, memory and disk.

Once your customized Data Collector Set is defined to your liking, you can export the information to an XML file and import it to any server you wish using Server Manager or logman.exe.

How to Create a User Defined Data Collection Set

 

  1. Open Server Manager on a Full version of Windows Server 2008 or later.
  2. Expand Diagnostics > Reliability and Performance > Data Collector Sets.
  3. Right-click User Defined and select New > Data Collector Set.
  4. Type in a name like Active Directory Diagnostics, leave the default selection of Create from a template (Recommended), and click Next.
  5. Select Active Directory Diagnostics from the list of templates, click Next, and follow the wizard prompts, making any changes you think are necessary.
  6. Right-click the new User Defined data collector set and view the Properties.
  7. To change the run time, modify the Overall Duration settings on the Stop Condition tab and click OK to apply the changes.

Once the settings have been configured to your liking you can run this directly from Server Manager or you can export this and deploy it to specific DCs.
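The run duration can also be adjusted from the command line once the user-defined set exists. A minimal sketch, assuming you kept the name Active Directory Diagnostics for your user-defined set and that your build of logman accepts the -rf (run-for) switch against a whole collector set – if it balks, fall back to the Overall Duration setting in step 7:

:: Set the user-defined collector set to run for 10 minutes, then kick it off
logman update "Active Directory Diagnostics" -rf 00:10:00
logman start "Active Directory Diagnostics"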

Deploying a User Defined Data Collection Set

  • In Server Manager on a Full version of Windows Server 2008 or later:
    1. Expand Diagnostics > Reliability and Performance > Data Collector Sets > User Defined.
    2. Right-click the newly created data collector set and select Save Template…
  • From the command line

1. Enumerate all User Defined data collector sets

logman query

NOTE: If running this from a remote computer, add “-s servername” to the command to target the remote server:

logman -s servername query

2. Export the desired collection set

logman export -n “Active Directory Diagnostics” -xml addiag.xml

3. Import the collection set to the target server.

logman import -n “Active Directory Diagnostics” -xml addiag.xml

NOTE: If you get the error below then there’s an SDDL string in the XML file between the <Security></Security> tags that is not correct. This can happen if you export the Active Directory Diagnostics collector set under System. To correct this, remove everything between <Security></Security> tags in the XML file.

Error:

This security ID may not be assigned as the owner of this object.

4. Verify the collector set is installed

 logman query

5. Now that the data collector set is imported you’re ready to gather data. See How to gather Active Directory Diagnostics from the command line above to do this from the command line.

Once you’ve gathered your data, you will have these interesting and useful reports to aid in your troubleshooting and server performance trending:

[Screenshots: sample Active Directory Diagnostics reports]

In short, all the goodness of SPA is now integrated into the operating system, not requiring an install or reboot. Follow the steps above, and you'll be on your way to gathering and analyzing lots of performance goo.

David “highly excitable” Everett


Friday Mail Sack: Newfie from the Grave Edition

Heya, Ned here again. Since this is another of those catch-up mail sacks, there’s plenty of interesting stuff to discuss. Today we talk NSPI, DFSR, USMT, NT 4.0 (!!!), Win2008/R2 AD upgrades, Black Hat 2010, and Irish people who live on icebergs.

Faith and Begorrah!

Question

A vendor told me that I need to follow KB2019948 to raise the number of “NSPI max sessions per user” from 50 to 10,000 for their product to work. Am I setting myself up for failure?

Answer

Starting in Windows Server 2008, global catalogs are limited to 50 concurrent NSPI connections per user from messaging applications. That is because previous experience with letting apps use unlimited connections has been unpleasant. :) So when your vendor tells you to do this, they are putting you in the position where your DCs will be allocating a huge number of memory pages to handle what amounts to a denial of service attack caused by a poorly written app that does not know how to re-use sessions correctly.

We wrote an article you can use to confirm this is your issue (BlackBerry Enterprise Server currently does this and yikes, Outlook 2007 did at some point too! There are probably others):

949469    NSPI connections to a Windows 2008-based domain controller may cause MAPI client applications to fail with an error code: "MAPI_E_LOGON_FAILED"
http://support.microsoft.com/default.aspx?scid=kb;EN-US;949469

The real answer is to fix the calling application so that it doesn’t behave this way. As a grotesque bandage, you can use the registry change on your GC’s. Make sure these DC’s are x64 OS and not memory bound before you start, as it’s likely to hurt. Try raising the value in increments before going to something astronomical like 10,000 – it may be that significantly fewer are needed per user and the vendor was pulling that number out of their butt. It’s not like they will be the ones on the phone with you all night when the DC tanks, right?
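If you do apply the bandage, the change from KB2019948 boils down to a single registry value on each GC. A sketch only – the data here (100) is a deliberately modest starting increment rather than the vendor’s 10,000, and you should confirm the restart requirements in the KB before relying on it:

reg add "HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters" /v "NSPI max sessions per user" /t REG_DWORD /d 100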

Question

I have recently started deploying Windows Server 2008 R2 as part of a large DFSR infrastructure. When I use the DFS Management (DFSMGMT.MSC) snap-in on the old Win2008 and Win2003 servers to examine my RG’s, the new RG’s don’t show up. Even when I select “Add replication groups to display” and hit the “Show replication groups” button I don’t see the new RG’s. What’s up?

Answer

We have had some changes in the DFSMGMT snap-in that intentionally lead to behaviors like these. For example:

Here’s Win2008 R2:

[Screenshot: DFS Management on Windows Server 2008 R2]

and here’s Win2003 R2:

[Screenshot: DFS Management on Windows Server 2003 R2]

See the difference? The missing RG names give a clue. :)

This is because the msDFSR-Version attribute on the RG gets set to “3.0” when creating an RG with clustered memberships or an RG containing read-only memberships. Since a Win2003 or Win2008 server cannot correctly manage those new model RG’s, their snap-in is not allowed to see it.


In both cases this is only at creation time; if you go back later and do stuff with cluster or RO, then the version may not necessarily be updated and you can end up with 2003/2008 seeing stuff they cannot manage. For that reason I recommend you avoid managing DFSR with anything but the latest DFSMGMT.MSC. The snap-ins just can’t really coexist effectively. There’s never likely to be a backport because – why bother? The only way to have the problem is to already have the solution.
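If you want to see which of your RGs have been stamped with the newer version, you can query the attribute directly. A sketch, with dc=contoso,dc=com standing in for your own domain:

ldifde -f rgversions.txt -d "cn=DFSR-GlobalSettings,cn=System,dc=contoso,dc=com" -r "(objectClass=msDFSR-ReplicationGroup)" -l msDFSR-Version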

Question

Is there a way with USMT 4.0 to take a bunch of files scattered around the computer and put them into one central destination folder during loadstate? For example, PST files?

Answer

Sure thing, USMT supports a concept called “rerouting” that relies on an XML element called “locationModify”. Here’s an example:

<migration urlid="http://www.microsoft.com/migration/1.0/migxmlext/pstconsolidate">
  <component type="Documents" context="System">
    <displayName>All .pst files to a single folder</displayName>
    <role role="Data">
      <rules>
        <include>
          <objectSet>
            <script>MigXmlHelper.GenerateDrivePatterns ("* [*.pst]", "Fixed")</script>
          </objectSet>
        </include>
      
<!-- Migrates all the .pst files in the store to the C:\PSTFiles folder during LoadState -->
        <locationModify script="MigXmlHelper.Move('C:\PSTFiles')">
          <objectSet>
            <script>MigXmlHelper.GenerateDrivePatterns ("* [*.pst]", "Fixed")</script>
          </objectSet>
        </locationModify>
      </rules>
    </role>
  </component>
</migration>

The <locationModify> element allows you to choose from the MigXmlHelpers of RelativeMove, Move, and ExactMove. Move is typically the best option as it just preserves the old source folder structure under the new parent folder to which you redirected. ExactMove is less desirable as it will flatten out the source directory structure, which means you then need to explore the <merge> element and decide how you want to handle conflicts. Those could involve various levels of precedence (where some files will be overwritten permanently) or simply renaming files with (1), (2), etc. added to the tail. Pretty gross. I don’t recommend it and your users will not appreciate it. RelativeMove allows you to take from one known spot in the scanstate and move to another new known spot in the loadstate.

Question

I’m running into some weird issues with pre-seeding DFSR using robocopy with Win2008 and Win2008 R2, even when following your instructions from an old post. It looks like my hashes are not matching as I’m seeing a lot of conflicts. I also remember you saying that there will be a new article on pre-seeding coming?

Answer

1. Make sure you install these QFEs, which fix several problems with ACLs and other elements not copying correctly in 2008/2008 R2 – all file elements are used by DFSR to calculate the SHA-1 hash, so any element being different (including security) will conflict the file:

973776  The security configuration information, such as the ACL, is not copied if a backup operator uses the Robocopy.exe utility together with the /B option to copy a file on a computer that is running Windows Vista or Windows Server 2008
http://support.microsoft.com/default.aspx?scid=kb;EN-US;973776

979808    "Robocopy /B" does not copy the security information such as ACL in Windows 7 and in Windows Server 2008 R2
http://support.microsoft.com/default.aspx?scid=kb;EN-US;979808

2. Here’s my recommended robocopy syntax, sketched below. You will want to ensure that the base folders (where you are copying from and to) have the same security and inheritance settings prior to copying, of course.

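A sketch of the general shape – the paths are placeholders, the switches are typical rather than gospel, and you should test against a small replicated folder first:

robocopy.exe "D:\RF01" "\\DFSR-HUB\D$\RF01" /e /b /copyall /r:6 /w:5 /xd DfsrPrivate /tee /log:preseed.log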

3. If you are using Windows Server 2008 R2 (or have a Win7 computer lying around), you can use the updated version of DFSRDIAG.EXE that supports the FILEHASH command. It will allow you to test and see if your pre-seeding was done correctly before continuing:

C:\>dfsrdiag.exe filehash
Command "FileHash" or "Hash" Help:
   Displays a hash value identical to that computed by the DFS Replication
   service for the specified file or folder
   Usage: DFSRDIAG FileHash </FilePath:filepath>

   </FilePath> or </Path>
     File full path name
     Example: /FilePath:d:\directory\filename.ext

It only works on a per-file basis, so it’s either for “spot checking” or you’d have to script it to crawl everything (probably overkill). So you could do your pre-seeding test, then use this to check how it went on some files:

dfsrdiag filehash /path:\\srv1\rf\somefile.txt
dfsrdiag filehash /path:\\srv2\rf\somefile.txt
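And if you do decide to crawl everything, a quick shell loop will do it – a sketch; double the percent signs (%%f) if you run this from a batch file:

for /r D:\RF01 %f in (*) do @dfsrdiag filehash /path:"%f"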

If the hashes fit, you must acquit!

Still working on the full blog post, sorry. It’s big and requires a lot of repro and validation, just needs more time – but it had that nice screenshot for you. :)

Question

  1. Can a Windows NT 4.0 member join a Windows Server 2008 R2 domain?
  2. Can Windows7/2008 R2 join an NT 4.0 domain?
  3. Can I create a two-way or outbound trust between an NT 4.0 PDC and Windows Server 2008 R2 PDCE?

Short Snarky Answer

  1. Yes, but good grief, really!?!
  2. No.
  3. Heck no.

Long Helpful Answer

  1. If you enable the AllowNt4Crypto Netlogon setting and all the other ridiculously insecure settings required for NT 4.0 below, you will be good to go (see the registry sketch after this list). At least until you get hacked due to using a 15 year old OS that has not gotten a security hotfix in half a decade.

    823659    Client, service, and program incompatibilities that may occur when you modify security settings and user rights assignments
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;823659

    942564    The Net Logon service on Windows Server 2008 and on Windows Server 2008 R2 domain controllers does not allow the use of older cryptography algorithms that are compatible with Windows NT 4.0 by default
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;942564
     
  2. Windows 7 and 2008 R2 computers cannot join NT 4.0 domains due to fundamental security changes. No, this will not change. No, there is no workaround.  

    940268    Error message when you try to join a Windows Vista, Windows Server 2008, Windows 7, or Windows Server 2008 R2-based computer to a Windows NT 4.0 domain: "Logon failure: unknown user name or bad password"
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;940268

  3. Windows Server 2008 R2 PDCEs cannot create an outbound or two-way trust to NT 4.0 due to fundamental security changes. We have a specific article in mind for this right now, but KB942564 was updated to reflect this also. No, this will not change. No, there is no workaround.  
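As threatened in answer 1, here is the AllowNt4Crypto override – just a Netlogon registry value, normally set through the “Allow cryptography algorithms compatible with Windows NT 4.0” group policy. A sketch only, assuming the value name from the policy, and you should feel bad for needing it:

reg add HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters /v AllowNT4Crypto /t REG_DWORD /d 1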

The real solution here is to stop expending all this energy to be insecure and keep ancient systems running. You obviously have newer model OS’s in the environment, just go whole hog. Upgrade, migrate or toss your NT 4.0 environments. Windows 2000 support just ended, for goodness sake, and it was 5 years younger than NT 4.0! For every one customer that tells me they need an NT 4.0 domain for some application to run (which no one ever actually checks to see if that’s true, because they secretly know it is not true), the other nineteen admit that they just haven’t bothered out of sheer inertia.

Let me try this another way – go here: http://www.microsoft.com/technet/security/bulletin/summary.mspx. This is the list of all Microsoft security bulletins in the past seven years. For five of those years, NT 4.0 has not gotten a single hotfix. Windows 2000 – remember, not supported now either – has gotten 174 security updates in the past four years alone. If you think your NT 4.0 environment is not totally compromised, it’s only because you keep it locked in an underwater vault with piranha fish and you keep the servers turned off. It’s an OS based on using NTLM’s challenge-response security, which people are still gleefully attacking with new vectors.

You need Kerberos.

Question

We use a lot of firewalls between network segments inside our environment. We have deployed DFSR and it works like a champ, replicating without issues. But when I try to gather a health report for a computer that is behind a firewall, it fails with an RPC error. My event log shows:

Error Event Source: DCOM
Event Category: None
Event ID: 10006
Date: 7/15/2010
Time: 2:51:52 PM
User: N/A
Computer: SRVBEHINDFIREWALL
Description: DCOM got error "The RPC server is unavailable."

Answer

If replication is working with the firewall but health reports are not, it sounds like DCOM/WMI traffic is being filtered out. Make sure the firewalls are not blocking or filtering the DCOM traffic specifically; a later model firewall that supports packet inspection may be deciding to block the DCOM types of traffic based on some rule. A double-sided network capture is how you will figure this out – the computer running MMC will connect remotely to DCOM over port 135, get back a response packet that (internally) states the remote port for subsequent connections, then the MMC will connect to that port for all subsequent conversations. If that port is blocked, no report.

For example, here I connect to port 135 (DCOM/EPM) and get a response packet that contains the new dynamic listening port to connect to for DCOM – that port happens to be 55158 (but will differ every time). I then connect to that remote port in order to get the health diagnostic output using the IServerHealthReport call. In a double-sided network capture you will see either the first conversation failing or, if it succeeds, the subsequent conversation failing – failing because the firewall drops the packets and they never appear on the remote host. That’s why you must capture double-sided.
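If you would rather confirm the endpoint mapper conversation without wading through a capture, PortQry can dump the endpoints a server has registered – for example, against the server from the event above:

portqry -n SRVBEHINDFIREWALL -e 135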

[Screenshot: network capture showing the endpoint mapper conversation and the follow-up dynamic port connection]

Question

I know USMT cannot migrate local printers, but can it migrate TCP-port connected printers?

Answer

No, and for the same reason: those printers are not mapped to a print server that can send you a device driver, and they are (technically) also local printers. Dirty secret time: USMT doesn’t really migrate network printers, it just migrates these two registry keys:

HKCU\Printers\Connections
HKCU\Printers\DevModes2
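You can check exactly what a given profile would carry over by peeking at the first of those keys yourself:

reg query "HKCU\Printers\Connections" /s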

So if your printer is in those keys, USMT wins – and the only kind that lives there is mapped network printers. When you first log on and access the printer on your newly restored computer, Windows will just download the driver for you and away you go. Considering that you are in the middle of this big migration, now would be a good time to get rid of these old (wrong?) ways of connecting printers. Windows 7 has plenty of options for printer deployment through group policy, group policy preferences, and you can even make the right printers appear based on the user’s location. For example, here’s what I see when I add a printer here at my desk – all I see are the printers in my little building on the nearest network. Not the ones across the street, not the ones I cannot use, not the ones I have no business seeing. Do this right and most users will only see printers within 50 feet of them. :)

[Screenshot: Add Printer dialog showing only nearby printers]

To quote from the book of Bourdain: That does not suck.

Question

What are the best documents for planning, deploying, and completing a forest upgrade from Win2000/2003 to Win2008/2008R2? [Asked at least 10 times a week – Ned]

Answer

Here:

Upgrading Active Directory Domains to Windows Server 2008 and Windows Server 2008 R2 AD DS Domains
http://technet.microsoft.com/en-us/library/cc731188(WS.10).aspx

If you are planning a domain upgrade, this should be your new homepage until the operation is complete. It is fantastic documentation with checklists, guides, known issues, recommended hotfixes, and best practices. It’s the bee’s knees, the wasp’s elbows, and the caterpillar's feets.


 

Moving on to other things not directly sack-related…

There are a couple of interesting takeaways from Black Hat US 2010 this week:

  • We announced our new Coordinated Vulnerability Disclosure process. Adobe is onboard already, hopefully more to come.
  • These folks claim they have a workable attack on Kerberos smart card logons. Except that we’ve had a way to prevent the attack for three years, starting in Vista using Strict KDC Validation – so that kinda takes the wind out of their sails. You can read more about how to make sure you are protected here and here and soon here. Pretty amazing also that this is the first time – that I’ve heard of, at least – in 11 years of MS Kerberos smart cards that anyone was talking attacks past the theoretical stage.
  • Of 102 topics, 10 are directly around Microsoft and Windows attacks. 48 are around web, java, and browser attacks. How much attention are you giving your end-to-end web security?
  • 10 topics were also around attacking iPhones and Google apps. How much attention are you giving those products in your environment? They are now as interesting to penetrate as all of Windows, according to Black Hat.
  • 5 topics on cloud computing attacks. Look for that number to double next year, and then double again the year after. Bet on it, buddy.

Finally, remember my old boss Mike O’Reilly? Yes, that guy that made the Keebler tree and who was the manager in charge of this blog and whom I worked with for 6 years. Out of the blue he sends me this email today – using his caveman Newfie mental gymnastics:

Ned,

I never ever read the Askds blog when I worked there.  I was reading it today and just realized that you are funny. 

What a guy. Have a nice weekend folks,

- Ned “I have 3 bosses, Bob” Pyle


New DNS and AD DS BPA’s released (or: the most accurate list of DNS recommendations you will ever find from Microsoft)

Hi folks, Ned here again. We’ve released another wave of Best Practices Analyzer rules for Windows Server 2008 / R2, and if you care about Directory Services you care about these:

AD DS rules update

Info: Update for the AD DS Best Practices Analyzer rules in Windows Server 2008 R2
Download: Rules Update for Active Directory Domain Services Best Practice Analyzer for Windows Server 2008 R2 x64 Editions (KB980360)

This BPA update for Active Directory Domain Services includes seven rule changes and updates, some of which are well known but a few that are not.

DNS Analyzer 2.0

Operation info: Best Practices Analyzer for Domain Name System – Ops
Configuration info: Best Practices Analyzer for Domain Name System – Config
Download: Microsoft DNS (Domain Name System) Model for Microsoft Baseline Configuration Analyzer 2.0

Remember when – a few weeks back – I wrote about recommended DNS configuration and promised more info? Well here it is, in all its glory. Despite what you might have heard, misheard, remembered, or argued about, this is the official recommended list, written by the Product Group and appended/vetted/munged by Support.

Awww yeaaaahhh… just memorize that and you’ll win any "Microsoft recommended DNS" bar bets you can imagine. That’s the cool thing about this ongoing BPA project: not only do you get a tool that will check your work in later OS versions, but the valid documentation gets centralized.

- Ned “Arren hates cowboys” Pyle


Multi-NIC File Server Dissection

Ned here. Our friend and colleague Jose Barreto from the File Server development team has posted a very interesting article around multiple NIC usage on Win2008/R2 file servers. Here's the intro:

When you set up a File Server, there are advantages to configuring multiple Network Interface Cards (NICs). However, there are many options to consider depending on how your network and services are laid out. Since networking (along with storage) is one of the most common bottlenecks in a file server deployment, this is a topic worth investigating.

Throughout this blog post, we will look into different configurations for Windows Server 2008 (and 2008 R2) where a file server uses multiple NICs. Next, we’ll describe how the behavior of the SMB client can help distribute the load for a file server with multiple NICs. We will also discuss SMB2 Durability and how it can recover from certain network failures in configurations where multiple network paths between clients and servers are available. Finally, we will look closely into the configuration of a Clustered File Server with multiple client-facing NICs.

I highly recommend giving the whole thing a read if you are interested in increasing file server throughput and reliability on the network in a recommended fashion.

http://blogs.technet.com/b/josebda/archive/2010/09/03/using-the-multiple-nics-of-your-file-server-running-windows-server-2008-and-2008-r2.aspx

- Ned "I am team Edward" Pyle


What does DCDIAG actually… do?

Hi folks, Ned here again. I recently wrote a KB article about some expected DCDIAG.EXE behaviors. This required reviewing DCDIAG.EXE as I wasn’t finding anything deep in TechNet about the “Services” test that had my interest. By the time I was done, I had found a dozen other test behaviors I had never known existed. While we have documented the version of DCDIAG that shipped with Windows Server 2008 – sometimes with excellent specificity, like Justin Hall’s article about the DNS tests – mostly it’s a black box and you only find out what it tests when the test fails. Oh, we have help of course: just run DCDIAG /? to see it. But it’s help written by developers. Meaning you get wording like this:

Advertising
Checks whether each DSA is advertising itself, and whether it is advertising itself as having the capabilities of a DSA.

So, it checks each DSA (whatever that is) to see if it’s advertising (whatever that means). The use of an undefined acronym is an especially nice touch, as even within Microsoft, DSA could mean any of several different things.

Naturally, this brings out my particular brand of OCD. What follows is the result of my compulsion to understand. I’m not documenting every last switch in DCDIAG, just the tests. I am only documenting Windows Server 2008 R2 SP1 behavior – I have no idea where the source code is for the ancient Support Tools version of DCDIAG and you aren’t paying me enough here to find it :-).  The Windows Server 2008 RTM through Windows Server 2008 R2 SP1 versions are nearly identical except for bug fixes:

KB2401600 The Dcdiag.exe VerifyReferences test fails on an RODC that is running Windows Server 2008 R2
http://support.microsoft.com/default.aspx?scid=kb;en-US;2401600

KB979294 The Dcdiag.exe tool takes a long time to run in Windows Server 2008 R2 and in Windows 7
http://support.microsoft.com/default.aspx?scid=kb;EN-US;979294

KB978387 FIX: The connectivity test that is run by the Dcdiag.exe tool fails together with error code 0x621
http://support.microsoft.com/default.aspx?scid=kb;EN-US;978387

Everything I describe below you can discover and confirm yourself with careful examination of network captures and logging, to include the public functions being used – but why walk when you can ride? Using /v can also provide considerable details on some tests. No internal source code is described nor do I show any special hidden functionality.

For info on all the network protocols I list out – or if you run into network errors when using DCDIAG – see Service overview and network port requirements for the Windows Server system. I went pretty link-happy in general in this post to help people using it as a reference; that way if you just look at your one little test it has all the info you need. I don’t always call out name resolution being tested because it is implicit; it’s also testing TCP, UDP, and IP.

Finally: this post is more of a reference than my usual lighthearted fare. Do not operate heavy machinery while reading.

Initial Required Tests

This tests general connectivity and responsiveness of a DC, to include:

  • Verifying the DC can be located in DNS.
  • Verifying the DC responds to ICMP pings.
  • Verifying the DC allows LDAP connectivity by binding to the instance.
  • Verifying the DC allows binding to the AD RPC interface using the DsBindWithCred function.

The DNS test can be satisfied out of the client cache so restarting the DNS client service locally is advisable when running DCDIAG to guarantee a full test of name resolution. For example:

Net stop "dns client" & net start "dns client" & dcdiag /test:verifyreplicas /s:DC-01

The initial tests cannot be skipped.

The initial tests use ICMP, LDAP, DNS, and RPC on the network.

Editorial note: Blocking ICMP will prevent DCDIAG from working. While blocking ICMP is highly recommended at the Internet-edge of your network, internally blocking ICMP traffic mainly just leads to administrative headaches like breaking legacy group policy, breaking black hole router detection (or leading to highly inefficient MTU sizes due to lack of a discovery option), and breaking troubleshooting tools like ping.exe or tracert.exe. It creates an illusion of security; there are a great many other easy ways for a malicious internal user to locate computers.

Advertising

This test validates that the public DsGetDcName function used by computers to locate domain controllers will correctly locate any DCs specified on the command line with the /s, /a, or /e parameter. It checks that the server successfully reports itself with DS_Flags for:

  • DC
  • LDAP server
  • Writable or Read-Only DC
  • KDC
  • Time Server
  • GC or not (and if claiming to be a GC, whether the GC is ready to respond to requests)

Note that “advertising” is not the same as “working”. For instance, if the KDC service is stopped the Advertising test will fail since the flag returned from DsGetDcName will not include KDC. But if port 88 over TCP and UDP are blocked on a firewall, the Advertising test will pass – even though the KDC is not going to be able to answer requests for Kerberos tickets.

This test is done using RPC over SMB (using a Netlogon named pipe) to the DC, plus LDAP to locate the DC’s site information.

CheckSDRefDom

This test validates that your application partition cross reference objects (located in “cn=partitions,cn=configuration,dc=<forest root domain>”) contain the correct domain names in their msDS-SDReferenceDomain attributes.

I find no history of anyone ever seeing the error message that can be displayed here.

The test uses LDAP.

CheckSecurityError

This test does a variety of checks around the security components of a DC like Kerberos. For it to be more specifically useful you should provide /replsource:<some partner DC> as the default checks are not as comprehensive.

This test:

  • Validates that at least one KDC is online for each domain and they are reachable (first in the same site, then anywhere in the domain if that fails)
  • Checks if packet fragmentation of Kerberos over UDP might be an issue based on current MTU size by sending non-fragmenting ICMP packets
  • Checks if the DC’s computer account exists in AD, if it’s within the default “Domain Controllers” OU, if it has the correct UserAccountControl flags for DCs, that the correct ServerReference attributes are set, and if the minimum Service Principal Names are set
  • Validates that the DCs computer object has replicated to other DCs
  • Validates that there are no replication or KCC connection issues for connected partners by querying the function DsReplicaGetInfo to get any security-related errors

When the /replsource is added, a few more tests happen. The partner is checked for all of the above also, then:

  • Time skew is calculated between the servers to verify it is less than 300 seconds for Kerberos. It does not check the Kerberos policy to see if allowed skew has been modified (see the w32tm sketch after this list)
  • Permissions are checked on all the naming contexts (such as Schema, Configuration, etc.) on the source DC to validate that replication and connectivity will work between DCs
  • Connectivity is checked to validate that the user running DCDIAG (and therefore in theory, all other users) can connect to and read the SYSVOL and NETLOGON shares without any security errors. It also checks IPC$, but inability to connect there would have broken many earlier tests
  • The "Access this computer from the network" privilege on the DC is checked to verify it is held by Administrators, Authenticated Users, and Everyone groups
  • The DC's computer object is checked to ensure it is the latest version on the DCs. This is done to prove replication convergence since a very stale DC might lead to security issues for users, problems with the DCs own computer account password, or secure channels to other servers. It checks versions, USNs, originating servers, and timestamps
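That skew check is easy to eyeball yourself with w32tm – a sketch, with DC-02 standing in for the replication partner:

w32tm /stripchart /computer:DC-02 /samples:5 /dataonly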

These tests are performed using LDAP, RPC, RPC over SMB, and ICMP.

Connectivity

No matter what you specify for tests, this always runs as part of Initial Required Tests.

CrossRefValidation

This test retrieves a list of naming contexts (located in “cn=partitions,cn=configuration,dc=<forest root domain>”) with their cross references and then validates them, similar to the CheckSDRefDom test above. It is looking at the nCName, dnsRoot, nETBIOSName, and systemFlags attributes to:

  • Make sure the names or DNs are not invalid or null
  • Confirm DNs are not otherwise mangled with CNF or 0ADEL (which happens during Conflict or Deletion operations)
  • Ensure the systemFlags are correct for that object
  • Call out any empty (orphaned) replica sets

The test uses LDAP.

CutoffServers

Tests the AD replication topology to ensure there are no DCs without working connection objects between partners. Any servers that cannot replicate inbound or outbound from any DCs are considered “cut off”. It uses the function DsReplicaSyncAll to do this, which means this “test” actually triggers replication on the DCs, so use it with caution if you are the owner of crud WAN links that you keep clean with schedules, and certainly consider this before using /e.

This test is rather misleading in its help description; if it cannot contact a server that is actually unavailable to LDAP on the network then it gives no error or test results, even if the /v parameter is specified. You have to notice that there is no series of “analyzing the alive system replication topology” or “performing upstream (of target) analysis” messages being printed for a cutoff server. However, the Connectivity test will fail if the server is unreachable so it’s a wash.

The test uses RPC.

DcPromo

The DCpromo test is one of the two oddballs in DCDIAG (the other is ‘DNS’). It is designed to test how well a DCPROMO would proceed if you were to run it on the server where DCDIAG is launched. It also has a number of required switches for each kind of promotion operation. All of the tests are against the server specified first in the client DNS settings. It tests:

  • If at least one network adapter has a primary DNS server set
  • If you would have a disjoint namespace based on the DNS suffix
  • That the proposed authoritative DNS zone can be contacted
  • If dynamic DNS updates are possible for the server’s A record. It checks both the setting on the authoritative DNS zone as well as the client registry configuration of DnsUpdateOnAllAdapters and DisableDynamicUpdate
  • If an LDAP DClocator record (i.e. “_ldap._tcp.dc._msdcs.<domain>”) is returned when querying for existing forests

The test uses DNS on the network.

DNS

This series of enterprise-wide DNS tests are already well documented here:

http://technet.microsoft.com/en-us/library/cc731968(WS.10).aspx

The tests use DNS, RPC, and WMI protocols.

FrsEvent

This test validates the File Replication Service’s health by reading (and printing, if using /v) FRS event log warning and error entries from the past 24 hours. It’s possible this service won’t be running or installed on Windows Server 2008 or later if SYSVOL has been migrated to DFSR. On Windows Server 2008, some events may be misleading as they may refer to custom replica sets and not necessarily SYSVOL; on Windows Server 2008 R2, however, FRS can be used for SYSVOL only.

By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.

The test uses RPC, specifically with the EventLog Remoting Protocol.

DFSREvent

This test validates the Distributed File System Replication service’s health by reading (and printing, if using /v) DFSR event log warning and error entries from the past 24 hours. It’s possible this service won’t be running or installed on Windows Server 2008 if SYSVOL is still using FRS; on Windows Server 2008 R2 the service is always present on DCs. While this ostensibly tests DFSR-enabled SYSVOL, any errors within custom DFSR replication groups would also appear here, naturally.

By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.

The test uses RPC, specifically with the EventLog Remoting Protocol.

SysVolCheck

This test reads the DCs Netlogon SysvolReady registry key to validate that SYSVOL is being advertised:

HKEY_Local_Machine\System\CurrentControlSet\Services\Netlogon\Parameters
SysvolReady=1

The value name has to exist with a value of 1 to pass the test. This test will work with either FRS- or DFSR-replicated SYSVOLs. It doesn’t check if the SYSVOL and NETLOGON shares are actually accessible, though (CheckSecurityError does that).
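Checking it yourself is a one-liner from any machine that can reach the DC – DC-01 here is a placeholder:

reg query "\\DC-01\HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters" /v SysvolReady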

The test uses RPC over SMB (through a named pipe to WinReg).

LocatorCheck

This test validates that DCLocator queries return the five “capabilities” that any DC must know of to operate correctly.

If not hosting one, the DC will refer to another DC that can satisfy the request; this means that you must carefully examine this under /v to make sure a server you thought was supposed to be holding a capability actually is correctly returned. If no DC answers or if the queries return errors then the test will fail.

The tests use RPC over SMB with the standard DsGetDcName DCLocator queries.

Intersite

This test uses Directory Replication Service (DRS) functions to check for conditions that would prevent inter-site AD replication within a specific site or all sites:

  • Locates and connects to the Intersite Topology Generators (ISTG)
  • Locates and connects to the bridgehead servers
  • Reports back any replication failures after triggering a replication
  • Validates that all DCs within sites with inbound connections to this site are available
  • Checks the KCC values for “IntersiteFailuresAllowed” and “MaxFailureTimeForIntersiteLink” overrides within the registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters

You must be careful with this test’s command-line arguments and always provide /a or /e. Not providing a site means that the test runs but skips actually testing anything (you can see this under /v).

All tests use RPC over the network to test the replication aspects and will make registry connections (RPC over SMB to WinReg) to check for those NTDS settings override entries. LDAP is also used to locate connection info.

KccEvent

This test queries the Knowledge Consistency Checker on a DC for KCC errors and warnings generated in the Directory Services event log during the last 15 minutes. This 15 minute threshold is irrespective of the Repl topology update period (secs) registry value on the DC.

By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.

The test uses RPC, specifically with the EventLog Remoting Protocol.

KnowsOfRoleHolders

This test returns the DC's knowledge of the five Flexible Single Master Operation (FSMO) roles. The test does not inherently check all DCs knowledge for consistency, but using the /e parameter would provide data sufficient to allow comparison.

The test uses RPC, calling DsListRoles within the Directory Replication Service (DRS) functions.

MachineAccount

This test checks if:

  • The DC's computer account exists in AD
  • It’s within the Domain Controllers OU
  • It has the correct UserAccountControl flags for DCs
  • The correct ServerReference attributes are set
  • The minimum Service Principal Names are set. For those paying close attention, this is identical to one test aspect of CheckSecurityError; this is because they use the same internal test

This test also mentions two repair options:

  • /RecreateMachineAccount will recreate a missing DC computer object. This is not a recommended fix as it does not recreate any child objects of a DC, such as FRS and DFSR subscriptions. The best practice is to use a valid SystemState backup to authoritatively restore the DC's deleted object and child objects. If you do use this /RecreateMachineAccount option then the DC should then be gracefully demoted and promoted to repair all the missing relationships
  • /FixMachineAccount will add the UserAccountControl flags to a DCs computer object for “TRUSTED_FOR_DELEGATION” and “SERVER_TRUST_ACCOUNT”. It’s safe to use as a DC missing those bit flags will not function and it does not remove other bit flags present. Using this repair option is preferred over trying to set these flags yourself through ADSIEDIT or other LDAP editors

This test uses LDAP and RPC over SMB.

NCSecDesc

This test checks permissions on all the naming contexts (such as Schema, Configuration, etc.) on the source DC to validate that replication and connectivity will work between DCs. It makes sure that the “Enterprise Domain Controllers” and “Administrators” groups have the correct minimum permissions. This is the same test performed within CheckSecurityError.

The test uses LDAP.

NetLogons

This test is designed to:

  • Validate that the user running DCDIAG (and therefore in theory, all other users) can connect to and read the SYSVOL and NETLOGON shares without any security errors. It also checks IPC$, but inability to connect there would have broken many earlier tests
  • Verify that the Administrators, Authenticated Users, and Everyone group have the “access this computer from the network” privilege on the DC. If not, you’d see a ton of other errors here though, naturally

Both of these tests are also performed by CheckSecurityError.

The tests use SMB and RPC over SMB (through named pipes).

ObjectsReplicated

This test verifies that replication of a few key objects and attributes has occurred and displays up-to-dateness info if replication is stale. By default the two objects validated are:

  • The ”CN=NTDS Settings” object of each DC exists up to date on all other DCs.
  • The “CN=<DC name>” object of each DC exists up to date on all other DCs.

This test is not valuable unless run with /e or /a as it just asks the DC about itself when those are not specified. Using /v will give more details on objects thought to be stale based on version.

You can also specify arbitrary objects to test with /objectdn /n, which can be useful after creating a “canary” object to validate replication.
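For example, after creating a throwaway “canary” object, something like this checks its convergence – a sketch; the server, DN, and domain are placeholders:

dcdiag /test:objectsreplicated /s:DC-01 /objectdn:"cn=canary,ou=Test,dc=contoso,dc=com" /n:contoso.com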

The tests are done using RPC with Directory Replication Service (DRS) functions.

OutboundSecureChannels

This test is designed to check external trusts. It will not run by default, and it fails even when provided a correct /testdomain parameter, even when NLTEST.EXE validates the secure channel, and even with a working external trust. It does state that the secure channel is valid but then mistakenly reports that there are no working trust objects. I’ll update this post when I find out more. This test should not be used.
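Until then, if you need to validate an external trust’s secure channel, NLTEST.EXE remains the dependable route – fabrikam.com below stands in for the trusted domain:

nltest /sc_query:fabrikam.com
nltest /sc_verify:fabrikam.com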

RegisterLocatorDnsCheck

Validates many of the same aspects as the Dcpromo test. It requires the /dnsdomain switch to specify a domain that would be the target of registration; this can be a different domain than the current primary one. It specifically verifies:

  • If at least one network adapter has a primary DNS server set.
  • If you would have a disjoint namespace based on the DNS suffix
  • That the proposed authoritative DNS zone can be contacted
  • If dynamic DNS updates are possible for the server’s A record. It checks both the setting on the authoritative DNS zone as well as the client registry configuration of DnsUpdateOnAllAdapters and DisableDynamicUpdate
  • If an LDAP DClocator record (i.e. “_ldap._tcp.dc._msdcs.<domain>”) is returned when querying for existing forests
  • That the authoritative DNS zone can be contacted

The test uses DNS on the network.

Replications

This test checks all AD replication connection objects for all naming contexts on specified DC(s) to see:

  • If the last replication attempted was successful or returned an error
  • If replication is disabled
  • If replication latency is more than 12 hours

The tests are done with LDAP and RPC using DsReplicaGetInfo.

RidManager

This test validates that the RID Master FSMO role holder:

  • Can be located and contacted through a DsBind
  • Has valid RID pool values

This role must be online and accessible for DCs to be able to create security principals (users, computers, and groups) as well as for further DCs to be promoted within a domain.

The test uses LDAP and RPC.

Services

This test validates that various AD-dependent services are running, accessible, and set to specific start types:

  • RPCSS - Start Automatically – Runs in Shared Process
  • EVENTSYSTEM - Start Automatically - Runs in Shared Process
  • DNSCACHE - Start Automatically - Runs in Shared Process
  • NTFRS - Start Automatically - Runs in Own Process (if domain functional level is less than Windows Server 2008. Does not trigger on SYSVOL being replicated by FRS)
  • ISMSERV - Start Automatically - Runs in Shared Process
  • KDC - Start Automatically - Runs in Shared Process
  • SAMSS - Start Automatically - Runs in Shared Process
  • SERVER - Start Automatically - Runs in Shared Process
  • WORKSTATION - Start Automatically - Runs in Shared Process
  • W32TIME - Start Manually or Automatically - Runs in Shared Process
  • NETLOGON - Start Automatically - Runs in Shared Process

(If target is Windows Server 2008 or later)

  • NTDS - Start Automatically - Runs in Shared Process
  • DFSR - Start Automatically - Runs in Own Process (if domain functional level is Windows Server 2008 or greater. Does not trigger on SYSVOL being replicated by DFSR)

(If using SMTP-based AD replication)

  • IISADMIN - Start Automatically - Runs in Shared Process
  • SMTPSVC - Start Automatically - Runs in Shared Process

These are the “real” service names listed in HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services. If this test is specified when targeting Windows Server 2003 DCs it is expected to fail on RpcSs. See KB2512643.

The test uses RPC and the Service Control Manager remote protocol.
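You can compare what DCDIAG sees against the Service Control Manager directly with sc.exe – DC-01 is a placeholder; qc shows the start type and whether the service shares a process:

sc \\DC-01 qc kdc
sc \\DC-01 query kdc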

SystemLog

This test validates the System Event Log’s health by reading and printing entries from the past 60 minutes (stopping at computer startup timestamp if less than 60 minutes). Errors and warnings will be printed, with no evaluation done of them being expected or not – this is left to the DCDIAG user.

By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.

The test uses RPC, specifically with the EventLog Remoting Protocol.

Topology

This test checks that a server has a fully-connected AD replication topology. This test must be explicitly run.

The test uses DsReplicaSyncAll with the flag DS_REPSYNCALL_DO_NOT_SYNC, meaning that the test analyzes and validates the replication topology without actually replicating changes. The test does not validate the availability of replication partners – having a partner offline will not cause failures in this test. It also does not test if the schedule is completely closed, preventing replication; to see those active replication results, use the Replications or CutoffServers tests.

The test uses RPC and LDAP.

VerifyEnterpriseReferences

This test verifies computer reference attributes for all DCs, including:

  • ServerReference attribute correct for a DC on cn=<DC name>,cn=<site>,cn=sites,cn=configuration,dc=<domain>
  • ServerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>
  • frsComputerReference attribute correct for a DC site object on cn=domain system volume (sysvol share),cn=ntfrs subscriptions,cn=<DC Name>,ou=domain controllers,DC=<domain>
  • frsComputerReferenceBL attribute correct for a DC object on cn=<DC Name>,cn=domain system volume (sysvol share),cn=file replication service,cn=system,dc=<domain>
  • hasMasterNCs attribute correct for a DC on cn=ntds settings,cn=<DC Name>,cn=<site>,cn=sites,cn=configuration,dc=<domain>
  • nCName attribute correct for a partition at cn=<partition name>,cn=partitions,cn=configuration,dc=<domain>
  • msDFSR-ComputerReference attribute correct for a DC DFSR replication object on cn=<DC Name>,cn=topology,cn=domain system volume,cn=dfsr-globalsettings,cn=system,dc=<domain>
  • msDFSR-ComputerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>

Note that the two DFSR tests are only performed if domain functional level is Windows Server 2008 or higher. This means there will be an expected failure if DFSR has not been migrated to SYSVOL as the test does not actually care if FRS is still in use.

The test uses LDAP. The DCs are not all individually contacted; only the specified DCs are contacted.

VerifyReferences

This test verifies computer reference attributes for a single DC, including:

  • ServerReference attribute correct for a DC on cn=<DC name>,cn=<site>,cn=sites,cn=configuration,dc=<domain>
  • ServerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>
  • frsComputerReference attribute correct for a DC site object on cn=domain system volume (sysvol share),cn=ntfrs subscriptions,cn=<DC Name>,ou=domain controllers,DC=<domain>
  • frsComputerReferenceBL attribute correct for a DC object on cn=<DC Name>,cn=domain system volume (sysvol share),cn=file replication service,cn=system,dc=<domain>
  • msDFSR-ComputerReference attribute correct for a DC DFSR replication object on cn=<DC Name>,cn=topology,cn=domain system volume,cn=dfsr-globalsettings,cn=system,dc=<domain>
  • msDFSR-ComputerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>

This is similar to the VerifyEnterpriseReferences test except that it does not check partition cross references or all other DC objects.

The test uses LDAP.

VerifyReplicas

This test verifies that the specified server does indeed host the application partitions specified by its crossref attributes in the partitions container. It operates exactly like CheckSDRefDom except that it does not show output data and validates hosting.

This test uses LDAP.

 

That’s all folks.

- Ned “that was seriously un-fun to write” Pyle
