Channel: Ask the Directory Services Team

What is the Impact of Upgrading the Domain or Forest Functional Level?


Hello all, Jonathan here again. Today, I want to address a question that we see regularly. As customers upgrade Active Directory, and they inevitably reach the point where they are ready to change the Domain or Forest Functional Level, they sometimes become fraught. Why is this necessary? What does this mean? What’s going to happen? How can this change be undone?

What Does That Button Do?

Before these questions can be properly addressed, one must first understand exactly what purposes the Domain and Forest Functional Levels serve. Each new version of Active Directory on Windows Server incorporates new features that can only be taken advantage of when all domain controllers (DCs) in the domain or forest have been upgraded to the same version. For example, Windows Server 2008 R2 introduces the AD Recycle Bin, a feature that allows the Administrator to restore deleted objects from Active Directory. To support this new feature, changes were made in the way that delete operations are performed in Active Directory, changes that are only understood and adhered to by DCs running Windows Server 2008 R2. In mixed domains, containing both Windows Server 2008 R2 DCs and DCs running earlier versions of Windows, the AD Recycle Bin experience would be inconsistent: deleted objects might or might not be recoverable depending on the DC on which the delete operation occurred. To prevent this, a mechanism is needed by which certain new features remain disabled until all DCs in the domain, or forest, have been upgraded to the minimum OS level needed to support them.

After upgrading all DCs in the domain, or forest, the Administrator is able to raise the Functional Level, and this Level acts as a flag informing the DCs, and other components as well, that certain features can now be enabled. You'll find a complete list of Active Directory features that have a dependency on the Domain or Forest Functional Level here:

Appendix of Functional Level Features
http://technet.microsoft.com/en-us/library/cc771132(WS.10).aspx

There are two important restrictions of the Domain or Forest Functional Level to understand, and once they are understood, these restrictions are obvious. First, once the Functional Level has been upgraded, new DCs running downlevel versions of Windows Server cannot be added to the domain or forest. The problems that might arise when installing downlevel DCs become pronounced with new features that change the way objects are replicated (e.g. Linked Value Replication). To prevent these issues from arising, a new DC must be at the same level as, or greater than, the functional level of the domain or forest.

The second restriction, for which there is a limited exception on Windows Server 2008 R2, is that once upgraded, the Domain or Forest Functional Level cannot later be downgraded. The only purpose that having such ability would serve would be so that downlevel DCs could be added to the domain. As has already been shown, this is generally a bad idea.

Starting in Windows Server 2008 R2, however, you do have a limited ability to lower the Domain or Forest Functional Levels. The Windows Server 2008 R2 Domain or Forest Functional level can be lowered to Windows Server 2008, and no lower, if and only if none of the Active Directory features that require a Windows Server 2008 R2 Functional Level has been activated. You can find details on this behavior - and how to revert the Domain or Forest Functional Level - here.
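The rules above can be condensed into a toy model. This is only an illustration with invented function names (the real values live in the msDS-Behavior-Version attribute on the relevant naming context), but it captures the three rules: raising requires every DC to meet the minimum, a raised level blocks downlevel DCs, and the 2008 R2 rollback exception only applies while no R2-only feature is active.

```python
# Toy model of the functional level rules (invented names; levels use the
# documented msDS-Behavior-Version values: 2003=2, 2008=3, 2008 R2=4).
W2003, W2008, W2008R2 = 2, 3, 4

def can_raise_level(dc_os_levels, target_level):
    # Every DC must already run at least the target OS version.
    return all(os >= target_level for os in dc_os_levels)

def can_add_dc(new_dc_os_level, functional_level):
    # Restriction 1: no downlevel DCs once the level is raised.
    return new_dc_os_level >= functional_level

def can_lower_to_2008(functional_level, r2_features_activated):
    # The limited 2008 R2 exception: only if no R2-only feature is enabled.
    return functional_level == W2008R2 and not r2_features_activated

print(can_raise_level([W2008R2, W2008], W2008R2))  # False: one DC is still 2008
print(can_add_dc(W2008, W2008R2))                  # False: downlevel DC blocked
print(can_lower_to_2008(W2008R2, False))           # True: rollback still allowed
```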

What Happens Next?

Another common question: what impact does changing the Domain or Forest Functional Level have on enterprise applications like Exchange or Lync, or on third party applications? First, new features that rely on the Functional Level are generally limited to Active Directory itself. For example, objects may replicate in a new and different way, aiding in the efficiency of replication or increasing the capabilities of the DCs. There are exceptions that have nothing to do with Active Directory, such as allowing NTFRS replacement by DFSR to replicate SYSVOL, but there is a dependency on the version of the operating system. Regardless, changing the Domain or Forest Functional Level should have no impact on an application that depends on Active Directory.

Let's fall back on a metaphor. Imagine that Active Directory is just a big room. You don't actually know what is in the room, but you do know that if you pass something into the room through a slot in the locked door, you will get back something that you can use. When you change the Domain or Forest Functional Level, what you can pass in through that slot does not change, and what is returned to you will continue to be what you expect. Perhaps some new slots are added to the door, through which you pass in different things and get back different things, but that is the extent of the change. How Active Directory actually processes the stuff you pass in to produce the stuff you get back -- what happens behind that locked door -- really isn't relevant to you.

Carrying this metaphor into the real world: if an application like Exchange uses Active Directory to store its objects, or to perform various operations, none of that functionality should be affected when the Domain or Forest Functional Level changes. In fact, if your applications are also written to take advantage of new features introduced in Active Directory, you may find that the capabilities of your applications increase when the Level changes.

The answer to the question about the impact of changing the Domain or Forest Functional Level is that there should be no impact. If you still have concerns about any third party applications, contact the vendor to find out whether they tested the product at the proposed Level, and if so, with what result. The general expectation, however, should be that nothing will change. Besides, you do test your applications against proposed changes to your production AD, do you not? Discuss any issues with the vendor before engaging Microsoft Support.

Where’s the Undo Button?

Even after all this, however, there is often great concern that the change is irreversible, and that you must have a rollback plan just in case something unforeseen and catastrophic happens to Active Directory. This is another common question, and there is a supported mechanism to restore the Domain or Forest Functional Level: take a System State backup of one DC in each domain in the forest. To recover, flatten all the DCs in the forest, restore one for each domain from the backup, and then DCPROMO the rest back into their respective domains. This is a Forest Restore, and the steps are outlined in detail in the following guide:

Planning for Active Directory Forest Recovery
http://technet.microsoft.com/en-us/library/planning-active-directory-forest-recovery(WS.10).aspx

By the way, do you know how often we’ve had to help a customer perform a complete forest restore because something catastrophic happened when they raised the Domain or Forest Functional Level? Never.

Best Practices

What can be done prior to making this change to ensure that you have as few issues as possible? Actually, there are some best practices here that you can follow:

1. Verify that all DCs in the domain are, at a minimum, at the OS version to which you will raise the functional level. Yes… I know this sounds obvious, but you’d be surprised. What about that DC that you decommissioned but for which you failed to perform metadata cleanup? Yes, this does happen.
Another good one that is not so obvious is the Lost and Found container in the Configuration container. Is there an NTDS Settings object in there for some downlevel DC? If so, that will block raising the Domain Functional Level, so you’d better clean that up.

2. Verify that Active Directory is replicating properly to all DCs. The Domain and Forest Functional Levels are essentially just attributes in Active Directory. The Domain Functional Level for all domains must be properly replicated before you’ll be able to raise the Forest Functional level. This practice also addresses the question of how long one should wait to raise the Forest Functional Level after you’ve raised the Domain Functional Level for all the domains in the forest. Well…what is your end-to-end replication latency? How long does it take a change to replicate to all the DCs in the forest? Well, there’s your answer.

Best practices are covered in the following article:

322692 How to raise Active Directory domain and forest functional levels
http://support.microsoft.com/default.aspx?scid=kb;EN-US;322692

There, you’ll find some tools you can use to properly inventory your DCs, and validate your end-to-end replication.

Update: Woo, we found an app that breaks! It has a hotfix though (thanks Paolo!). Make sure you install this everywhere if you are using .NET 3.5 applications that implement the DomainMode enumeration function.

FIX: "The requested mode is invalid" error message when you run a managed application that uses the .NET Framework 3.5 SP1 or an earlier version to access a Windows Server 2008 R2 domain or forest  
http://support.microsoft.com/kb/2260240

Conclusion

To summarize, the Domain or Forest Functional Levels are flags that tell Active Directory and other Windows components that all DCs in the domain or forest are at a certain minimum level. When that occurs, new features that require a minimum OS on all DCs are enabled and can be leveraged by the Administrator. Older functionality is still supported, so any applications or services that used those functions will continue to work as before -- queries will be answered, domain or forest trusts will still be valid, and all should remain right with the world. This projection is supported by over eleven years of customer issues, in not one of which was changing the Domain or Forest Functional Level the root cause of the problem. In fact, there are only cases of a Domain or Forest Functional Level increase failing because the prerequisites had not been met; overwhelmingly, these cases end with the customer's Active Directory being successfully upgraded.

If you want to read more about Domain or Forest Functional Levels, review the following documentation:

What Are Active Directory Functional Levels?
http://technet.microsoft.com/en-us/library/cc787290(WS.10).aspx

Functional Levels Background Information
http://technet.microsoft.com/en-us/library/cc738038(WS.10).aspx

Jonathan “Con-Function Junction” Stephens


Last Week Before Vista and Win2008 SP1 Support Ends


Last chance folks. Windows Vista and Windows Server 2008 Service Pack 1 support ends on July 12. As in, one week from now. This means computers running SP1 won’t get security updates after the next Patch Tuesday.

[Image: a ticking bomb. The bomb represents malware. Or your next review…]

For those not running WSUS or SCCM, grab SP2 here:

Not sure which computers are still running SP1? Check out these inventory techniques you can use to find SP1 computers in the domain using AD PowerShell; all you need is one Win7 computer or Win2008 R2 DC. Only have older DCs? That’s ok, use the AD Management Gateway. Think PowerShell is t3h sux 4 n00b$? That’s ok, Amish IT, use CSVDE.EXE to get a list back from your DCs that you can examine in Excel. For example, all the Vista non-SP2 computers:

csvde -f c:\sp.csv -p subtree -d "dc=contoso,dc=com" -r "(&(operatingsystem=windows vista*)(!operatingsystemservicepack=Service Pack 2))" -l operatingsystem,operatingsystemservicepack -u
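If you want to sanity-check what that LDAP filter matches, here is the same logic mirrored in plain Python over invented sample records. This only illustrates the filter semantics, not how csvde works internally:

```python
# Hypothetical sample computer records; names and values are made up.
computers = [
    {"name": "WKS1", "operatingSystem": "Windows Vista Business",
     "operatingSystemServicePack": "Service Pack 1"},
    {"name": "WKS2", "operatingSystem": "Windows Vista Enterprise",
     "operatingSystemServicePack": "Service Pack 2"},
    {"name": "SRV1", "operatingSystem": "Windows Server 2008 Standard",
     "operatingSystemServicePack": "Service Pack 1"},
]

def needs_sp2(c):
    # Mirrors (&(operatingsystem=windows vista*)(!operatingsystemservicepack=Service Pack 2)):
    # any flavor of Vista whose service pack is anything but SP2.
    return (c["operatingSystem"].lower().startswith("windows vista")
            and c.get("operatingSystemServicePack") != "Service Pack 2")

print([c["name"] for c in computers if needs_sp2(c)])  # ['WKS1']
```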

No complaints that this finds stale computers and doesn’t tell you IP addresses – AD PowerShell does all that and bakes delicious pies.

Windows Server 2008 shipped with SP1 built-in as it were, so there is no way for it to have no service pack at all. Whatever you do, it’s closing time for SP1. You don’t have to go home but you can’t stay here.

Ned “I recently upgraded to Windows 2000 – it’s pretty slick!” Pyle

Windows Server 2008 SP1 and Windows Vista SP1 now UNSUPPORTED

Cluster and Stale Computer Accounts


Hi, Mike here again. Today, I want to write about a common administrative task that can lead to disaster: removing stale computer accounts from Active Directory.

Removing stale computer accounts is simply good hygiene-- it’s the brushing and flossing of Active Directory. Like tartar, computer accounts have the tendency to build up until they become a problem (difficult to identify and remove, and can lead to lengthy backup times).

Oops… my bad

Many environments separate administrative roles. The Active Directory administrator is not the Cluster Administrator. Each role holder performs their duties in a somewhat isolated manner-- the Cluster admins do their thing and the AD admins do theirs. The AD admin cares about removing stale computer accounts. The cluster admin does not… until the AD admin accidentally deletes a computer account associated with a functioning Failover Cluster because it looks like a stale account.

Unexpected deletion of a Cluster Name Object (CNO) or Virtual Computer Object (VCO) is one of the top issues worked by our engineers that support Clustering and High-Availability. Everyone does their job and boom-- clustered servers stop working because CNOs or VCOs are missing. What to do?

What's wrong here

I'll paraphrase an article posted on the Clustering and High-Availability TechNet blog that addresses this scenario. Typically, domain admins key on two different attributes to determine if a computer account is stale: pwdLastSet and lastLogonTimeStamp. Domains that are not configured to the Windows Server 2003 Domain Functional Level use the pwdLastSet attribute. However, domains configured to the Windows Server 2003 Domain Functional Level or later should use the lastLogonTimeStamp attribute. What you may not know is that a Failover Cluster (CNO and VCO) does not update the lastLogonTimeStamp the same way as a real computer.

Cluster updates the lastLogonTimeStamp when it brings a clustered network name resource online. Once online, it caches the authentication token. Therefore, a clustered network name resource working in production for months will never update the lastLogonTimeStamp. This appears as a stale computer account to the AD administrator. Being a good citizen, the AD administrator deletes the stale computer account that has not logged on in months. Oops.

The Solution

There are a few things that you can do to avoid this situation.

  • Use the servicePrincipalName attribute in addition to the lastLogonTimeStamp attribute when determining stale computer accounts. If any variation of MSClusterVirtualServer appears in this attribute, then leave the computer account alone and consult with the cluster administrator.
  • Encourage the Cluster administrator to use -CleanupAD to delete the computer accounts they are not using after they destroy a cluster.
  • If you are using Windows Server 2008 R2, then consider implementing the Active Directory Recycle Bin. The concept is identical to the recycle bin for the file system, but for AD objects. The following ASKDS blogs can help you evaluate if AD Recycle Bin is a good option for your environment.
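The first bullet can be sketched as a script fragment. The record shape and the 90-day threshold below are my own assumptions for illustration, not a Microsoft-prescribed policy; the key point is that the servicePrincipalName check runs before any staleness math:

```python
from datetime import datetime, timedelta

# Example staleness threshold (an assumption, not a prescribed value).
STALE_AFTER = timedelta(days=90)

def is_safe_to_delete(account, now):
    # Anything carrying an MSClusterVirtualServer SPN is a cluster CNO/VCO:
    # leave it alone and consult the cluster administrator.
    spns = account.get("servicePrincipalName", [])
    if any("MSClusterVirtualServer" in spn for spn in spns):
        return False
    # Otherwise fall back to the usual lastLogonTimeStamp age check.
    return now - account["lastLogonTimeStamp"] > STALE_AFTER

now = datetime(2011, 1, 1)
old = now - timedelta(days=200)
print(is_safe_to_delete({"lastLogonTimeStamp": old,
                         "servicePrincipalName": []}, now))                      # True
print(is_safe_to_delete({"lastLogonTimeStamp": old,
                         "servicePrincipalName": ["MSClusterVirtualServer/CLUS1"]}, now))  # False
```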

Mike "Four out of Five AD admins recommend ASKDS" Stephens

Friday Mail Sack: They Pull Me Back in Edition


Hiya world, Ned is back with your best questions and comments. I’ve been off to teach this fall’s MCM, done Win8 stuff, and generally been slacking keeping busy; sorry for the delay in posting. That means a hefty backlog - get ready to slurp.


I know it was you, Fredo.

Question

If I run netdom query dc only writable DCs are returned. If I instead run nltest /dclist:contoso.com, both writable and RODCs are returned. Is it by design that netdom can't find RODC?

Answer

It’s by design, but not by any specific intention. Netdom was written for NT 4.0 and uses a very old function when you invoke QUERY DC; if a domain controller is not of type SV_TYPE_DOMAIN_CTRL or SV_TYPE_DOMAIN_BAKCTRL, it is not shown in the list. Effectively, it queries for all the DCs just like Nltest, but it doesn’t know what RODCs are, so it won’t show them to you.

Nltest is old too, but its owners have updated it more consistently. When it returns all the DCs (using what amounts to the same lookup functions), it knows modern information. For instance, when it became a Win2008 tool, its owners updated it to use the DS_DOMAIN_CONTROLLER_INFO_3 structure, which is why it can tell you the FQDN, which servers are RODCs, who the PDCE is, and what sites map to each server.
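The filtering itself is just a bitmask test. Here is a toy model of it; the SV_TYPE flag values are the real ones from lmserver.h, but the zero-flags "RODC" record is an invention to show why an entry without the classic DC bits drops out of Netdom's list:

```python
# Flag values from lmserver.h (real); everything else here is a toy model,
# not the actual NetServerEnum call Netdom makes.
SV_TYPE_DOMAIN_CTRL    = 0x00000008
SV_TYPE_DOMAIN_BAKCTRL = 0x00000010

def netdom_lists(server_type):
    # Netdom's old code path only keeps servers advertising one of the two
    # classic (NT 4.0-era) DC bits.
    return bool(server_type & (SV_TYPE_DOMAIN_CTRL | SV_TYPE_DOMAIN_BAKCTRL))

writable_dc = SV_TYPE_DOMAIN_CTRL
rodc = 0x0  # assumed: a type value without the classic bits set
print(netdom_lists(writable_dc))  # True
print(netdom_lists(rodc))         # False: silently dropped from the output
```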


When all this new RODC stuff came about, the developers either forgot about Netdom or more likely, didn’t feel it necessary to update both with redundant capabilities – so they updated Nltest only. Remember that these were formerly out-of-band support tools that were not owned by the Windows team until Vista/2008 – in many cases, the original developers had been gone for more than a decade.

Now that we’ve decided to make PowerShell the first class citizen, I wouldn’t expect any further improvements in these legacy utilities.

Question

We’re trying to use DSRevoke on Win2008 R2 to enumerate access control entries. We are finding it spits out: “Error occurred in finding ACEs.” This seems to have gone belly up in Server 2008. Is this tool in fact deprecated, and if so do you know of a replacement?

Answer

According to the download page, it only works on Win2003 (Win2000 being its original platform, and being dead). It’s not an officially supported tool in any case – just made by some random internal folks. You might say it was deprecated the day it released. :)

I also find that it fails as you said on Win2008 R2, so you are not going crazy. As for why it’s failing on 2008 and 2008 R2, I have not the foggiest idea, and I cannot find any info on who created this tool or if it even still has source code (it is not in the Windows source tree, I checked). I thought at first it might be an artifact of User Account Control, but even on a Win2008 R2 Core server, it is still a spaz.

I don’t know of any purpose-built replacements, although if I want to enumerate access on OUs (or anything), I’d use AD PowerShell and Get-ACL. For example, a human-readable output:

import-module activedirectory

cd ad:

get-acl (get-adobject "<some DN>") | format-list


Or to get all the OUs:

get-acl (get-adorganizationalunit -filter *) | fl


Or fancy spreadsheets using select-object and export-csv (note – massaged in Excel, it won’t come out this purty):


Or whatever. The world is your oyster at that point.

You can also use Dsacls.exe, but it’s not as easy to control the output. And there are the fancy/free Quest AD PowerShell tools, but I can’t speak to them (Get-QADPermission is the cmdlet for this).

Question

We are thinking about removing evil WINS name resolution from our environment. We hear that this has been done successfully in several organizations. Is there anything we need to watch out for in regards to Active Directory infrastructure? Are there any gotchas you've seen with environments in general? Also, it seems that the days of WINS may be numbered. Can you offer any insight into this?

Answer

Nothing “current” in Windows has any reliance on WINS resolution – even the classic components like DFS Namespaces have long ago offered DNS alternatives - but legacy products may still need it. I’m not aware of any list of Microsoft products with all dependencies, but we know Exchange 2003 and 2007 require it, for instance (and 2010 does not). Anything here that requires port 137 Netbios name resolution may fail if it doesn’t also use DNS. Active Directory technologies do not need it; they are all from the DNS era.

A primary limitation of WINS and NetBT is that they do not support IPv6, so anything written for Server 2008 and up wouldn’t have been tested without DNS-only resolution. If you have legacy applications with WINS dependency for specific static records, and they are running at least Server 2008 for DNS, you can replace the single-label resolution functionality provided by WINS with the DNS GlobalNames zone. See http://technet.microsoft.com/en-us/library/cc731744.aspx. Do not disable the TCP/IP NetBIOS Helper service on any computers, even if you get rid of WINS. All heck will break loose.

Rest assured that WINS is still included in the Windows 8 Server Developer Preview, and Microsoft itself still runs many WINS servers; odds are good that you have at least 12 more years of WINS in your future. Yay!

I expect to hear horror stories in the Comments…

Question

What is the expected behavior with respect to any files created in DFSR-replicated folders if they're made prior to initial sync completion? I.e. data in the replicated folder is added or modified on the non-authoritative server during the initial sync?

Answer

  1. If it’s a brand new file created by the user on the downstream, or if the file has already “replicated” from the upstream (meaning that its hash and File ID are now recorded by the downstream server, not that the file actually replicates) and is later changed by the user before initial replication is fully complete, nothing “bad” happens. Once initial sync completes, their original changes and edits will replicate back outbound without issues.
  2. If the user has bad timing and starts modifying existing pre-seeded files that have not yet had their file ID and hashes replicated (which would probably take a really big dataset combined with a really poor network), their files will get conflicted and changes wiped out, in favor of the upstream server.
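Those two cases boil down to one question: did the downstream record the file's ID and hash before the user touched it? A minimal sketch, using invented structures rather than DFSR's actual database records:

```python
# Toy model of the two initial-sync cases (invented structures).
def local_edit_survives(file_id, ids_known_downstream, is_new_local_file):
    if is_new_local_file:
        return True  # case 1: a brand new local file replicates outbound later
    # Pre-seeded file: the edit survives only if its ID/hash was already
    # recorded from the upstream; otherwise the file gets conflicted and the
    # upstream version wins (case 2).
    return file_id in ids_known_downstream

print(local_edit_survives("f1", {"f1"}, False))  # True: edit kept
print(local_edit_survives("f2", {"f1"}, False))  # False: upstream wins
print(local_edit_survives("f3", set(), True))    # True: new local file is safe
```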

Question

During initial DFSR replication of a lot of data, I often see debug log messages like:

20111028 17:06:30.308 9092 CRED   105 CreditManager::GetCredits [CREDIT] No update credits available. Suspending Task:00000000010D3850 listSize:1 this:00000000010D3898

 

20111028 17:06:30.308 9092 IINC   281 IInConnectionCreditManager::GetCredits [CREDIT] No connection credits available, queuing request. totalConnectionCreditsGranted:98 totalGlobalCreditsGranted:98 csId:{6A576AEE-561E-8F93-8C99-048D2348D524} csName:Goo connId:{B34747C-4142-478F-96AF-D2121E732B16} sessionTaskPtr:000000000B4D5040

And just what are DFSR “Credits?” Does this amount just control how many files can be replicated to a partner before another request has to be made?  Is it a set amount for a specific amount of time per server?

Answer

Not how many files, per se - how many updates. A credit maps to a "change" - create, modify, delete.  All the Credit Manager code does is allow an upstream server to ration out how many updates each downstream server can request in a batch. Once that pool is used up, the downstream can ask again. It ensures that one server doesn't get to replicate all the time and other servers never replicate - except in Win2003/2008, this still happened. Because we suck. In Win2008 R2, the credit manager now correctly puts you to the back of the queue if you just showed up asking for more credits, and gives other servers a chance. As an update replicates, a credit is "given back" until your list is exhausted. It has nothing to do with time, just work.

"No update credits available" is normal and expected if you are replicating a bung-load of updates. And in initial sync, you are.
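The Win2008 R2 queue behavior can be modeled in a few lines. This is an invented simulation, not the real Credit Manager code: the point is simply that a server coming back for more credits joins the back of the line, so a huge initial sync can't starve a small change set:

```python
from collections import deque

BATCH = 100  # credits rationed out per grant (an example figure)

class CreditManager:
    def __init__(self):
        self.queue = deque()

    def request(self, server, updates_left):
        self.queue.append((server, updates_left))

    def grant_next(self):
        server, updates_left = self.queue.popleft()
        granted = min(BATCH, updates_left)
        if updates_left - granted > 0:
            # Still has updates to replicate? Back of the line.
            self.request(server, updates_left - granted)
        return server, granted

cm = CreditManager()
cm.request("DC1", 250)  # a big initial sync
cm.request("DC2", 50)   # a small change set
order = [cm.grant_next()[0] for _ in range(4)]
print(order)  # ['DC1', 'DC2', 'DC1', 'DC1']
```

Note how DC2 gets its turn after DC1's first batch instead of waiting for all 250 updates, which is exactly the fairness fix described above.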

Question

The registry changes I made after reading your DFSR tuning article made a world of difference. I do have a question though: is the max number of replicating servers only 64?

Answer

Not the overall max, just the max simultaneously, i.e. 64 servers replicating a file at this exact instant in time. We have some customers with more than a thousand replicating servers (thankfully, using pretty static data).

Question

Can members of the Event Log Readers group automatically access all event logs?

Answer

Almost all. To see the security on any particular event log, you can use wevtutil gl <log name>. For example:

wevtutil gl security


Note the S-1-5-32-573 SID there on the end – that is the Event Log Readers well-known built-in SID. If you wanted to see the security on all your event logs, you could use this in a batch file (wraps):

@echo off

if exist %temp%\eventlistmsft.txt del %temp%\eventlistmsft.txt

if exist %temp%\eventlistmsft2.txt del %temp%\eventlistmsft2.txt

Wevtutil el > %temp%\eventlistmsft.txt

For /f "delims=;" %%i in (%temp%\eventlistmsft.txt) do wevtutil gl "%%i" >> %temp%\eventlistmsft2.txt

notepad %temp%\eventlistmsft2.txt

My own quick look showed that a few do not ACL with that group – Internet Explorer, Microsoft-Windows-CAPI2, Microsoft-Windows-Crypto-RNG, Group Policy, Microsoft-Windows-Firewall with advanced security. IE seems like an accident, but the others were likely just considered sensitive by their developers.
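If you wanted to script that quick look instead of eyeballing Notepad, a crude check is just "does the channel's SDDL mention the group's SID?" The SDDL strings below are shortened, hypothetical examples of the shape wevtutil gl prints for channelAccess; S-1-5-32-573 is the real well-known SID:

```python
# S-1-5-32-573 is the built-in Event Log Readers group's well-known SID.
EVENT_LOG_READERS = "S-1-5-32-573"

def readers_have_access(channel_access_sddl):
    # Crude check: is the group's SID granted anything in the channel's ACL?
    return EVENT_LOG_READERS in channel_access_sddl

# Hypothetical, shortened SDDL strings for illustration only.
security_log = "O:BAG:SYD:(A;;0x1;;;BA)(A;;0x1;;;SO)(A;;0x1;;;S-1-5-32-573)"
capi2_log    = "O:BAG:SYD:(A;;0x1;;;BA)(A;;0x1;;;SO)"
print(readers_have_access(security_log))  # True
print(readers_have_access(capi2_log))     # False
```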

Other stuff

Happy Birthday to Bill Gates and to Windows XP. You’re equally responsible for nearly every reader or writer of this blog having a job. And in my case, one not digging ditches. So thanks, you crazy kids.

The ten best Jeremy Clarkson Top Gear lines… in the world!

Halloween Part 1: Awesome jack-o-lantern templates, courtesy of ThinkGeek. Yes, they have NOTLD!

Halloween Part 2: Dogs in costume, courtesy of Bing. The AskDS favorite, of course, is:


 

Thanks to Japan, you can now send your boss the most awesome emoticon ever, when you fix an issue but couldn’t get root cause:

¯\_(ツ)_/¯

Pluto returning to planet status? It better be; that do-over was lame…

Finally – my new favorite place to get Sci-Fi and Fantasy pics is Cgsociety. Check out some of 3D and 2D samples from the Showcase Gallery:

 

[Gallery images. That last one makes a great lock screen.]

Have a great weekend, folks.

- Ned “They hit him with five shots and he's still alive!” Pyle

Getting a CMD prompt as SYSTEM in Windows Vista and Windows Server 2008

Ned here again. In the course of using Windows, it is occasionally useful to be someone besides… you. Maybe you need to be an Administrator temporarily in order to fix a problem. Or maybe you need to be a different user as only they seem to have...(read more)

Forced Demotion of a Windows Server 2008 Core Domain Controller

Ned here again. Today's post is short and sweet, but when you need this one you will need it fast and we don't have this publically documented anywhere on TechNet (yet). Since Windows 2000 SP4, it has been possible to forcibly demote Domain Controllers...(read more)

Directory Services and more, from Madrid

Ned here again. I recently spent a week with Microsoft Support Engineers from all over the world, and bumped into a colleague that works in MS Spain, out of Madrid. She mentioned that they had a Spanish-language blog focused on Directory Services, networking...(read more)

DFSR SYSVOL Migration FAQ: Useful trivia that may save your follicles

Hi, Ned here again. Today I'm going to go through some well-hidden information on DFSR SYSVOL migration; hopefully this sticks in your brain and someday allows you to enjoy your weekend rather than spending it fighting issues. As you already know...(read more)

Addendum: Making the DelegConfig website work on IIS 7

Hi All Rob here again. I thought I would take the time today and expand upon the Kerberos Delegation website blog to show how you can use the web site on IIS 7. Actually, Ned beat me up pretty badly for not showing how to set the site up on IIS 7 [ I...(read more)

Headache Prevention: Install Hotfix 953317 to Prevent DNS Records from Disappearing from Secondary DNS Zones on Windows Server 2008 SP1

Craig here. We’ve had some nasty cases related to this bug, so it seemed prudent to do our best to increase the awareness of this issue. In a nutshell, the DNS Server service in Windows Server 2008 has a bug that can result in a large number of DNS records...(read more)

Changes in Functionality from 2008 to 2008 R2 (mostly)

Ned here again. We're all snowed in down in Charlotte today, but that doesn't stop the blogging. We've published a new TechNet guide to many of the changes between Windows Server 2008 and Windows Server 2008 R2; it's definitely worth a look and has good...(read more)

DS Restore Mode Password Maintenance

Ned here again. There comes a day in nearly every administrator’s life where they will need to boot a domain controller into DS Restore Mode. Whether it’s to perform an authoritative restore or fix database issues , you will need the local...(read more)

Netmon, MPS, RODC's, and that new OS you might have heard about


Ned here. A few big pieces of news, in case you've been having a busy week:

  • Netmon 3.3 has been released. You can download from here. Read more about the new features (such as autoscroll, frame commenting, experts, WWAN support, and more) right here.
  • MPS Reports. They're back. They work on Vista and 2008, as well as XP and 2003. You don't need a support case to use them. Grab here. Hallelujah.
  • RODC's in DMZ's. Whitepaper on deploying AD Read-Only Domain Controllers into perimeter networks. Jump to it.

And the mack daddy...

  • Windows 7 and Windows Server 2008 R2 Release Candidate. On schedule for an April 30th MSDN/TechNet release, and a May 5th public release. Wait, you don't believe me, your faithful Beta engineer? Pfffft, believe it.

- Ned 'Talking Head' Pyle

ADMT 3.1 and Windows Server 2008 R2


Hello All,

UPDATE June 19 2010 - stop reading and go here:

http://blogs.technet.com/b/askds/archive/2010/06/19/admt-3-2-released.aspx

=====

There’s a known issue with installing the Active Directory Migration Tool (ADMT) v3.1 onto a Windows Server 2008 R2 computer that I want to bring to everyone’s attention. It has been acknowledged that version 3.1 (which does require Windows Server 2008) returns the following error when you attempt to install it onto R2:

"ADMT must be installed on Windows Server 2008"

This issue also occurs with Windows 2008 machines that previously had ADMT installed and were then upgraded to Windows 2008 R2. ADMT will no longer function correctly and returns the same error as detailed above. Microsoft is aware of the issue and diligently working on a resolution. Please stay tuned for further details and updates.

I’d also like to take this opportunity to ask that you send me any future feature suggestions and requests for the tool, as I’ve been asked to present results of the “voice of the customer”. The ADMT development group would like to hear from our customers on how we could make the product better. Please feel free to post comments or email your recommendations and suggestions in what you’d like to see in a later release of ADMT.

Happy migrating!

-Jason Fournerat


ADMT, RODC’s, and Error 800704f1


Hello all, Jason here again. With this blog post, I just wanted to bring an ADMT issue to the masses’ attention, as I’ve experienced it multiple times within just the last couple of months.

There’s an issue when attempting to migrate computer account objects into a Windows 2008 domain that had been prepared for a Read-Only Domain Controller with the ‘ADPrep /RODCPrep’ command. To confirm whether the command has been run, look for the following object within the ADSIEdit snap-in on the targeted 2008 domain:

CN=ActiveDirectoryRodcUpdate,CN=ForestUpdates,CN=Configuration,DC=<DomainName>,DC=com

Note: If it has been run, the value of the ‘Revision’ attribute will be set to ‘2’.

This is what is specifically witnessed within the ADMT log file:

ERR3:7075 Failed to change domain affiliation, hr=800704f1

The system detected a possible attempt to compromise security. Please ensure that you can contact the server that authenticated you.

When this error is generated, it is because the following hotfix is NOT installed on the client machine that you are migrating into the Windows 2008 domain:

944043 Description of the Windows Server 2008 read-only domain controller compatibility pack for Windows Server 2003 clients and for Windows XP clients and for Windows Vista
http://support.microsoft.com/default.aspx?scid=kb;EN-US;944043

Once you install the hotfix and reboot the client machine(s), re-running ADMT for the computer object migration will succeed.

- Jason “J4” Fournerat

Group Policy Slow Link Detection using Windows Vista and later


Mike here again. Many Group Policy features rely on a well-connected network for their success. However, not every connection is perfect or ideal; some connections are slow. The Group Policy infrastructure has always provided functionality to detect slow links. However, the means by which Group Policy detects them differ between operating systems released before Windows Server 2008 and Windows Vista and those released after.

Before Windows Server 2008 and Vista

Group Policy on Windows Server 2003, Windows XP, and Windows 2000 uses the ICMP protocol to detect a slow link between the Group Policy client and the domain controller. This process is documented in Microsoft Knowledge Base article 227260: How a slow link is detected for processing user profiles and Group Policy (http://support.microsoft.com/default.aspx?scid=kb;EN-US;227260).

The Group Policy infrastructure performs a pair of ICMP pings from the Group Policy client to the domain controller. The first ping contains a zero-byte payload while the second contains a payload of 2048 bytes. The difference between the two round-trip times is used to compute the bandwidth estimate, and voila. However, using ICMP has some limitations.
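The arithmetic behind that paired-ping estimate is simple. Here is a rough sketch (my own illustration, not the actual Group Policy code): the extra round-trip time of the 2048-byte ping, relative to the zero-byte ping, is attributed to moving the 2048 extra bytes across the wire.

```python
def estimate_bandwidth_kbps(rtt_empty_ms: float, rtt_2048_ms: float,
                            payload_bytes: int = 2048) -> float:
    """Rough paired-ping bandwidth estimate (illustrative sketch only).

    The zero-byte ping measures latency; the larger ping adds transfer
    time proportional to the payload. The difference approximates the
    time spent moving payload_bytes, so bits / milliseconds gives
    kilobits per second.
    """
    transfer_ms = rtt_2048_ms - rtt_empty_ms
    if transfer_ms <= 0:
        raise ValueError("large-payload ping must take longer than the empty one")
    bits = payload_bytes * 8
    return bits / transfer_ms  # bits per ms == kilobits per second
```

For example, if the empty ping takes 10 ms and the 2048-byte ping takes 26 ms, the estimate is 16384 bits / 16 ms = 1024 Kbps.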

Many "not-so-nice" applications use ICMP maliciously. This newfound misuse forced IT professionals to take precautions, which included blocking ICMP. Blocking ICMP provided relief from malicious ICMP packets, but it broke Group Policy. Workarounds were created (Microsoft Knowledge Base article 816045: Group Policies may not apply because of network ICMP policies), but the update did not remove the ICMP dependency.

The Windows Server 2008 and Vista era

Windows 7 and Windows Vista to the rescue! These new operating systems implement a new slow link detection mechanism that DOES NOT use ICMP-- but we already knew this. The question we will answer is how does the new Group Policy slow link detection work?

The easy answer to how the new slow link detection works is Network Location Awareness (NLA). This networking layer service and programming interface allows applications, like Group Policy, to solicit networking information from the network adapters in a computer, rather than implementing their own methods and algorithms. NLA accomplishes this by monitoring the existing traffic of a specific network interface. This provides two important benefits: 1) it does not require any additional network traffic to accomplish its bandwidth estimate-- no network overhead, and 2) it does not use ICMP.

Group Policy using NLA

The question commonly asked is how Group Policy slow link detection implements NLA. The actual algorithms used by NLA are not as important as what Group Policy does during its request to NLA for bandwidth estimation.

Locate a domain controller

A Group Policy client requires communication with a domain controller to successfully apply Group Policy. The Group Policy service must discover a domain controller. The service accomplishes this by using the DCLocator service. Windows clients typically have already discovered a domain controller prior to Group Policy application. DCLocator caches this information and makes it available to other applications and services. The Group Policy service makes three attempts to contact a domain controller, with the first attempt using the domain controller information stored in the cache. The latter two attempts force DCLocator to rediscover domain controller information. Retrieving cached domain controller information does not traverse the network, but forceful rediscovery does. Domain controller information includes the IP address of the domain controller. The Group Policy service uses the IP address of the domain controller (received from DCLocator) to begin bandwidth estimation.
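That cache-first, rediscover-on-retry pattern can be sketched as follows (the function and parameter names are my own illustration, not the actual service's API):

```python
def locate_domain_controller(get_cached, rediscover, attempts=3):
    """Return the first domain controller info found.

    Attempt 1 consults the DCLocator cache (no network traffic);
    the remaining attempts force a rediscovery (network traffic).
    Illustrative sketch only.
    """
    for attempt in range(1, attempts + 1):
        dc = get_cached() if attempt == 1 else rediscover()
        if dc is not None:
            return dc
    raise RuntimeError("no domain controller located")
```

For example, if the cache is stale and returns nothing, the second attempt's forced rediscovery supplies the DC's IP address, which bandwidth estimation then uses.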

During bandwidth estimation

The Group Policy service begins bandwidth estimation after it successfully locates a domain controller. Domain controller location includes the IP address of the domain controller. The Group Policy service performs the following actions during bandwidth estimation.

NOTE: All actions listed in this section generate network traffic from the client to the domain controller unless otherwise noted. I've included a few actions that do not generate network traffic because their results could be accomplished using methods that generate network traffic. These actions are added for clarity.

Authentication

The first action performed during bandwidth estimation is an authenticated LDAP connect and bind to the domain controller returned during the DCLocator process. This connection to the domain controller is made under the user's security context and uses Kerberos for authentication. This connection does not support NTLM. Therefore, this authentication sequence must succeed using Kerberos for Group Policy processing to continue. Once successful, the Group Policy service closes the LDAP connection.

NOTE: The user's security context is relative to the type of Group Policy processing. The security context for computer Group Policy processing is the computer. The security context for the user is the current user for the current session.

The Group Policy service makes an authenticated LDAP connection as the computer when user policy processing is configured in loopback-replace mode.

Determine network name

The Group Policy service then determines the network name. The service accomplishes this by using IPHelper APIs to determine the best network interface over which to communicate with the IP address of the domain controller. The action also uses Winsock APIs; however, this action does not create any network traffic. Additionally, the domain controller and network name are saved in the client computer's registry for future use.

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Group Policy\History is where the service stores these values. The value names are DCName and NetworkName.

NOTE: The NetworkName registry value is used by the Windows firewall to determine if it should load the domain firewall profile.

Site query

Group Policy processing must know the site to which the computer belongs. To accomplish this, the Group Policy service uses the Netlogon service. Client site discovery is an RPC call from the client computer to a domain controller. The client Netlogon service internally caches the computer's site name. The time-to-live (TTL) for the site name cache is five minutes. However, TTL expiry is on demand, meaning the client only checks the TTL during client site discovery. This check is implemented by Netlogon (not the Group Policy service). If the cached name is older than five minutes from when it was last retrieved from the domain controller, then the Netlogon service makes an RPC call to the domain controller to discover the computer's site. This explains why you may not see the RPC call during Group Policy processing. However, the opportunity for network traffic exists.
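The on-demand expiry described above can be sketched like this (a simplification of what Netlogon does; the class and method names are illustrative):

```python
import time

class SiteNameCache:
    """Caches the computer's site name with a five-minute TTL that is
    only checked when the site is actually requested (on demand).
    Illustrative sketch, not the actual Netlogon implementation."""
    TTL_SECONDS = 5 * 60

    def __init__(self, query_dc):
        self._query_dc = query_dc   # stands in for the RPC call to the DC
        self._site = None
        self._fetched_at = None

    def get_site(self, now=None):
        now = time.monotonic() if now is None else now
        # TTL is evaluated here, at lookup time -- not by a background timer.
        if self._site is None or now - self._fetched_at > self.TTL_SECONDS:
            self._site = self._query_dc()   # this is the network traffic
            self._fetched_at = now
        return self._site
```

A lookup within five minutes of the last one is answered from the cache with no network traffic; a later lookup triggers the RPC call again.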

Determine scope of management

The following Group Policy actions vary based on Group Policy processing mode. Computer Group Policy processing only uses normal Group Policy processing. However, user Group Policy processing can use normal, loopback-merge, and loopback-replace modes.

Normal mode

Normal Group Policy processing is the most common form of Group Policy processing. Conceptually it works the same regardless of user or computer. The most significant difference is the distinguished name used by the Group Policy service.

Building the OU and domain list

The Group Policy service uses the distinguished name of the computer or user to determine the list of OUs, and the domain, it must search for Group Policy objects. The Group Policy service builds this list by analyzing the distinguished name from left to right. The service scans the name looking for each instance of OU= in the name. For each one found, the service copies the remainder of the distinguished name to a list, which it uses later. The Group Policy service continues to scan the distinguished name for OUs until it encounters the first instance of DC=. At this point, the Group Policy service has found the domain name, which completes the list. This action does not generate any network traffic.

Example: Here is the list from a given distinguished name

Distinguished Name:
            cn=user,OU=marketing,OU=hq,DC=na,DC=contoso,DC=com
List:
            OU=marketing,OU=hq,DC=na,DC=contoso,DC=com
            OU=hq,DC=na,DC=contoso,DC=com
            DC=na,DC=contoso,DC=com
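The scan described above is easy to sketch in code. This illustrative Python function (my own sketch, not the actual implementation; it ignores escaped commas in RDNs) walks the DN components left to right, emitting the suffix starting at each OU= and stopping at the first DC=:

```python
def build_som_list(distinguished_name: str) -> list[str]:
    """Return the OU suffixes and the domain from a distinguished name,
    in the order the Group Policy service builds them (illustrative;
    does not handle escaped commas inside RDN values)."""
    parts = distinguished_name.split(",")
    som_list = []
    for i, part in enumerate(parts):
        rdn = part.strip()
        if rdn.upper().startswith("OU="):
            # Each OU= found contributes the rest of the DN from here on.
            som_list.append(",".join(p.strip() for p in parts[i:]))
        elif rdn.upper().startswith("DC="):
            # First DC= marks the domain; adding it completes the list.
            som_list.append(",".join(p.strip() for p in parts[i:]))
            break
    return som_list
```

Run against the example DN above, it produces exactly the three-entry list shown.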

Evaluate scope of management

The Group Policy service uses the list of OUs to determine the Group Policy objects linked to each scope of management and the options associated with each link. The service determines linked Group Policy objects by using a single LDAP query to the domain controller discovered earlier.

LDAP requests have four main components: base, scope, filter, and attributes. The base specifies the location within the directory where the search should begin, usually represented as a distinguished name. The scope determines how far the search should traverse into the directory, starting from the base. The options include base, one-level, and subtree. The base scope option limits the search to the base object itself (provided it matches the filter). The one-level option returns objects from one level below the base, not including the base. The subtree option returns objects from the base and all levels below it. The filter provides a way to control which objects the search should return (see MSDN for more information on LDAP search filter syntax). The attributes setting is a list of attributes the search should return for the objects that match the filter.

The service builds the LDAP request with the following arguments:

BaseDN:  domain
Scope: Sub Tree
Filter: (|(distinguishedname=OU=xxx)( more OUs)(ends domainNC DC=))
Attributes: gpLink, gpOptions, ntSecurityDescriptor

Example:  Scope of management LDAP search
       BaseDN: DC=na,DC=contoso,DC=com
       Scope: SubTree
       Filter: (|(distinguishedname= OU=marketing,OU=hq,DC=na,DC=contoso,DC=com)
               (distinguishedname =OU=hq,DC=na,DC=contoso,DC=com)
               (distinguishedname =DC=na,DC=contoso,DC=com))
    Attributes:gPlink,gPoptions,nTSecurityDescriptor
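Given the scope-of-management list, the filter is just an OR of one distinguishedname clause per entry. A hedged sketch of that assembly (illustrative, not the service's code):

```python
def build_som_filter(som_list: list[str]) -> str:
    """OR together one (distinguishedname=...) clause per scope of
    management, matching the shape of the example filter above
    (illustrative sketch only)."""
    clauses = "".join(f"(distinguishedname={dn})" for dn in som_list)
    return f"(|{clauses})"
```

Feeding it the OU/domain list for a user produces a single filter string covering every OU and the domain in one LDAP round trip.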

Determining the scope of normal Group Policy processing mode occurs in the security context of the applying security principal: the computer performs the LDAP query for computer processing, and the user performs the LDAP query for user processing. Merge and Replace are user-only processing modes, which occur under the security context of the user.

Replace user-processing performs an LDAP query using the computer’s distinguished name. Each component of the distinguished name is inserted into the filter portion of the LDAP query. The LDAP query filter ends with the distinguished name of the domain (which is assembled using the parts of the computer’s distinguished name).

Merge user-processing performs two LDAP queries. The first LDAP query uses the distinguished name of the user object. The second query uses the distinguished name of the computer object. The Group Policy links returned from both queries are merged into one list. The Group Policy service merges these lists by adding the Group Policy links returned from the computer query to the end of the list of Group Policy links returned from the user query. Concatenating the computer list to the end of the user list results in the Group Policy links being listed in the order they apply.

Determine the link status

The Group Policy service is now ready to determine the status of the link between the client computer and the domain controller. The service asks NLA to report the estimated bandwidth it measured while the earlier Group Policy actions occurred. The Group Policy service compares the value returned by NLA to the GroupPolicyMinTransferRate named value stored in HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon, which is the preference key, or HKEY_LOCAL_MACHINE\Software\Policies\Microsoft\Windows\System, which is the policy key. The default minimum transfer rate used to measure a Group Policy slow link is 500 (Kbps). The link between the domain controller and the client is slow if the estimated bandwidth returned by NLA is lower than the value stored in the registry. The policy value has precedence over the preference value if both values appear in the registry.

After successfully determining the link state (fast or slow, with no errors), the Group Policy service writes the slow link status into the Group Policy history, which is stored in the registry. The named value is IsSlowLink and is located at HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Group Policy\History. This is a REG_DWORD value that is interpreted as a Boolean, with a non-zero value equaling false and a zero value equaling true. If the Group Policy service encounters an error, it reads the last recorded value from the history key and uses that true or false value for the slow link status.
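Putting the comparison rules together (policy wins over preference, 500 Kbps default, slow when the estimate falls below the threshold), here is an illustrative sketch of the decision; the function name and parameters are mine, not the service's:

```python
DEFAULT_MIN_TRANSFER_RATE_KBPS = 500

def is_slow_link(estimated_kbps, policy_value=None, preference_value=None):
    """Return True if the link is slow (illustrative sketch).

    The policy registry value takes precedence over the preference
    value; absent both, the 500 Kbps default applies. The link is slow
    when the NLA bandwidth estimate is below the threshold.
    """
    if policy_value is not None:
        threshold = policy_value
    elif preference_value is not None:
        threshold = preference_value
    else:
        threshold = DEFAULT_MIN_TRANSFER_RATE_KBPS
    return estimated_kbps < threshold
```

So an estimate of 400 Kbps with no registry overrides is slow, while 600 Kbps is fast unless an administrator has raised the threshold by policy.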

Conclusion

Group Policy slow link detection has matured since the days of ICMP. Today, the Group Policy service on Windows 7 and Windows Vista uses NLA to sample TCP communication between the client and the domain controller, without sending additional network traffic.

- Mike “Huuuh, whaaaa?” Stephens

Auditing Password and Account Lockout Policy on Windows Server 2008 and R2


Ned here again. Let’s talk about auditing your domain for changes made to Password and Account Lockout policies. Frankly, it’s a real pain in the neck to figure out Password and Account Lockout auditing and there are legacy architectural decisions behind how this all works, so I’ll make sure to cover all the bases. This also includes auditing your Fine Grain Password policies (FGPP), for you bleeding-edge types.

Understanding how these policies work

We use Password and Account Lockout policies to control domain authentication. Password policies set requirements for things like password length, complexity, and maximum age. Account Lockout policies control lockout threshold and duration, and are very popular with The Devil.

There are two types of Password and Account Lockout policies in a domain:

  • Domain-wide – Introduced in Windows NT and set in Active Directory through domain security policy.
  • Fine Grained – Introduced in Windows Server 2008 and set in AD through manual means like ADSIEDIT or AD PowerShell. These policies configure settings on a per-user or group-membership basis, and there can be as many as you like.

Domain-based policy, while set through security policy, is actually written to attributes on the root of the domain. ADSIEdit shows this object using the distinguished name of the domain. This odd location results from the need to provide NT 4.0 compatibility. Since NT 4.0 could not apply Group Policy, we had to store these values somewhere and answer requests about the settings in an NT fashion.


On the other hand, Fine Grained policies write to their own location. Windows stores each policy as a leaf object.


When you edit your built-in Default Domain password policy, you are actually editing:

\\contoso.com\sysvol\contoso.com\Policies\{31B2F340-016D-11D2-945F-00C04FB984F9}\MACHINE\Microsoft\Windows NT\SecEdit\GptTmpl.inf

All your settings are in this format:

[System Access]
MinimumPasswordAge = 0
MaximumPasswordAge = 60
MinimumPasswordLength = 8
PasswordComplexity = 1
PasswordHistorySize = 4
LockoutBadCount = 50
ResetLockoutCount = 30
LockoutDuration = 30
RequireLogonToChangePassword = 0
ForceLogoffWhenHourExpire = 0
ClearTextPassword = 0
LSAAnonymousNameLookup = 0
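Since GptTmpl.inf is plain INI-style text, you can inspect those settings programmatically. Here is a minimal sketch using Python's standard configparser (illustrative only; real GptTmpl.inf files on disk are typically UTF-16 encoded, which you would pass as the encoding when reading the file):

```python
import configparser

def read_system_access(gpttmpl_text: str) -> dict:
    """Parse the [System Access] section of a GptTmpl.inf security
    template into a dict of setting name -> integer value (sketch)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str          # preserve the CamelCase key names
    parser.read_string(gpttmpl_text)
    return {k: int(v) for k, v in parser["System Access"].items()}
```

For example, feeding it the text above would report MinimumPasswordLength as 8 and LockoutBadCount as 50.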

When a DC applies this security policy during the five-minute Group Policy refresh, it stamps these settings on the domainDNS object. And voila, you have your policies in place. But think about that – the DC stamps these settings in place when applying computer policy. Who do you think will be listed as the user in your audit event logs? That’s right – the DC itself. And that’s where this blog post comes in. :-)

Auditing Domain-Wide Policy

There are three main things you need to do to see domain-wide password and account lockout setting changes, but they differ slightly by OS:

1. Put an auditing entry on the “Policies” container. Enabling auditing for EVERYONE on the “CN=Policies,CN=System,DC=<your domain>” container causes auditing to track all writes, deletes, and permission modifications. The audit event shows the user modifying group policy in general. Obviously, this is useful for more than just password policy changes – “Hey, who set this policy to push a Domo-Kun wallpaper out to all the computers?”


2. Enable subcategory auditing for:

    a. “Authentication Policy Change” (if using Windows Server 2008 R2 DC’s).

    b. “Other Account Management Events” (if using Windows Server 2008 DC’s).

3. Enable subcategory auditing for “Directory Service Changes”.

    Note: In Windows Server 2008 R2, granular subcategory auditing is available through GPMC.


In Windows Server 2008, you need to use the script provided in KB921469.

After enabling auditing, Windows then generates security audit events for anyone editing domain-wide security policy for passwords and account lockouts:

1.    An event 5136 will be written that shows the versionNumber attribute of the policy being raised:

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          10/24/2009 3:04:17 PM
Event ID:      5136
Task Category: Directory Service Changes
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      2008r2-f-01.contoso.com
Description:
A directory service object was modified.
Subject:
    Security ID:        CONTOSO\Administrator
    Account Name:        Administrator
    Account Domain:        CONTOSO

    Logon ID:        0x1e936

Directory Service:
    Name:    contoso.com
    Type:    Active Directory Domain Services
Object:
    DN:    CN={31B2F340-016D-11D2-945F-00C04FB984F9},CN=POLICIES,CN=SYSTEM,DC=CONTOSO,DC=COM
    GUID:    CN={31B2F340-016D-11D2-945F-00C04FB984F9},CN=Policies,CN=System,DC=contoso,DC=com
    Class:    groupPolicyContainer
Attribute:
    LDAP Display Name:    versionNumber
    Syntax (OID):    2.5.5.9
    Value:    121

Note: The event shows the name of the user that modified the policy – every policy edit raises the version number. Now we know that someone changed the policy and that we should go look at it.

2. Windows writes a follow-up event (event id 4739) for each type of change – lockout policy or password policy. For example:

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          10/24/2009 3:01:28 PM
Event ID:      4739
Task Category: Authentication Policy Change
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      2008r2-f-01.contoso.com
Description:
Domain Policy was changed.

Change Type:        Lockout Policy modified

Subject:
    Security ID:        SYSTEM
    Account Name:        2008R2-F-01$
    Account Domain:        CONTOSO
    Logon ID:        0x3e7

Domain:
    Domain Name:        CONTOSO
    Domain ID:        CONTOSO\

Changed Attributes:
    Min. Password Age:    -
    Max. Password Age:    -
    Force Logoff:        -
    Lockout Threshold:    500
    Lockout Observation Window:   
    Lockout Duration:   
    Password Properties:   
    Min. Password Length:   
    Password History Length:   
    Machine Account Quota:   
    Mixed Domain Mode:   
    Domain Behavior Version:   
    OEM Information:    -

====

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          10/24/2009 3:04:23 PM
Event ID:      4739
Task Category: Authentication Policy Change
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      2008r2-f-01.contoso.com
Description:
Domain Policy was changed.

Change Type:        Password Policy modified

Subject:
    Security ID:        SYSTEM
    Account Name:        2008R2-F-01$
    Account Domain:        CONTOSO
    Logon ID:        0x3e7

Domain:
    Domain Name:        CONTOSO
    Domain ID:        CONTOSO\

Changed Attributes:
    Min. Password Age:    -
    Max. Password Age:    -
    Force Logoff:        -
    Lockout Threshold:    -
    Lockout Observation Window:    -
    Lockout Duration:    -
    Password Properties:    -
    Min. Password Length:    5
    Password History Length:    -
    Machine Account Quota:    -
    Mixed Domain Mode:    -
    Domain Behavior Version:    -
    OEM Information:    -

Notice the account name is the DC itself. This event, while useful, needs to be correlated with the 5136 event to see what changed. And even then, these events can sometimes be difficult to understand – what is a “password property” after all? (it’s for complexity being turned on or off). You should probably use these events as a notification to go examine the actual policies in GPMC.

You’re probably asking yourself why I didn’t just audit the actual domain root object and skip the “Authentication Policy Change” and “Other Account Management Events” subcategories. This is another of the vagaries of security policy auditing – it doesn’t work. Simply auditing the “DC=domain,DC=com” object does not return any information about password or lockout changes. Go figure.

Auditing Fine-Grained Policy

Auditing FGPP is simpler and the data is easier to read, because FGPP does not involve intermediate security policy settings. Creating and modifying these policies directly edits the objects in Active Directory; you can create or modify FGPP using PowerShell, LDP, LDIFDE, or ADSIEDIT. This means there’s no layer doing work on your behalf between you and the directory, so your audit events are clean and self-evident.

1. Put an auditing entry on the “Password Settings Container” container. Enabling auditing for EVERYONE on the “CN=Password Settings Container,CN=System,DC=<your domain>” object causes Windows to track all users who write, delete, and modify permissions on any FGPPs.


2. Enable subcategory auditing for “Directory Service Changes” (see previous section for steps).

After enabling auditing, Windows generates a security audit event for anyone editing FGPPs for each change made. Also, the audit event includes the new value and the value prior to the change:

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          10/24/2009 4:20:54 PM
Event ID:      5136
Task Category: Directory Service Changes
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      2008r2-f-01.contoso.com
Description:
A directory service object was modified.
Subject:
    Security ID:        CONTOSO\RobGreene
    Account Name:        RobGreene
    Account Domain:        CONTOSO

    Logon ID:        0x1e936

Directory Service:
    Name:    contoso.com
    Type:    Active Directory Domain Services
Object:
    DN:    CN=VIP DomainUsersPSO,CN=Password Settings Container,CN=System,DC=contoso,DC=com
    GUID:    CN=VIP DomainUsersPSO,CN=Password Settings Container,CN=System,DC=contoso,DC=com
    Class:    msDS-PasswordSettings
Attribute:
    LDAP Display Name:    msDS-PasswordComplexityEnabled
    Syntax (OID):    2.5.5.8
    Value:    TRUE
Operation:
    Type:    Value Deleted
    Correlation ID:    {6afa8930-85cd-44d9-828b-9cc3c1b5a8b9}
    Application Correlation ID:    -

===

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          10/24/2009 4:20:54 PM
Event ID:      5136
Task Category: Directory Service Changes
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      2008r2-f-01.contoso.com
Description:
A directory service object was modified.
Subject:
    Security ID:        CONTOSO\RobGreene
    Account Name:        RobGreene
    Account Domain:        CONTOSO

    Logon ID:        0x1e936

Directory Service:
    Name:    contoso.com
    Type:    Active Directory Domain Services
Object:
    DN:    CN=VIP DomainUsersPSO,CN=Password Settings Container,CN=System,DC=contoso,DC=com
    GUID:    CN=VIP DomainUsersPSO,CN=Password Settings Container,CN=System,DC=contoso,DC=com
    Class:    msDS-PasswordSettings
Attribute:
    LDAP Display Name:    msDS-PasswordComplexityEnabled
    Syntax (OID):    2.5.5.8
    Value:    FALSE
Operation:
    Type:    Value Added
    Correlation ID:    {6afa8930-85cd-44d9-828b-9cc3c1b5a8b9}
    Application Correlation ID:    -

Here I can see the user RobGreene logged on and changed the password complexity requirements from TRUE to FALSE. I knew it! Rob Greene, always breaking my stuff…

See Edie, I told you I’d write a blog post on this. :-)

- Ned “the chiropractor” Pyle

Clustered Certification Authority maintenance tasks


Hi all Rob Greene here again. I thought I would share with you how to do some common tasks with a Windows Server 2008 clustered Certification Authority (CA). When the CA is clustered there are definitely different steps that need to be taken when you:

  • Make a change to the behavior of the CA by using certutil.exe with –setreg or –delreg switches.
  • Modify the registry values in the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\CertSvc hive.
  • Renew the CA’s certificate.

In the past, before the Certification Authority service (CertSvc) was supported in a cluster, you could make these changes and then stop and start the CertSvc service without a problem. This is still the case when the Certification Authority is not clustered.

However, when you have the Certification Authority configured as a cluster, you must avoid starting and stopping the service outside of the Failover Cluster Management snap-in (Cluadmin.msc). The reason is that the Cluster service not only keeps track of the service state for CertSvc, it is also responsible for making sure that the registry key location HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\CertSvc is saved to the quorum disk when it notices a change to that registry location. This is also noted in the CA cluster whitepaper, which is required reading for anyone clustering CAs.

Changing the behavior of the Certification Authority

If you need to make a change to the behavior of the Certification Authority with CertUtil.exe or direct registry modification, you must always follow the steps below:

1. Log on to the active clustered Certification Authority node. If you are unsure which node currently owns the resource, do the following:

    a. Launch the Failover Cluster Management MMC.

    b. Select the Certification Authority resource, and in the right hand pane you will see the “Current Owner” (See figure 1).

Figure 1 - Current Owner of the Certification Authority resource

2. Use the certutil.exe –setreg command (recommended), or modify the registry directly.

3. Launch the Failover Cluster Management snap-in.

4. Take the Certification Authority resource (service) offline and then bring it back online. We have to take the resource offline and back online because the CertSvc service will not read any registry changes without being restarted, and, as stated above, when the CA is clustered you should refrain from stopping and starting the CertSvc service directly.

    a. Right click on the Certification Authority resource in the tree view pane.

    b. Select either “Bring this service or application online” or “Take this service or application offline” (See figure 2).

Figure 2 - Taking the resource offline / online

Renewing the subordinate Certification Authority certificate

This section discusses the steps that need to be taken when renewing a subordinate CA certificate. A root certification authority shouldn’t be clustered; instead, it should be configured as an offline root.

Verify the request file location

When the CA certificate is renewed, the CertSvc service is stopped to generate the certificate request file. The request file location and name are dictated by the following registry value:

HKEY_Local_Machine\System\CurrentControlSet\Services\CertSvc\Configuration\<CA Name>\RequestFileName

Before renewing the CA, you will want to make sure that this registry value points to a valid file system path. If it does not, either the renewal will fail silently, or you might get the error “The system cannot find the file specified” when you attempt to renew the CA. If you have to change this value, do the following on the active CA node:

1. Run certutil –setreg CA\RequestFileName "<file path and name>". For example:

  certutil –setreg CA\RequestFileName "c:\contoso_subCA1.req"

2. Take the resource offline and back online (See Figure 2 above).

Renewing the Certification Authority certificate

As noted earlier, if the CertSvc service is stopped or started outside of the Failover Cluster Management snap-in, the cluster system is not aware of any changes made to the registry. Here is a high-level outline of what happens when a CA certificate is renewed, so that you can understand why the steps below are necessary on a clustered CA:

1. CertSvc service is stopped to generate the certificate request file. It reads the RequestFileName registry value to determine where and what the file name should be for the request file.

2. CertSvc service is started once the request file has been generated.

3. CertSvc service is stopped again to install the issued certificate from the CA.

4. The CACertHash registry value is updated to include the new CA certificate hash.

NOTE: NEVER DELETE OR MODIFY this registry value unless directed by Microsoft support. Modifying this registry key can cause the CA to not function properly or, in some cases, to not even start!

Here are the actual steps to renew the CA on a cluster.

1. Open the Failover Cluster Management snap-in.

2. “Pause” the inactive Certification Authority node. If you are unsure about which server is the active node see Figure 1.

    a. Select the computer node in the Failover Cluster Management snap-in.

    b. Right-click it and select “Pause”.

Figure 3 - Pausing the inactive node

3. Once the inactive node is paused you can renew the CA’s certificate. Please review the following TechNet article to help with the process of actually getting the subordinate CA certificate renewed.

4. Once the CA’s certificate has been renewed by the root CA and the new certificate has been installed on the subordinate CA, you will need to take the Certification Authority resource offline and then back online within the Failover Cluster Management snap-in.

    a. Right click on the Certification Authority resource in the tree view pane.

    b. Select either “Bring this service or application online” or “Take this service or application offline” (See figure 2 above).

5. Open the Certification Authority snap-in, and target the clustered network resource name.

6. Right-click on the Certification Authority name and select Properties.

7. If you renewed with a new key pair, you should see several certificates listed, as shown in Figure 4.

Figure 4 - Certification Authority properties.

8. Once you have verified that the Certification Authority is using the renewed CA certificate you can “Resume” the node that was paused in step 2.

Since the Certification Authority service is configured as a generic service, the above process must be followed when managing a clustered CA. If changes are made outside of the Cluster service’s knowledge, the nodes will never be in sync and clustering will fail.

- Rob “Raaaahhb” Greene

Tuning replication performance in DFSR (especially on Win2008 R2)


Hi all, Ned here again. There are a number of ways that DFSR can be tuned for better performance. This article will go through these configurations and explain the caveats. Even if you cannot deploy Windows Server 2008 R2 - the option with the absolute best performance - you can at least remove common bottlenecks from your older environments. If you are really serious about performance in higher node count DFSR environments, though, Win2008 R2’s third-generation DFSR is the answer.

If you’ve been following DFSR for the past few years, you already know about some improvements that were made to performance and scalability starting in Windows Server 2008:

Windows Server 2003 R2            | Windows Server 2008
Multiple RPC calls                | RPC Async Pipes (when replicating with other servers running Windows Server 2008)
Synchronous inputs/outputs (I/Os) | Asynchronous I/Os
Buffered I/Os                     | Unbuffered I/Os
Normal Priority I/Os              | Low Priority I/Os (reduces the load on the system as a result of replication)
4 concurrent file downloads       | 16 concurrent file downloads

But there’s more you can do, especially in 2008 R2.

Registry tuning

All registry values are REG_DWORD (and in the explanations below, are always in decimal). All registry tuning for DFSR in Win2008 and Win2008 R2 is made here:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFSR\Parameters\Settings

A restart of the DFSR service is required for the settings to take effect, but a reboot is not required. The list below is not complete, but instead covers the important values for performance. Do not assume that setting a value to the max will make it faster; some settings have a practical limitation before other bottlenecks make higher values irrelevant.

Important Note: None of these registry settings apply to Windows Server 2003 R2.

AsyncIoMaxBufferSizeBytes
Default value: 2097152
Possible values: 1048576, 2097152, 4194304, 8388608
Tested high performance value: 8388608
Set on: All DFSR nodes

RpcFileBufferSize
Default value: 262144
Possible values: 262144, 524288
Tested high performance value: 524288
Set on: All DFSR nodes

StagingThreadCount (Win2008 R2 only; cannot be changed on Win2008)
Default value: 6
Possible values: 4-16
Tested high performance value: 8
Set on: All DFSR nodes. Setting to 16 may generate too much disk I/O to be useful.

TotalCreditsMaxCount
Default value: 1024
Possible values: 256-4096
Tested high performance value: 4096
Set on: All DFSR nodes that are generally inbound replicating (so hubs if doing data collection, branches if doing data distribution, all servers if using no specific replication flow)

UpdateWorkerThreadCount
Default value: 16
Possible values (Win2008): 4-32
Possible values (Win2008 R2): 4-63*
Tested high performance value: 32

Set on: All DFSR nodes that are generally inbound replicating (so hubs if doing data collection, branches if doing data distribution, all servers if using no specific replication flow). Raising this number is only valuable when replicating in from more servers than the value. For example, if replicating in from 32 servers, set to 32; if replicating in from 45 servers, set to 45.

*Important note: The actual top limit is 64. We have found that under certain circumstances, however, setting it to 64 can cause a deadlock that prevents DFSR replication altogether: replication periodically stops working, and the dfsrdiag replstate and dfsrdiag backlog commands hang without returning results. If you exceed the maximum tested value of 32, set it to 63 or lower - never 64. The maximum of 32 is recommended because we tested it carefully; higher values were not rigorously tested.
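Taken together, the tested high-performance values above could be captured in a .reg file like the sketch below (dword values are in hex, with decimal equivalents in comments). This is an illustration assembled from the values in this post, not an official drop-in tuning file - and remember the DFSR service must be restarted afterward:

```reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFSR\Parameters\Settings]
; 8388608 - all DFSR nodes
"AsyncIoMaxBufferSizeBytes"=dword:00800000
; 524288 - all DFSR nodes
"RpcFileBufferSize"=dword:00080000
; 8 - all DFSR nodes (Win2008 R2 only)
"StagingThreadCount"=dword:00000008
; 4096 - inbound-replicating nodes
"TotalCreditsMaxCount"=dword:00001000
; 32 - inbound-replicating nodes
"UpdateWorkerThreadCount"=dword:00000020
```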

 

When using all the above registry tuning on Windows Server 2008 R2, testing revealed that initial sync replication time was sometimes twice as fast compared to having no registry settings in place. This was with 32 servers replicating a "data collection" topology to a single hub over thirty-two non-LAN networks, with 32 RGs containing unique branch office data. The slower the network, the greater the average relative improvement:

Test | Spokes | Hubs | Topology | GB/node | Unique | RG | Tuned | Network | Time to sync
C1   | 32     | 1    | Collect  | 1       | Yes    | 32 | N     | 1Gbps   | 0:57:27
C2   | 32     | 1    | Collect  | 1       | Yes    | 32 | Y     | 1Gbps   | 0:53:09
C3   | 32     | 1    | Collect  | 1       | Yes    | 32 | N     | 1.5Mbps | 3:31:36
C4   | 32     | 1    | Collect  | 1       | Yes    | 32 | Y     | 1.5Mbps | 2:24:09
C5   | 32     | 1    | Collect  | 1       | Yes    | 32 | N     | 512Kbps | 10:56:42
C6   | 32     | 1    | Collect  | 1       | Yes    | 32 | Y     | 512Kbps | 5:57:09
C7   | 32     | 1    | Collect  | 1       | Yes    | 32 | N     | 256Kbps | 21:43:02
C8   | 32     | 1    | Collect  | 1       | Yes    | 32 | Y     | 256Kbps | 10:46:46

On Windows Server 2008 the same registry values showed considerably less performance improvement; this is partly due to additional service improvements made to DFSR in Win2008 R2, especially around the Credit Manager. Just like your phone, “3G” DFSR is going to work better than older models…

Note: do not use this table to predict replication times. It is designed to show behavior trends only!
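The tuned-vs-untuned trend above can be checked with a quick calculation - convert each sync time to seconds and divide the untuned time by the tuned time. The times are taken from the table; the helper names are mine:

```python
def to_seconds(hms):
    """Convert an h:mm:ss string to total seconds."""
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s

# (untuned, tuned) sync times per network speed, from the table above
runs = {
    "1Gbps":   ("0:57:27", "0:53:09"),
    "1.5Mbps": ("3:31:36", "2:24:09"),
    "512Kbps": ("10:56:42", "5:57:09"),
    "256Kbps": ("21:43:02", "10:46:46"),
}

for net, (untuned, tuned) in runs.items():
    speedup = to_seconds(untuned) / to_seconds(tuned)
    print(f"{net}: {speedup:.2f}x faster")  # 1Gbps comes out ~1.08x, 256Kbps ~2.01x
```

The slower the link, the larger the ratio - matching the "sometimes twice as fast" claim above.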

Topology tuning

Even if you are not using Windows Server 2008 R2, there are plenty of other factors that affect replication speed. Some of these I’ve talked about before, some are new. All are important:

  • Minimize mixing of Win2003 and Win2008/Win2008 R2 - Windows Server 2008 introduced significant DFSR changes for RPC, inbound and outbound threading, and other aspects. However, if a Win2008 server is partnered with a Win2003 server for DFSR, most of those improvements are disabled for backwards compatibility. An ideal environment is 100% Windows Server 2008 R2, but a Win2008-only environment is still a huge improvement. Windows Server 2003 should be phased out of use as quickly as possible, as it has numerous "1G" design issues that were improved on with experience in later OSes. The Windows Server 2008 R2 credit manager and update worker improvements are most efficient when all operating systems are homogeneous. If you are replacing Win2003 servers with a newer OS, do the hub servers first, as the increased number of concurrent file downloads will provide some benefit even when talking to Win2003 spokes.
  • Consider multiple hubs - If using a large number of branch servers in a hub-and-spoke topology, adding “subsidiary hub” servers will help reduce load on the main hubs.

    So for example, this configuration would cause more bottlenecking:

[diagram: all spoke servers replicating directly with a single hub]

And this configuration would cause less bottlenecking:

[diagram: spoke servers split across subsidiary hubs that replicate with the main hub]

  • Increase staging quota - The larger the replicated folder staging quotas are on each server, the less often files must be restaged when replicating inbound changes. In a perfect world, staging quota would be configured to match the size of the data being replicated. Since this is typically impossible, it should be made as large as reasonably possible. It must always be configured to be at least as large as the combined size of the N largest files, where N = UpdateWorkerThreadCount + 16 on Win2008 and Win2008 R2. Why add 16? Because that is the number of files that can be replicated outbound simultaneously.

This means that by default on Win2008/Win2008 R2, quota must be as large as the 32 largest files. If UpdateWorkerThreadCount is increased to 32, it must be as large as the 48 largest files (32+16). If it is any smaller, staging can become blocked when 32 files are being replicated inbound and 16 outbound at once, preventing further replication until that queue is cleared. Frequent 4202 and 4204 staging events are an indication of an inappropriately configured staging quota, especially if you are no longer in the initial sync phase of setting up DFSR for the first time.
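The sizing rule above boils down to one quick calculation. This sketch (the function name and sample sizes are mine, purely for illustration) computes the minimum staging quota from a list of file sizes:

```python
def min_staging_quota(file_sizes, update_worker_threads=16, outbound_slots=16):
    """Minimum staging quota in bytes: the combined size of the
    (update_worker_threads + outbound_slots) largest files in the folder."""
    n = update_worker_threads + outbound_slots
    return sum(sorted(file_sizes, reverse=True)[:n])

# Example: 100 files of sizes 1..100 "bytes"
sizes = list(range(1, 101))
# Default UpdateWorkerThreadCount of 16 -> quota covers the 32 largest files
print(min_staging_quota(sizes))                            # sum of 69..100 = 2704
# Raised to 32 -> quota covers the 48 largest files (32+16)
print(min_staging_quota(sizes, update_worker_threads=32))  # sum of 53..100 = 3672
```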

Source : DFSR
Category : None
Event ID : 4202
Type : Warning
Description :
The DFS Replication service has detected that the staging space in use for the replicated folder at local path c:\foo is above the high watermark. The service will attempt to delete the oldest staging files. Performance may be affected.   

Source : DFSR
Category : None
Event ID : 4204
Type : Information
Description :
The DFS Replication service has successfully deleted old staging files for the replicated folder at local path c:\foo. The staging space is now below the high watermark.

If you get 4206 staging events, you have definitely not sized your staging correctly, as replication is now blocked behind large files.

Event Type: Warning
Event Source: DFSR
Event Category: None
Event ID: 4206
Date: 4/4/2009
Time: 3:57:21 PM
User: N/A
Computer: SRV
Description:
The DFS Replication service failed to clean up old staging files for the replicated folder at local path c:\foo. The service might fail to replicate some large files and the replicated folder might get out of sync. The service will automatically retry staging space cleanup in 1 minutes. The service may start cleanup earlier if it detects some staging files have been unlocked.

If still using Win2003 R2, staging quota would need to be as large as the 9 largest files. And if using read-only replication on Windows Server 2008 R2, as large as the number of files specified by UpdateWorkerThreadCount (16 by default) - after all, a read-only replicated folder has no outbound replication.

So to recap the staging quota minimum recommendations:

- Windows Server 2003 R2: 9 largest files
- Windows Server 2008: 32 largest files (default registry)
- Windows Server 2008 R2: 32 largest files (default registry)
- Windows Server 2008 R2 Read-Only: 16 largest files

If you want to find the 32 largest files in a replicated folder, here’s a sample PowerShell command:

Get-ChildItem <replicatedfolderpath> -Recurse | Sort-Object Length -Descending | Select-Object -First 32 | Format-Table Name,Length -Wrap -AutoSize

  • Consider read-only - Deploy Windows Server 2008 R2 read-only replication when possible. If users are not supposed to change data, mark those replicated folders as read-only. A read-only server cannot originate data and will prevent unwanted replication or change orders from occurring outbound to other servers. Unwanted changes generate load and lead to data overwrites – which to fix you will need to replicate back out from backups, consuming time and replication resources.
  • Latest QFE and SP - Always run the latest service pack for that OS, and the latest DFSR.EXE/DFSRS.EXE for that OS. There are also updates for NTFS and other components that DFSR relies on. Hotfixes have been released that remove performance bugs or make DFSR more reliable; a more reliable DFSR is naturally faster too. These are documented in KB968429 and KB958802 but the articles aren’t always perfectly up to date, so here’s a trick: If you want to find the latest DFSR service updates, use these three searches and look for the highest KB number in the results:

Win2008 R2: http://www.bing.com/search?q=%22windows+server+2008+r2%22+%22dfsrs.exe%22+kbqfe+site%3Asupport.microsoft.com&go=&form=QBRE

Win2008: http://www.bing.com/search?q=%22windows+server+2008%22+%22dfsrs.exe%22+kbqfe+site%3Asupport.microsoft.com&form=QBRE&qs=n

Win2003 R2: http://www.bing.com/search?q=%22windows+server+2003+r2%22+%22dfsr.exe%22+kbqfe+site%3Asupport.microsoft.com&form=QBRE&qs=n

Remember, Win2003 mainstream support ends July 13, 2010. That’s the end of non-security updates for that OS.

People ask me all the time why I take such a hard line on DFSR hotfixes. I ask in return “Why don’t you take such a hard line?” These fixes cost us a fortune, we’re not writing them for our health. And that goes for all other components too, not just DFSR. It’s an issue intrinsic to all software. DFSR is not less reliable than many other Windows components – after all, NTFS is considered an extremely reliable file system but that hasn’t stopped it from having 168 hotfixes in its lifetime; DFSR just has a passionate group of Support Engineers and developers here at MS that want you to have the best experience.

  • Turn off RDC on fast connections with mostly smaller files - Later testing (not addressed in the chart below) showed 3-4 times faster replication when using LAN-speed networks (i.e. 1Gbps or faster) on Win2008 R2. This is because it was faster to send files in their entirety than to send deltas when the files were smaller and more dynamic and the network was extremely fast. The performance improvements were roughly twice as fast on Win2008 non-R2. This should absolutely not be done on WAN networks under 100Mbit, though, as it will likely have a very negative effect.
  • Consider and test anti-virus exclusions – Most anti-virus software has no concept of the data types that make up DFSR’s working files and database. Additionally, those file types are not executables and are therefore very unlikely to contain a useful malicious payload. If you are seeing slow performance within DFSR, test the following anti-virus file exclusions; if DFSR performs considerably better, contact your AV vendor for an updated version of their software and an explanation around the performance gap.

<drive>:\system volume information\DFSR\

   $db_normal$
   FileIDTable_* 
   SimilarityTable_*

<drive>:\system volume information\DFSR\database_<guid>\

   $db_dirty$
   Dfsr.db
   Fsr.chk
   *.log
   Fsr*.jrs
   Tmp.edb

<drive>:\system volume information\DFSR\config\

   *.xml

<drive>:\<replicated folder>\dfsrprivate\staging\*

   *.frx

This should be validated carefully; many anti-virus products allow exclusions to be set but then do not actually abide by them. For maximum performance, you would exclude scanning of any replicated files at all, but this is obviously infeasible for most customers.

  • Pre-seed the data when setting up a new replicated folder - Pre-seeding - often referred to as "pre-staging" - data on servers can lead to huge performance gains during initial sync. This is especially useful when creating new branch office servers; if they are built in the home office, they can be quickly pre-seeded with data and then sent out to the field for replication of the change delta. See the following article for pre-seeding recommendations.

Going back to those same tests I showed earlier with 32 spokes replicating back to a single hub, note the average performance behavior when the data was perfectly pre-seeded:

Test | Spokes | Hubs | Topology | GB/node | Unique | RG | Tuned | Staging | Net     | Time to sync
C9   | 32     | 1    | Collect  | 1       | Yes    | 32 | Y     | 4GB     | 1Gbps   | 0:49:21
C11  | 32     | 1    | Collect  | 1       | Yes    | 32 | Y     | 4GB     | 512Kbps | 0:46:34
C12  | 32     | 1    | Collect  | 1       | Yes    | 32 | Y     | 4GB     | 256Kbps | 0:46:08
C13  | 32     | 1    | Collect  | 1       | Yes    | 32 | Y     | 4GB     | 64Kbps  | 0:48:29

Even the 64Kbps frame relay connection was nearly as fast as the LAN! This is because no files had to be sent, only file hashes.

Note: do not use this table to predict replication times. It is designed to show behavior trends only.

  • Go native Windows Server 2008 R2 - Not to beat a dead horse, but the highest performance gains - including the registry tuning and the greatly improved Credit Manager code - will be realized by using Windows Server 2008 R2. Win2003 R2 was first-generation DFSR, Win2008 was second generation, and Win2008 R2 is third generation; if you are serious about performance, you must get to 2008 R2.

Hardware tuning

  • Use a 64-bit OS with as much RAM as possible on hubs - DFSR can become bound by RAM availability on busy hub servers, especially when using the registry performance values above. There is absolutely no reason to run a 32-bit file server in this day and age, and with the coming of Windows Server 2008 R2, it’s no longer possible. For spoke servers that tend to have far less load, you can cut more corners, of course; the ten-user sales team in Hicksville doesn’t need 16GB of RAM in their file server.

As a side note, customers periodically open cases to report “memory leaks” in DFSR. What we explain is that DFSR intentionally caches as much RAM as it can get its hands on - really, though, it’s the ESE (Jet) database doing this. So the more idle the other processes on a DFSR server are, the more memory the DFSR process will be able to gobble up. You can see the same behavior in LSASS’s database on DCs.

  • Use the fastest disk subsystem you can afford on hubs - Much of DFSR is disk bound - especially staging and RDC operations - so high disk throughput will dramatically lower bottlenecks; this is especially true on hub servers. As always, a disk queue length greater than 2 in PerfMon is an indication of an over-used or under-powered disk subsystem. Talk to your hardware vendors about the performance and cost differences of SATA, SCSI, and FC. Don’t forget about reliability too - I have a job here for life thanks to all the customers that use the least expensive, off-brand, no-warranty, low-parity, practically consumer-grade iSCSI products they can find. You get what you pay for, and ultimately your users do not care about anything but their data. The OS is just a thing that lets applications access files so that the business can make money. Someday the Linux desktop folks will figure this out and get some applications; then we may actually be in trouble here.

If using iSCSI, make sure you have redundant network paths to the disks, using multiple switches and NICs. We have had quite a few cases lately of non-fault-tolerant iSCSI configs that would go down for hours in the middle of DFSR updating the database and transaction logs, and the results were obviously not pretty.

  • Use reliable networks - They don't necessarily have to be fast, but they do need to stay up. Many DFSR performance issues are caused by using old network card drivers, using malfunctioning "Scalable Network" (TCP offload, RSS, etc.) settings, or using defective WANs. Network card vendors release frequent driver updates to increase performance and resolve problems; just like Windows service packs, the drivers should be installed to improve reliability and performance. Companies often deploy cost saving WAN solutions (with VPN tunnels, frame relay circuits, etc.) that in the end cost the company more in lost productivity than they ever saved in monthly expense. DFSR - like all RPC applications - is sensitive to constant network instability.
  • Review our performance tuning guides - For much more detail on squeezing performance out of your hardware, including network, storage, and the rest, review:

And that’s it.

- Ned “fork” Pyle
