I’ve been meaning to write this post for a number of years, and it’s probably been 2 or 3 years since I last had an incident of this nature, but it’s time to share the knowledge. An often overlooked area of concern for both Exchange and Skype/Lync is the state of the DNS environment. DNS is one of those foundation services that, if it’s not working correctly, can give you all kinds of grief.
Many super gurus out there, when deploying an on-premises Skype/Lync environment, like to do everything with PowerShell, including the Schema changes and the Forest and Domain preps. Personally, I find there is something comforting about a nice GUI, with progress bars and check boxes. I have the PS commands as well though, just in case things aren’t working quite right.
In one such environment, the Deployment Wizard just could not get the Schema prep to go through. However, on the 4th or 5th attempt, it worked. Rather than trying to figure out the why, I went on with preparing the Forest, this time using:
Enable-CsAdForest -GroupDomain company.local -Verbose
Boom, failed again. A quick scour through the XML and I find this:
Cannot find one suitable domain controller under the domain company.local.
OK, I check, and there are plenty of Domain Controllers available. Hmmm, 6 GCs for that site alone… for 200 people… I retried the same command again and again, and on the 5th try it works. OK, that ain’t right.
A quick chat with the customer revealed that the reason for so many GCs was that the logon process for users was often soooo slow, 5-8 minutes, while at other times it was fine. But the additional DCs didn’t really seem to help. Well, that’s not good.
An AD health check later (this kind of command is helpful: dcdiag /e /c /v /f:c:\temp\dcdiag1.txt), plus checking replication status and, of course, Sites and Services, nothing really stood out other than LSASS being unusually busy for a 200 user environment with 6 GCs. Time to start with the basics… How does a client pick a DC for logon? Like many things, DNS is key.
I start poking around DNS, making sure resolution is working for all the servers listed in AD Sites and Services. Yup, all good. Then I start looking in the _msdcs folders for the domain… Drilling down through DC, _Sites, Default-First-Site-Name, _tcp. OK, we’ve got entries… a WHOLE lot of entries. Checking the rest (Domains, GC, PDC), and with the exception of PDC, there were too many GCs/DCs listed. In fact there were 13 Global Catalog servers listed, of which only 6 still existed. I sense resolution ahead.
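The same comparison can be scripted rather than clicked through. What follows is a minimal sketch, not a definitive tool: it assumes the DnsClient and ActiveDirectory modules are available, a single-domain forest named company.local, and the function names Find-StaleSrvTarget and Get-StaleGcRecord are made up for illustration.

```powershell
function Find-StaleSrvTarget {
    # Hypothetical helper: returns SRV targets that match no live host name.
    param(
        [string[]]$SrvTargets,
        [string[]]$LiveHosts
    )
    # DNS names are case-insensitive and may carry a trailing dot, so normalise first.
    $live = $LiveHosts | ForEach-Object { $_.TrimEnd('.').ToLowerInvariant() }
    $SrvTargets |
        ForEach-Object { $_.TrimEnd('.').ToLowerInvariant() } |
        Sort-Object -Unique |
        Where-Object { $live -notcontains $_ }
}

function Get-StaleGcRecord {
    # Assumption: company.local is the forest root and the only domain.
    param([string]$ForestName = 'company.local')

    # What DNS advertises as Global Catalogs.
    $srv = Resolve-DnsName -Name "_gc._tcp.$ForestName" -Type SRV |
        Where-Object { $_.Type -eq 'SRV' }

    # What AD says actually exists as a Global Catalog.
    $gcs = Get-ADDomainController -Filter * | Where-Object { $_.IsGlobalCatalog }

    Find-StaleSrvTarget -SrvTargets $srv.NameTarget -LiveHosts $gcs.HostName
}
```

Running Get-StaleGcRecord on a domain-joined machine should list any GC names that DNS still hands out but AD no longer knows about. In the environment described here, that would have been the 7 leftover entries.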
It turned out no one was decommissioning Domain Controllers properly in the environment. Basically, they would shut down a retiring server, leave it off for a few days, and then delete it from Sites and Services and from the Domain. The problem was that the DNS entries were never cleaned up. Because DCPromo was not used to demote the servers, none of the GC/DC related DNS entries were removed.
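I did the cleanup by hand, but something along these lines could do the same sweep on a DNS server. Treat it as a sketch under assumptions: it uses the DnsServer module, assumes the records live in a zone called _msdcs.company.local (in some environments _msdcs is a subfolder of the domain zone instead), and the function names are invented for this example. It only reports what it would delete unless -Commit is passed.

```powershell
function Select-StaleSrvRecord {
    # Hypothetical helper: keeps SRV records whose target is not a live DC.
    param(
        [object[]]$Records,
        [string[]]$LiveHosts
    )
    $live = $LiveHosts | ForEach-Object { $_.TrimEnd('.').ToLowerInvariant() }
    $Records | Where-Object {
        $live -notcontains $_.RecordData.DomainName.TrimEnd('.').ToLowerInvariant()
    }
}

function Remove-StaleSrvRecord {
    param(
        [string]$ZoneName = '_msdcs.company.local',  # assumption, adjust to your zone
        [string[]]$LiveHosts,                        # FQDNs of DCs that still exist
        [switch]$Commit                              # without this, nothing is deleted
    )
    $records = Get-DnsServerResourceRecord -ZoneName $ZoneName -RRType Srv
    foreach ($rec in @(Select-StaleSrvRecord -Records $records -LiveHosts $LiveHosts)) {
        # -WhatIf stays on unless -Commit was given, so a first run is a dry run.
        Remove-DnsServerResourceRecord -ZoneName $ZoneName -InputObject $rec -Force -WhatIf:(-not $Commit)
    }
}
```

Watch out for recycled IPs and names, as I had here: a name that looks stale may belong to a server that still exists but is no longer a DC, so review the dry-run output before committing anything.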
After an hour or two of combing through every level of the DNS structure, verifying whether each server still existed and/or still existed as a DC (IP recycling), and removing the stale entries, LSASS utilisation on the GCs dropped from over 20% down to 1%. I reran the Schema prep without issue, same for the Forest and Domain preps. No more problem.
In summary, whether logging on to the domain or running forest or domain preps, your PC/server gets back a list of GCs from DNS. Sometimes the first entry is a working one, though never in my case. I ran into the same issue with another client, recognized the problem right away, advised them to clean up their DNS entries ASAP, and was able to get the preps through by specifying an operating GC:
Enable-CsAdForest -GlobalCatalog ad01.company.local -GroupDomain company.local -GroupDomainController ad01.company.local -GlobalSettingsDomainController ad01.company.local -Verbose
Enable-CsAdDomain -Domain company.local -GlobalCatalog ad01.company.local -Verbose
If you absolutely have to resort to the above commands to get the environment prepped, take it as a sign that something isn’t right, and you’ll want to have a look at the AD DNS.
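If you do need a value for -GlobalCatalog, one way to pick it is to probe the GCs that AD knows about and take the first one that answers. This is only a sketch: it assumes the ActiveDirectory module and Test-NetConnection are available, uses the standard GC LDAP port 3268, and Select-FirstReachable and Get-WorkingGc are names I made up.

```powershell
function Select-FirstReachable {
    # Hypothetical helper: returns the first candidate for which the probe is true.
    param(
        [string[]]$Candidates,
        [scriptblock]$Probe
    )
    foreach ($name in $Candidates) {
        if (& $Probe $name) { return $name }
    }
}

function Get-WorkingGc {
    # Candidates come from AD itself rather than from the (possibly stale) DNS records.
    $gcs = Get-ADDomainController -Filter * | Where-Object { $_.IsGlobalCatalog }
    Select-FirstReachable -Candidates $gcs.HostName -Probe {
        param($h)
        # 3268 is the Global Catalog LDAP port; Quiet makes this return a boolean.
        Test-NetConnection -ComputerName $h -Port 3268 -InformationLevel Quiet
    }
}
```

The result of Get-WorkingGc can then be passed to the -GlobalCatalog parameter of the commands above.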