Don’t you just hate it when you’ve been doing things the right way for so long that you actually forget, or never realized, why it is “the right way”? The Exchange UM Certificate is one such example. I’ve been deploying OCS, Lync, Skype for 10+ years now, and Exchange before that since the 5.0 days (frack I’m old)… 99% of the time when integrating Skype with Exchange Unified Messaging, the UM role has either been left out or simply not configured. I’m not going into the configuration of UM, there are a tonne of articles to that effect, but there is one item of extreme importance that is not always clear: the Subject Name for the Certificate used for the UM role(s) MUST be the FQDN of the server itself.
Most of the time the certs used on Exchange servers have been deployed for a while, and a lot of the time Public Certs are already deployed on Exchange servers, BUT Public Certs must now use valid Top Level Domain Names, e.g. you can’t use .local, .lan or .domain. So you create an internal CA cert with the FQDN of the server and assign it, no problems. But every now and then you come across a customer with matching internal and external domains, and they have simply added the server FQDN to the SAN list. Internal calls to the Subscriber Access line work, great, moving on… Oops, did someone forget to test externally, through the Edge? Wait, what, media failure, dang.
Ok, story time over, time to look in the weeds at the technical mumbo jumbo.
Symptom 1 – Call connects, but no bing bong “To access your mailbox…” when a User is working remotely and connected to their Edge server and trying to connect to Voicemail.
Symptom 2 – Not every time, but Event ID 32263 LS User Services error shows up in the Lync Event Log on the Frontends
Error code: 0xC3E93C2F(SIPPROXY_E_UNKNOWN_USER_OR_EPID), Error identifier: NotifySingle.Notify
Cause: Possible issues with the other front end or with this front end.
Symptom 3 – Call Detail Reports, Diagnostic ID 21 – “Call failed to establish due to a media connectivity failure where one endpoint is of unknown type”
Now for the “why”. I’m such a GenX, I need the “why”. Through extensive logging, reviewing of logs, and yes, a support case with Microsoft, I kept noticing something odd in the SIP Trace. After the 180 Ringing and 200 OK, the ACK switched to sip:Autodiscover.company.com, then a delay, and then BYE. MS went down a few other roads regarding networking, PowerShell commands, ports, port settings, and about 6 GB more worth of logging, but this Autodiscover thing kept bugging me.
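If you are wading through gigabytes of trace, a small script can flag requests aimed at a host you did not expect. What follows is only a minimal sketch in Python, not the tooling used on this case. It assumes the trace has been exported as plain text with standard SIP request lines (`METHOD sip:target SIP/2.0`). The function names and the two sample lines are made up for illustration; only the Autodiscover target and the UM server name come from the real trace.

```python
import re

# Matches a SIP request line such as "ACK sip:host.example.com SIP/2.0".
# Status lines ("SIP/2.0 200 OK") do not match and are skipped.
REQUEST_LINE = re.compile(r"^(?P<method>[A-Z]+)\s+sips?:(?P<target>\S+)\s+SIP/2\.0\s*$")


def request_hosts(trace_text):
    """Return a list of (method, host) for every SIP request line found."""
    results = []
    for line in trace_text.splitlines():
        match = REQUEST_LINE.match(line.strip())
        if not match:
            continue
        target = match.group("target")
        # Drop an optional user part, URI parameters and port.
        # (Simplification: bracketed IPv6 literals are not handled.)
        host_port = target.split("@")[-1].split(";")[0]
        host = host_port.split(":")[0]
        results.append((match.group("method"), host.lower()))
    return results


def unexpected_targets(trace_text, expected_hosts):
    """Return the (method, host) pairs whose host is not in expected_hosts."""
    expected = {h.lower() for h in expected_hosts}
    return [(m, h) for m, h in request_hosts(trace_text) if h not in expected]


# Hypothetical sample lines, written by hand for this example.
sample_trace = """\
INVITE sip:prdexs1um1.company.com SIP/2.0
SIP/2.0 180 Ringing
SIP/2.0 200 OK
ACK sip:Autodiscover.company.com SIP/2.0
"""

print(unexpected_targets(sample_trace, ["prdexs1um1.company.com"]))
```

Pass in the FQDNs of your UM servers as `expected_hosts`; anything the script prints is a request aimed somewhere else and worth a closer look.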
Finally we looked at the cert on the UM servers (Exchange 2010, two dedicated UM role servers), to check and make sure all the roots, intermediates and what not were in place (FYI, this Cert checklist link is handy to have). First thing that jumped out at me: the Subject Name of the Cert assigned to the UM role is, yes, Autodiscover.company.com. The SAN list did contain the FQDN of both Servers.
Of course this seems silly, nowhere have I seen it documented that the Subject Name MUST be the FQDN of the UM server, but let’s give that a try. Two new internal CA certs were generated, with SN and SAN being the server FQDN, assigned to the UM role, and services restarted.
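The difference between "FQDN is in the SAN list" and "FQDN is the Subject Name" is easy to miss by eye, so here is a minimal Python sketch of the check. It assumes the certificate is already available as a dictionary in the shape that Python's `ssl.SSLSocket.getpeercert()` returns; how you obtain it from the UM server is left out. The helper names are made up for this example, and the server name is the UM server FQDN that shows up in the SIP trace.

```python
def subject_common_name(cert):
    """Return the first commonName in the certificate subject, or None."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def san_dns_names(cert):
    """Return all DNS entries from the Subject Alternative Name list."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def check_um_cert(cert, server_fqdn):
    """Report whether server_fqdn is the Subject Name and/or only a SAN entry."""
    wanted = server_fqdn.lower()
    cn = subject_common_name(cert)
    return {
        "subject_cn": cn,
        "subject_matches_fqdn": cn is not None and cn.lower() == wanted,
        "fqdn_in_san": wanted in (name.lower() for name in san_dns_names(cert)),
    }


# Hand-written dictionary mirroring the layout described above:
# Autodiscover as the Subject Name, the server FQDN only in the SAN list.
bad_cert = {
    "subject": ((("commonName", "Autodiscover.company.com"),),),
    "subjectAltName": (
        ("DNS", "Autodiscover.company.com"),
        ("DNS", "prdexs1um1.company.com"),
    ),
}

print(check_um_cert(bad_cert, "prdexs1um1.company.com"))
```

For the layout above, `fqdn_in_san` comes back true while `subject_matches_fqdn` comes back false, which is exactly the combination that looked fine on paper here.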
Calls work. Mothersmurfer…
I must reiterate, we made absolutely no changes other than replacing the cert.
I did notice that in the Autodiscover SIP trace there is a ms-fe=prdexs1um1.company.com field, but this was insufficient for connecting calls through the Edge, even with name resolution for autodiscover hard coded in the HOSTS file to point to a UM server. Somewhere in the code, the SIP call appears to switch to the Subject Name of the Cert for both resolution and MTLS, so the Subject Name must be the server FQDN for Edge users to connect to Voicemail, and to Auto Attendants too I suspect, but I did not have one to test or work with.
An additional link of interest: narrowing the audio port range of your Exchange UM servers and setting it up for QoS.