Exchange UM Cert and Skype for Business

Don’t you just hate it when you’ve been doing things the right way for so long that you actually forget, or never realized, why it is “the right way”?  The Exchange UM Certificate is one such example.  I’ve been deploying OCS, Lync and Skype for 10+ years now, and Exchange before that since the 5.0 days (frack I’m old)…  99% of the time when integrating Skype with Exchange Unified Messaging, the UM role has either been left out or simply not configured.  I’m not going into the configuration of UM, there are a tonne of articles to that effect, but one item of extreme importance is not always clear:  the Subject Name of the Certificate used for the UM role(s) MUST be the FQDN of the server itself.

Most of the time that is because the certs used on Exchange servers have been deployed for a while, and a lot of the time Public Certs are already deployed on Exchange servers.  BUT Public Certs now must use valid Top Level Domain names, e.g. you can’t use .local, .lan or .domain.  So you create an internal CA cert with the FQDN of the server and assign it, no problems.  But every now and then you come across a customer with matching internal and external domains, and they have added the server FQDN to the SAN list.  Internal calls to the Subscriber Access line work, great, moving on…  Oops, did someone forget to test externally, through the Edge?  Wait, what, media failure, dang.

Ok, story time over, time to look in the weeds at the technical mumbo jumbo.

Symptom 1 – Call connects, but no bing bong “To access your mailbox…” when a User is working remotely and connected to their Edge server and trying to connect to Voicemail.

Symptom 2  – Not every time, but Event ID 32263 LS User Services error shows up in the Lync Event Log on the Frontends
Error code: 0xC3E93C2F(SIPPROXY_E_UNKNOWN_USER_OR_EPID), Error identifier: NotifySingle.Notify
Cause: Possible issues with the other front end or with this front end.

Symptom 3 – Call Detail Reports, Diagnostic ID 21 – “Call failed to establish due to a media connectivity failure where one endpoint is of unknown type”

Now for the “why”.  I’m such a GenX, I need the “why”.  After extensive logging, reviewing of logs, and yes, a support case with Microsoft, I kept noticing something odd in the SIP Trace.  After the 180 Ringing and 200 OK, the ACK switched to, then delay, and then BYE.  MS went down a few other roads regarding networking, PowerShell commands, ports, port settings, and about 6 GB more worth of logging, but this Autodiscover thing kept bugging me.

Finally we looked at the cert on the UM servers (Exchange 2010, two dedicated UM role servers), to check and make sure all the roots, intermediates and what not were in place (FYI, this Cert checklist link is handy to have).  First thing that popped out at me: the Subject name of the Cert assigned to the UM role is yes,  The SAN list did contain the FQDN of both Servers.

Of course this seems silly, nowhere have I seen it documented that the Subject Name MUST be the FQDN of the UM server, but let’s give that a try.  Two new internal CA certs were generated, with SN and SAN being the server FQDN, assigned to the UM role, and the services restarted.

Calls work.  Mothersmurfer…

I must reiterate, we made absolutely no other changes other than to replace the cert.

I did notice that in the autodiscover SIP trace there is a field, but this was insufficient for connecting calls through the Edge, even with name resolution for autodiscover hard coded in the HOSTS file to point to a UM server.  Somewhere in the code, the SIP call appears to switch to the Subject Name of the Cert for both resolution and MTLS, and it must be this way for Edge users to connect to Voicemail, and Auto Attendants too I suspect, but I did not have one to test or work with.
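To make the Subject Name versus SAN distinction concrete, here is a minimal Python sketch of the check I now do by eye.  It assumes the cert has already been parsed into the dictionary shape that Python’s ssl.getpeercert() returns; the host names are hypothetical, and the “SAN alone is not enough” rule is simply what I observed above, not documented behaviour.

```python
def um_cert_subject_matches(cert: dict, server_fqdn: str) -> bool:
    """Return True only if the cert's Subject CN equals the server FQDN.

    A matching SAN entry is deliberately NOT accepted here, mirroring the
    behaviour observed for Edge users (an observation, not a documented rule).
    """
    wanted = server_fqdn.strip().lower()
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName" and value.strip().lower() == wanted:
                return True
    return False


def san_dns_names(cert: dict) -> list:
    """List the DNS entries in the SAN, lower-cased, for comparison."""
    return [value.lower()
            for kind, value in cert.get("subjectAltName", ())
            if kind == "DNS"]


# Hypothetical names, for illustration only.
shared_cert = {
    "subject": ((("commonName", "mail.example.com"),),),
    "subjectAltName": (("DNS", "mail.example.com"),
                       ("DNS", "um01.example.com")),
}
per_server_cert = {
    "subject": ((("commonName", "um01.example.com"),),),
    "subjectAltName": (("DNS", "um01.example.com"),),
}

for cert in (shared_cert, per_server_cert):
    in_san = "um01.example.com" in san_dns_names(cert)
    print(um_cert_subject_matches(cert, "um01.example.com"), in_san)
```

The first cert is the problem case: the server FQDN is in the SAN list, but the Subject Name is something else, so the check fails.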

An additional link of interest, a link to narrowing the audio port range of your Exchange UM servers and set for QoS.

Group Series 6.1.5 update

The Group Series 6.1.5 firmware updates were released today bringing some useful functionality to the systems, mainly VBSS.

  • Video-based Screen Sharing for Skype for Business
  • Managing Skype for Business Calls
  • System Upgrade or Downgrade Through Skype for Business Server
  • Org ID Authentication
  • Conference Recording with RealPresence Touch
  • Dialing through ISDN Gateway
  • RealPresence Collaboration Server (RMX) Call Escalation
  • Displaying Call Participant Names
  • New and Modified API Commands
Release notes and downloads can be found at Polycom.
(The Polycom Trios also got VBSS support with their December update, and a newer firmware with additional field fixes has since come out and can be found here.)
Some items that are resolved with 6.1.5 may be of interest:
  • In a Skype for Business environment, when a PSTN endpoint places a call to another PSTN endpoint registered to RealPresence Group Series system, the caller receives a loud audio.
  • When RealPresence Group Series system joins an AVMCU  conference, the Remote Desktop Protocol (RDP) content drops
  • When RealPresence Group Series system dials in to a Skype for Business client for Mac in a point-to-point call, Skype for Business client crashes
  • When RealPresence Group Series system registered to Skype for Business server performs a Skype for Business call with other RealPresence Group Series system and SfB clients, far-end video drops and system restarts
  • In an AVMCU call with Skype for Business client, the Group Series system does not receive video content occasionally

Happy updating.

VBSS and QoS

A glitch in the matrix that’s been bugging me for a while, but I just haven’t sat down to think it through… What to do with Video Based Screen Sharing and QoS.  VBSS came to us midstream for Skype for Business in June 2016, as a nice improvement to update speed and smoothness for Application/Desktop sharing, utilizing video and UDP instead of RDP and TCP.  Of course if you have an older client or if you start Recording the session, it reverts back to RDP/TCP.

What many have not realized though is that when a client is using VBSS in a Skype Meeting (not talking peer to peer here), the Skype client selects an Application port for the Source, but the Skype Server actually selects a Video port.  For the purpose of this post I’m using my typical QoS settings:
Audio Server: 49152-57500 Audio Client: 50040-50079
Video Server: 57501-65535 Video Client: 58000-58039
App Server: 40803-49151 App Client: 42000-42039
DSCP: Audio=46, Video=34, Application Sharing=24

The screenshot below is a Wireshark capture on the FE server.  As you can see, the Client source port is 42004, and the Destination port on the Server is 62345.  And in this particular network where this sample is taken, there are source/destination port restrictions set for the DSCP tags.  In this case 42000-42039 <–> 40803-49151.  Result is that not only is the traffic using both App and Video ports, it’s not even tagged at all.

QoS Problem #1 – assuming the network does not have Src/Dst restrictions, you would end up with a DSCP mismatch; traffic from the server tagged with 34, traffic from the client tagged with 24.  Server Video traffic is then getting mixed with Server VBSS traffic.

QoS Problem #2 – with network Src/Dst port restrictions in place, traffic is demoted into the lowest bucket, 0, or worse.

In my opinion, the VBSS ports, both server and client, should have remained in the Application port ranges, but obviously this is not the case.  Surprisingly the solution is not too difficult.  We need to modify the QoS Policy on the Frontend servers by adding one new QoS policy.

New-NetQosPolicy -Name "Skype Server VBSS" -IPProtocol Both -IPSrcPortStart 57501 -IPSrcPortEnd 65535 -IPDstPortStart 42000 -IPDstPortEnd 42039 -DSCPValue 24

This new policy filters for network traffic matching both the Source port and the Destination port to tag it with a 24.  From my observations, because the “Skype Server VBSS” has this additional criteria, this makes it more restrictive than the “Skype Server Video” policy and it gets applied and the tagging works as desired.
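For anyone who wants to reason about which policy a given flow lands in, here is a toy Python model of what I observed.  It is only a sketch: the “policy with the extra destination criteria wins” rule is my observation from the captures, not a statement of how Windows actually evaluates policies, and the Policy class and pick_policy function are names I made up for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Policy(NamedTuple):
    name: str
    src: Tuple[int, int]             # server source port range
    dst: Optional[Tuple[int, int]]   # client destination range, None = any
    dscp: int


# Port ranges and DSCP values are the ones used in this post.
POLICIES = [
    Policy("Skype Server Audio", (49152, 57500), None, 46),
    Policy("Skype Server Video", (57501, 65535), None, 34),
    Policy("Skype Server App", (40803, 49151), None, 24),
    Policy("Skype Server VBSS", (57501, 65535), (42000, 42039), 24),
]


def in_range(port: int, rng: Tuple[int, int]) -> bool:
    return rng[0] <= port <= rng[1]


def pick_policy(src_port: int, dst_port: int, policies=POLICIES):
    """Return the matching policy, preferring one that also filters on dst."""
    matches = [p for p in policies
               if in_range(src_port, p.src)
               and (p.dst is None or in_range(dst_port, p.dst))]
    if not matches:
        return None
    # Assumption: a policy with a destination filter counts as more specific.
    return max(matches, key=lambda p: p.dst is not None)


# Server-to-client direction of the captured VBSS flow: 62345 -> 42004.
print(pick_policy(62345, 42004))
# Ordinary camera video from the server to a client video port.
print(pick_policy(62345, 58010))
```

In this model the VBSS flow from the capture picks up the VBSS policy and gets 24, while regular video from the same server range still gets 34.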

Look at those happy little tags.  .101 is the Server in this screen shot and the .119 is a Trio.

If it is a particular concern, or you just really want to lock down the policy, you can use this “Skype Server Video” policy.  My concern about adding this policy restriction is that the traffic with the Edge for external users would not be prioritized on the network.  Edge traffic to the front end is going to be 443 or 3478 and would not meet either Video QoS policy criteria:
New-NetQosPolicy -Name "Skype Server Video" -IPProtocol Both -IPSrcPortStart 57501 -IPSrcPortEnd 65535 -IPDstPortStart 58000 -IPDstPortEnd 58039 -DSCPValue 34

To address QoS Problem #2, the network will need to be reconfigured to allow:
42000-42039 <–> 40803-49151 with DSCP=24
42000-42039 <–> 57501-65535 with DSCP=24

Of course another solution is to modify your App and Video ports for client and server to all live in the 57501-65535 space with both using DSCP=34, but then you have no delineation between VBSS and Camera Video streaming.  A less than ideal solution.

Or disable VBSS… yuck.


For posterity’s sake, the other Net QoS Policy settings I use are:

Frontend Only:
New-NetQosPolicy -Name "Skype Server Audio" -IPProtocol Both -IPSrcPortStart 49152 -IPSrcPortEnd 57500 -DSCPValue 46

New-NetQosPolicy -Name "Skype Server Video" -IPProtocol Both -IPSrcPortStart 57501 -IPSrcPortEnd 65535 -DSCPValue 34

New-NetQosPolicy -Name "Skype Server App" -IPProtocol Both -IPSrcPortStart 40803 -IPSrcPortEnd 49151 -DSCPValue 24



Polycom Trio 5.5.2 firmware

Normally I don’t post about specific firmware, but in the (5.5.2 Rev AC) release for Polycom Trios (yup, plural, there are two models now, 8800 and 8500) the SILK codec has been made available.  Skype for Business SILK has only been added as an experimental feature, so to make use of it you need to modify the Codec Priorities on the device.  Rumors abound that we may see SILK in VVX500 and VVX600 series devices in the near future, but that lower end devices may not have the processing power needed.

SILK will only be usable in peer-to-peer type calls as it’s not available in the Conferencing MCU, and a reminder that it is “Experimental” at this time.  There appear to be some typos currently if you want to modify an FTP Provisioning server cfg file for your Trios; the above exported config reports that SILK is in ksps instead of kbps.  Perhaps this is the change over to Kilo Samples Per Second, or just a typo…  Either way, something to be mindful of when modifying config files.


This release for the RealPresence Trio 8800 system includes the following highlights:  (pdf release notes here)

• Screen Mirroring on RealPresence Trio Solution

• Software Update using Windows Server

• RealPresence Trio 8800 System Media Keepalive

• Toggle Content and People Video Streams

• Skype for Business User Experience Enhancements

• Viewing a Different Calendar in Skype for Business Mode

• Dynamic Port Ranges for Video and Content

• Adding a PSTN Participant to a Call

• Displaying Multiple Calendar Meetings on Connected Monitor

• Web Sign in for Skype for Business Online

• Secure Single Sign-On (SSO) with Third-Party Supporting Solutions

• Managing Skype for Business Conference Participant Level in the Call Roster Screen

• Device Lock

• Client Media Port Ranges for Quality of Experience (QoE)

• Microsoft Quality of Experience Monitoring Server Protocol (MS-QoE)

• Exchange Web Services Discovery

• Unified Contact Store

• Alert Tones for Mute Status

• Dial Plan Normalization

• Dial Plan for SIP URI Dialing

• Join a Meeting using SIP URI

• Hybrid Line Registration

• User Log Upload

• Audio, Video, and Content Port Ranges

• Media Transport Ports for audio, video, and content

• Experimental: Support for SILK Audio Codec

Firmware for the Trios can be found here:  Polycom Voice Software

iOS 11.0.1 and Skype Mobility Client

*Fast Publish*

My version of a fast publish, mainly because I need more people to verify the issue, condition and if my suggested fix works for you as well, so please give feedback.

We were hearing client reports that they were unable to sign in on an iOS 11.0.1 updated device, or if/when they are able to sign in, they were getting kicked out after 10 mins.

After running a UCWA trace on the Frontends, the traffic for the user I was testing with literally just ended.  My intuition led me to think of the reverse proxy.

Quick scan through the settings and the timeout value was set to 960 seconds, 16 mins.  That doesn’t quite match the 10 min punting window, but I’ve seen weirder, and I like all-skype-timeout things to be 20 min or more.

We tested first with 3600 second and 1800 second timeout values and both were successful, the Skype Mobility client was no longer getting kicked out.  A 1200 setting will probably work, and I may retry with that tomorrow, but my preference for all Skype related firewall timeouts is 30 mins.

I checked our corporate environment, where we weren’t having issues, and the timeout was already set to 1800.  Not quite a pattern, but a possible solution I hope.

Regular Skype Desktop clients are supposed to renew their connections every 5-15 minutes, which is why the firewall timeouts need to be 20 minutes or higher.  The Mac and iOS clients aren’t regular clients as they are using UCWA, https based connections.  I’ve seen with NetScaler that if you leave the default timeouts at 180 instead of using 1200 or 1800, the desktop client signs in and is immediately signed back out.  Almost like there is a TTL minus 300 mechanism of some kind that will end the client’s connection if it hasn’t already renewed.  Just a guess though.  UCWA logging for the client activity ended very abruptly and dirty.
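The arithmetic behind my “20 minutes or higher” preference can be written down as a small Python sketch.  To be clear, this only encodes my guess: the 900 second worst-case renewal interval and the 300 second margin are assumptions taken from the paragraph above, the function names are mine, and nothing here is confirmed client behaviour.

```python
def min_safe_timeout(max_renewal_s: int = 900, margin_s: int = 300) -> int:
    """Smallest timeout that survives the slowest renewal plus the margin.

    Both defaults are assumptions: clients renewing at most every 15 minutes,
    and a guessed "TTL minus 300" cut-off on top of that.
    """
    return max_renewal_s + margin_s


def timeout_looks_safe(timeout_s: int,
                       max_renewal_s: int = 900,
                       margin_s: int = 300) -> bool:
    """True if the configured timeout is at or above the guessed minimum."""
    return timeout_s >= min_safe_timeout(max_renewal_s, margin_s)


# Values mentioned in this post: NetScaler default, the failing setting,
# and the settings I tested or prefer.
for timeout in (180, 960, 1200, 1800, 3600):
    verdict = "ok" if timeout_looks_safe(timeout) else "too short"
    print(f"{timeout:>5} s: {verdict}")
```

Under those assumptions the 960 second setting comes out as too short, and 1200 is the borderline value, which lines up with what we saw but proves nothing on its own.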

I welcome feedback on:
– if you have a web services timeout setting of less than 1200 and are not having issues
– if you are having issues with less than a 1200 setting and raising it to 1200 or higher resolves them
– if you are still having issues with a 1200 or an 1800 setting
– I’m also curious, if you have a 600 second timeout setting, does your iOS client kick out after 5 mins instead of 10 mins?



Skype for Business Debug Tools 7.0.1678.1

The current build of the Skype for Business Debug Tools, 7.0.1678.1 June 9 – 2017, has a lovely new requirement for yet another version of the Microsoft Visual C++.  This time it’s Visual C++ 2015 x64 version 14.0.23026 or higher.

Said version can be found here and you will want the x64 build.   After that, no problem with installation or usage, so far…

Of course now I have Visual C++ 2008 x4 versions, 2010 x2, 2012, 2013 and now 2015…

UPDATE: Feb 8, 2018

The Skype for Business Debugging Tools had a new release on Feb 2, 2018.  Still need the particular Visual C++ version above in order to successfully install.

I say “particular” because it seems that if you’ve installed Skype on a Windows 2016 server, you can’t install the above Visual C++ version, because a newer version is already installed, and these debug tools seem to insist on a particular version.  Sorry, I don’t have a resolution other than to keep using the 6.0.9319.73 build.


Health Agent Probes

If you were hoping for an in depth article on Skype for Business Health Agent Probes, keep on looking, you won’t find it here.  I did wrestle with one such probe failure, Event 56001 LS Health Agent.

One or more Health Agent Probes encountered an unexpected error. The component(s)/Service(s) intended to be monitored by the Probe may be functioning correctly.

Probes:  System.ServiceModel.Security.SecurityNegotiationException: Could not establish trust relationship for the SSL/TLS secure channel with authority ‘’. blah blah blah.

SSL/TLS secure channel, naturally start checking the certs; nope, they’re all good.

Next, off to check IIS, maybe the certs are mixed up or something…  Opened up the Bindings for the Skype for Business External website to view the assigned cert, and the ports were wrong.  For some reason the External web site was assigned 80 and 443 instead of 8080 and 4443.  Internal was set to 8080 and 4443 instead of 80 and 443.  Very odd.  Joys of troubleshooting other people’s installations, and I have no idea if this was manually done or a glitch in the system somewhere along the way.

After juggling the ports around, didn’t even have to restart, poof, Health Agent Probes all happy again.  Hopefully this shouldn’t be something that people encounter very often, but at least there are some clues to look at for drilling down as to why these “Health Probes” might be failing.

Update:  Published too fast.  Turned out someone messed with the Internal and External ports in the Topology.  Unfortunately just setting them back to the correct configuration isn’t enough; probably buried deep in the settings are more references to the ports that do not update automatically when you republish.  Final resolution was to uninstall the Web Components and re-run the Deployment wizard, re-run the Server Component installation and the Certificate wizard.

Safe to say though, the Health Agent Probes are programmed with the default Web site ports, and changing them will likely result in probe failures.
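If you want a quick sanity check for this condition, the comparison is easy to sketch in Python.  The default ports are the ones from this post; the “Internal”/“External” keys and the binding_mismatches function are hypothetical names for illustration, and you would still have to collect the actual bindings from IIS yourself.

```python
# Default web site ports as described above.
EXPECTED = {
    "Internal": {"http": 80, "https": 443},
    "External": {"http": 8080, "https": 4443},
}


def binding_mismatches(actual: dict) -> list:
    """Return a readable line for every binding that differs from the default."""
    problems = []
    for site, ports in EXPECTED.items():
        for scheme, want in ports.items():
            got = actual.get(site, {}).get(scheme)
            if got != want:
                problems.append(
                    f"{site} {scheme}: expected {want}, found {got}")
    return problems


# The swapped state I found on this server.
swapped = {
    "Internal": {"http": 8080, "https": 4443},
    "External": {"http": 80, "https": 443},
}

for line in binding_mismatches(swapped):
    print(line)
```

An empty list means the bindings match the defaults the probes appear to expect.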

VVX 5.5.x logon issue

I can’t take credit for this, but I document it for my own future benefit as well as for the benefit of others.  A new client received a number of VVX phones which during the project we standardized on 5.4.5 code, without a provisioning server, but that’s not important.  There were no issues with authenticating the phone with the Skype for Business user account (not PIN, user account).  BUT when we went to test 5.5.1 firmware, the phones would not authenticate.  Even phones which were already authenticated prior to the update were no longer able to sign in.

Setting the “dhcp.option43.override.stsUri” attribute of the phone, or manually setting “DHCP Option 43 Override STS-URI” setting under Settings | Provisioning Server, DHCP Menu, to the same value in DHCP Option 43, e.g.  allowed the phones to perform user authentications again.

Near as I can figure, this “might” be a result of using a VLAN for the VVX, and perhaps 5.5.1 code can’t properly retrieve the option 43 from DHCP, an issue it doesn’t have with 5.4.x code.  This is merely speculation.  Very odd.


Local SQL Services Startup issue

I’ve come across this issue a number of times over the years: rebooting a Lync or Skype server, specifically the Std deployment type, and one of the three SQL Express Instances just doesn’t start up in time before the RTCSRV Frontend Service tries to start.  “Sometimes” it goes away with patching the local SQL instance up to SP2, but not often enough.  Could be a resourcing problem, but still, for a lab even, 4 processors and 8 GB should be enough to start services up.

Enough is enough.  Why on earth what I’m about to show you isn’t done by default is beyond me.  Why is the RTCSRV service on a Standard Edition Deployment, or for that matter on an Enterprise Deployment, not set to depend on the local SQL instances?  I’ve not encountered this on Enterprise Frontend Servers, though.

Using the SC command from a CMD Prompt (it does not work from Windows PowerShell, where sc is an alias for Set-Content, unless you call sc.exe explicitly), use the following command to make sure no other dependencies have already been made:  SC QC RTCSRV

Second last line you can see the current Dependencies, i.e. KeyIso in this case.

When modifying the dependency list, you have to include the existing entries, plus whatever you are wanting to add, separated by a forward slash (/).
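Since the list has to be rebuilt in full every time, here is a small Python sketch of how I would assemble the command string for a Standard Edition server.  The helper names are made up, the order of the SQL services is my assumption, and the script only prints the command; you still run it yourself from a CMD Prompt.

```python
def build_depend(existing, additions) -> str:
    """Merge existing and new dependencies, keeping order, dropping repeats."""
    merged = []
    seen = set()
    for name in list(existing) + list(additions):
        key = name.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(name.strip())
    return "/".join(merged)


def sc_config_command(service: str, existing, additions) -> str:
    """Build the sc config line; sc.exe wants the space after 'depend='."""
    return f"sc config {service} depend= {build_depend(existing, additions)}"


# KeyIso is the existing dependency reported by SC QC RTCSRV in my case.
# The three local SQL Express instances are the ones named in this post.
print(sc_config_command(
    "RtcSrv",
    ["KeyIso"],
    ["MSSQL$RTC", "MSSQL$LYNCLOCAL", "MSSQL$RTCLOCAL"],
))
```

Passing the existing list in separately makes it harder to forget an entry such as KeyIso, since sc config replaces the whole list rather than appending to it.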



Re-run:  “SC QC RTCSRV”  to confirm your new dependencies have been successfully added.

Notice the change in the list of Dependencies.  You can also view this in the Services:

And if you want to get really nerdy, go into the Registry:

I’ve found that with this change, the RTCSRV service starts up nice and clean, and instead of taking about 25 minutes before crapping out, all the services have started up within 5 minutes of initiating a normal Reboot of the server.

I haven’t had need to try this on an Enterprise Server but the command should be similar, just minus the MSSQL$RTC.

sc config RtcSrv depend= KeyIso/MSSQL$LYNCLOCAL/MSSQL$RTCLOCAL

I still install at least SP2 for SQL 2014.  There were some performance issues in RTM that were resolved around CU7 or CU8, so SP1 should be your minimum anyway, that should help speed things up.  SQL Download Table

Magic CsDatabase Command

Hopefully the use of this will be rare for most, but it’s come up a couple of times of late and magically solved a number of glitches, hiccups and downright headaches.  I’ll skip the suspense and give it to you first, then the scenarios.

Install-CsDatabase -Update -LocalDatabases

A) I needed to add a new SIP trunk to the Topology of a Lync 2013 environment, done it a thousand times without issue.  Unfortunately this time the databases weren’t happy or in a properly updated state, as the RTCSRV service crashed after running the Deployment Wizard, which also crashed, leaving the system in an inoperable state.  The error messages were similar to or the same as those below, in scenario B.  In this case the CPSDyn and RTCDyn were irreparably damaged and un-mountable.  Fortunately from past experience with Lync 2010 database updates, I knew Dyn databases can be toasted without loss of life.  CPSDyn came back no problem with the Install-CsDatabase -ConfiguredDatabases -SqlServerFqdn, but the RTCDyn database did not.  Lync 2013 had “deprecated” the use/need for the -Update switch, so I didn’t even think about it and must have tried 15 other methods or variations of getting this Dyn database back.  @RealTimeUC and I both seemed to have the same divine inspiration; as I was typing, he also suggested “Try with the -Update switch”.  Lo and behold, it worked.  RTCDyn database was brought back to life, and the Frontend RTCSRV service could finally start again.

B) A recent Standard Deployment of Skype for Business resulted in the RTCSRV, Call Park Server and RGS services failing to start.

LS Call Park Service Event ID 31008
Connection: Data Source=(local)\rtc;Initial Catalog=cpsdyn;Integrated Security=True
Message: Cannot open database “cpsdyn” requested by the login. The login failed.
Login failed for user ‘NT AUTHORITY\NETWORK SERVICE’.

LS Response Group Service Event ID 31201
Exception: System.Data.SqlClient.SqlException – Cannot open database “rgsconfig” requested by the login. The login failed.

Login failed for user ‘NT AUTHORITY\NETWORK SERVICE’.

LS User Services Event ID 32134
Back-end Database: rtcxds Connection string of:
driver={SQL Server Native Client 11.0};Trusted_Connection=yes;AutoTranslate=no;server=(local)\rtc;database=rtcxds;
Cause: Possible issues with back-end database.

I checked, and the 3 SQL Services were running (LYNCLOCAL, RTC and RTCLocal).  I also have a neat trick for these databases I’ll show in my next blog.

A coworker had seen this before and suggested using the SQL Mgmt tools to compare the permissions of the databases to a working one.

Forget that, I have a magic PS command.  I figured if it can bring back a missing database, why not correct the database permissions.  I ran the same command successfully, and the system came to life.


I don’t believe in short cuts through life, but work smarter, not necessarily harder.  PS is your friend.