Response Groups and Disconnected calls

I’ve hit this a couple of times now, and it burns my brain each time because I already know the answer. Maybe blogging about it will keep it front of mind.

Problem Scenario:  User changes their password.  User is a member of the Response Group.  Response group calls to User drop as soon as they’re picked up by User.

80 minutes later, after reviewing logs, traces, settings, removing/re-adding, and moon phases, the light goes back on: Client Certificate.

Resolution:  Sign out the user, Remove User Certificate via the Control Panel, Sign user back in.  Poof, magic, RGS calls work again.

I’m sure that if I had more time this week I could work out why this only affects Response Group calls. The user can make and receive direct calls, but incoming calls via RGS drop for a user who has recently changed their password.

Feel free to leave a note below if you have some sort of divine reasoning for why a password change affects client certs in such a way that only RGS calls break.

 

Response Groups and Polycom VVX’s

A recent baffling issue with a client involved Response Groups in a Skype for Business environment. Response Group Agents using VVX phones (both 310s and 600s) were experiencing an issue with calls coming from other VVX phones within the company. As with any troubleshooting case, a clear scope and the ability to reproduce the issue are key. Naturally I had neither, just anecdotal statements that some calls to Main Reception were failing… They only get about 200+ calls a day, no biggie. Please give me a Date/Time of an incident, and tell me whether it is reproducible.

A few days later it was discovered that internal calls to Main Reception were the ones having the issue. Great, I can reproduce it, even externally with a Skype call. It’s that lovely Diagnostic ID 32 “Call terminated on mid-call media failure where both endpoints are internal” or Diagnostic ID 33 “Call terminated on a mid-call media failure where one endpoint is internal and the other is remote”. I hate these two, but in my experience they usually have something to do with codecs. Sometimes you get lucky and someone misspelled the internal Edge interface DNS entry or Certificate entry, but we’ve been in production much too long for it to be that and only be reported now. No, this was going to be a codec problem.

When reproducing the issue myself, I also noticed something very odd when calling from my VVX 600: sometimes it looked like it was going into a Conference Call. Another odd symptom, visible in the Monitoring Reports, is that some of the calls with the RGS group show Video in the Modalities list, but not always. Main Reception had one agent with a VVX 600 and another with a VVX 310, and Video occasionally showed up, but only for the 600 agent. So, yah, something codec related.

I apologize at this point, because I didn’t have the time, or the client’s patience, to do a proper SIP trace and log collection. I went straight for the throat and modified the codec list on the RGS Agents’ phones, and on my own. I changed the Codec Priorities on the phones to look like below. The 310s don’t have the Video Codecs, so no need to worry there.

[Screenshot: VVX600-After]

I moved G.711Mu and G.722 to the top of the list. If you’re outside of North America, I believe you would move G.711A to the top of your list. I think G.722.1 (24 kbps) is unnecessary; I thought maybe it was the codec for Microsoft SIREN, but I can’t tell. G.711A I left in just ’cause. I completely removed the video codecs, as they don’t work with Lync 2013 or Skype environments anyway, plus the RGS/Video Conference thing… I don’t see a purpose for the bottom 4 audio codecs and will be testing with them removed for a while. I’ll update the blog if I get any further confirmation about which codecs should be in and which definitely don’t need to be there for Lync/Skype environments. For the next while I’m personally testing with just G.722, G.711Mu, and G.711A; I don’t think there are any other codecs on the VVX phones that are compatible with a Lync 2013 or Skype for Business environment.

I experienced a similar codec issue with a Telus SIP Trunk in a Lync 2013 environment using an AudioCodes M1k. In that case it was Telus cell phones calling in through the Telus SIP trunk, and AMR (Adaptive Multi-Rate speech codec) was getting negotiated, but there was no media actually flowing, or it would be one way. That problem was fixed by limiting the codecs on the Telus SIP Trunk side of the AudioCodes to only G.711Mu (PCMU) or G.711A (PCMA).
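To make “limiting the codecs” a bit more concrete, here is a rough Python sketch of the idea at the SDP level. This is not what the AudioCodes does internally, and the sample offer is invented for illustration; `offered_codecs` and `restrict` are hypothetical helper names. It reads the audio `m=` line and the `a=rtpmap` attributes, then keeps only PCMU and PCMA.

```python
ALLOWED = {"PCMU", "PCMA"}

# Static RTP payload types (RFC 3551) that may be offered without an rtpmap line.
STATIC_TYPES = {"0": "PCMU", "8": "PCMA", "9": "G722"}


def offered_codecs(sdp):
    """Return [(payload_type, codec_name)] for the first audio m= line, in offer order."""
    names = dict(STATIC_TYPES)
    payloads = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio") and not payloads:
            # m=audio <port> <proto> <pt> <pt> ...
            payloads = line.split()[3:]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            names[pt] = encoding.split("/")[0].upper()
    return [(pt, names.get(pt, "unknown")) for pt in payloads]


def restrict(sdp, allowed=ALLOWED):
    """Keep only the offered codecs whose name is in the allowed set."""
    return [(pt, name) for pt, name in offered_codecs(sdp) if name in allowed]


# Invented sample offer: AMR on a dynamic payload type, plus PCMU and PCMA.
SAMPLE_OFFER = """v=0
m=audio 49170 RTP/AVP 96 0 8
a=rtpmap:96 AMR/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""

print(offered_codecs(SAMPLE_OFFER))
print(restrict(SAMPLE_OFFER))
```

With AMR filtered out of the list, the only codecs left to negotiate are the two G.711 variants, which is the effect the gateway change had.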

Now I’m going to go backcheck a few clients who have VVX phones and look for Diagnostic ID 32 and 33 in their environments.  The rub on this is that 32 and 33 aren’t Errors, just Warnings, so you have to look for them yourself.  To find them, go into your Monitoring Reports server and click on Top Failures Report.  Change the Category from “Unexpected Failure” to “Both expected and unexpected failure”.  In the Diagnostic IDs field, enter “32,33”, like below.  Change the date range as well if you like.

[Screenshot: DiagnosticsReport]
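If you would rather count these offline, here is a minimal Python sketch that tallies Diagnostic IDs 32 and 33 in report data. It assumes you have the rows as CSV text with a column holding the diagnostic ID; the column name `DiagnosticId`, the helper name `count_media_failures`, and the sample rows are all made up for illustration, so adjust them to whatever your export actually contains.

```python
import csv
import io
from collections import Counter

# The two mid-call media failure diagnostic IDs discussed above.
MEDIA_FAILURE_IDS = {"32", "33"}


def count_media_failures(csv_text, id_column="DiagnosticId"):
    """Count rows whose diagnostic ID is 32 or 33 (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        diag_id = (row.get(id_column) or "").strip()
        if diag_id in MEDIA_FAILURE_IDS:
            counts[diag_id] += 1
    return counts


# Invented sample rows, not real report output.
SAMPLE = """DiagnosticId,Note
32,both endpoints internal
33,one endpoint remote
999,unrelated row
32,both endpoints internal
"""

counts = count_media_failures(SAMPLE)
print(dict(counts))
```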

Dang, 70 in the last month… And affects both RGS and Exchange UM calls…  But none since the VVX Provisioning policy took effect.  Speaking of VVX Provisioning, here is the line I added to modify all the VVX phones via the common CFG file:

<WEB video.codecPref.H261="0" video.codecPref.H263="0" video.codecPref.H2631998="0" video.codecPref.H264="0" voice.codecPref.G711_A="4" voice.codecPref.G711_Mu="2" voice.codecPref.G722="1" voice.codecPref.G7221.24kbps="3" voice.codecPref.G7221_C.48kbps="6" voice.codecPref.Siren14.48kbps="7" voice.codecPref.Siren22.64kbps="5" />
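If you end up maintaining more than one codec ordering, a line like the one above can be generated instead of hand-edited. Here is a small Python sketch, assuming only what that line shows: a value of 0 turns a codec off, and lower non-zero numbers rank higher. The helper name `build_codec_pref_line` is hypothetical, and the `WEB` element name is simply copied from my config.

```python
from xml.sax.saxutils import quoteattr

# Video codec suffixes taken from the config line; all get set to 0 (off).
VIDEO_CODECS = ["H261", "H263", "H2631998", "H264"]


def build_codec_pref_line(audio_order, disabled_video=VIDEO_CODECS, element="WEB"):
    """Build one config element; audio_order lists codec suffixes, most preferred first."""
    attrs = [(f"video.codecPref.{name}", "0") for name in disabled_video]
    for rank, name in enumerate(audio_order, start=1):
        attrs.append((f"voice.codecPref.{name}", str(rank)))
    body = " ".join(f"{key}={quoteattr(value)}" for key, value in attrs)
    return f"<{element} {body} />"


# Same ranking as the line above: G.722 first, then G.711Mu, and so on.
AUDIO_ORDER = [
    "G722",
    "G711_Mu",
    "G7221.24kbps",
    "G711_A",
    "Siren22.64kbps",
    "G7221_C.48kbps",
    "Siren14.48kbps",
]

line = build_codec_pref_line(AUDIO_ORDER)
print(line)
```

The attributes come out in a different order than in my hand-written line, but the values are the same.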

If you haven’t set the prov.polling settings, the change probably won’t happen until you reboot the phone.  I think I smell another blog post, basic VVX Provisioning server configuration settings, IF you’re not using Event Zero UC Commander for Provisioning…

Additional Notes: I should also have noted that we 1) rebuilt the Response Group while trying to resolve this. In fact, making minor changes corrupted it just like in the old days, and we had no choice but to delete it, wait for replication, and recreate it; the issue persisted. 2) Installed CU-259; the issue still persisted.