Outbound Calls in a SIP/VoIP Environment

  Posted by Lowell Clark on February 13, 2008

When making an outbound call using an automated system, it is likely that you will want to know the status of the call. Was the line busy? Was the call answered? Was it a person or an answering machine that answered the phone?

The term Call Progress Analysis (CPA) covers all of these questions. For a more formal definition of CPA, see the Executive Summary section of the document “Call Progress Analysis: Global Call API Usage and Protocol Configuration”.

In past years, it was common for automated systems to use a piece of hardware to communicate with the telephony network. When an automated system placed an outbound call, the hardware would normally handle the CPA processing and return the result to the software application. Dialogic was and still is a major player in this market.

In today’s SIP/VoIP and VXML environments, things have changed a bit. It is no longer necessary to have a piece of hardware in your computer to communicate with the telephony network. Most Voice XML platform vendors provide pre-connection CPA (what happened before the call was answered) but not post-connection CPA (who or what answered the call). You can expect to see pre-connection CPA results similar to the following:

• Busy
• Ring No Answer (RNA)
• Special Information Tones (SIT)

These results are great for creating business logic around call attempts and call back times, but what about after the call is answered?
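
As a rough illustration of that kind of business logic, here is a minimal JavaScript sketch that turns a pre-connection CPA result into a next step for the campaign. Everything in it is an assumption made for the example: the result labels ('busy', 'rna', 'sit'), the retry delays, the attempt limit, and the choice to drop a number after a SIT result are hypothetical and do not come from any particular Voice XML platform.

```javascript
// Hypothetical retry delays in minutes; tune these to your own campaign rules.
const RETRY_DELAY_MINUTES = new Map([
  ['busy', 15],  // line was busy: try again fairly soon
  ['rna', 120],  // ring no answer: wait longer before the next attempt
]);

// Decide what to do with a contact after a pre-connection CPA result.
// result:   'busy', 'rna', or 'sit' (hypothetical labels for this sketch)
// attempts: how many times this number has been dialed so far
function nextAction(result, attempts, maxAttempts = 3) {
  if (result === 'sit') {
    // Assumption: a Special Information Tone means the call could not be
    // completed, so this sketch removes the number instead of retrying it.
    return { action: 'remove' };
  }
  if (!RETRY_DELAY_MINUTES.has(result)) {
    throw new Error('Unknown CPA result: ' + result);
  }
  if (attempts >= maxAttempts) {
    return { action: 'give_up' };
  }
  return { action: 'retry', delayMinutes: RETRY_DELAY_MINUTES.get(result) };
}

console.log(nextAction('busy', 1));
console.log(nextAction('sit', 1));
```

Keeping the delays in one table makes it easy to change call back times without touching the decision logic.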

Let’s say that I want to create an outbound campaign which targets people while they are at home and will run an interactive Voice XML speech application when the call is answered by a human. However, if the call is answered by an answering machine, I just want to leave a message on the answering machine. How will I know what answered the phone if the Voice XML platform does not provide my Voice XML application this information?

Well, measuring the length of the greeting when the phone is answered is one way to identify the answering party. Since this example campaign will be calling people at their homes, you can expect a human to answer with a very brief greeting, for example “Hello” or “Smith residence”. An answering machine will normally play a longer greeting, for example “Thank you for calling the Smith residence. We are currently unavailable, but if you leave your name and number we will get back with you as soon as possible”. As you can see, the answering machine is quite wordy, and we can use this to our advantage.
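
The heuristic itself fits in a few lines. This is a minimal JavaScript sketch, assuming the greeting length is available in milliseconds and using a 3.5 second cutoff (the value used in the sample later in this post). The names classifyGreeting and HUMAN_THRESHOLD_MS are made up for illustration.

```javascript
// Assumed cutoff: greetings longer than this are treated as an answering machine.
const HUMAN_THRESHOLD_MS = 3500;

// Classify the answering party from the length of its greeting.
function classifyGreeting(durationMs, thresholdMs = HUMAN_THRESHOLD_MS) {
  if (typeof durationMs !== 'number' || Number.isNaN(durationMs) || durationMs < 0) {
    throw new RangeError('durationMs must be a non-negative number');
  }
  return durationMs > thresholdMs ? 'machine' : 'human';
}

console.log(classifyGreeting(800));   // a short "Hello" -> human
console.log(classifyGreeting(9000));  // a long recorded greeting -> machine
```

A single cutoff will misjudge some calls, such as a chatty person or a very short machine greeting, so treat the result as a best guess.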

Here is some sample code that can be used to determine if the answering party is a human or not based on the length of the greeting. I do not make any claims that this solution will provide 100% accurate results, but from my experience, neither did the hardware solutions.

The solution starts by recording the answering party’s greeting. When the greeting is complete, the length of the recording is analyzed to determine if a human answered the phone.

In this specific example, if the answering party’s greeting is longer than 3.5 seconds (human_threshold), then the answering party is assumed to be an answering machine. All of the attributes of the record tag, as well as the human_threshold value, can be adjusted to change the experience.


<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <property name="COM.VOICEGENIE.USECONNECTIONEVENT" value="TRUE"/>
  <!-- Assume a human answered until the greeting suggests otherwise. -->
  <var name="IsHuman" expr="true"/>
  <var name="duration" expr="0"/>
  <!-- 3.5 seconds expressed in milliseconds, because the VoiceXML
       specification defines recording$.duration in milliseconds. -->
  <var name="human_threshold" expr="3500"/>
  <form id="CPA">
    <!-- Record the answering party's greeting. -->
    <record name="recording" beep="false" beginsilence="3s" finalsilence="400ms" mintime="250ms" maxtime="45s">
      <noinput>
        <prompt> No Input </prompt>
        <assign name="IsHuman" expr="false"/>
        <goto next="#message"/>
      </noinput>
      <filled>
        <assign name="duration" expr="recording$.duration"/>
        <!-- A greeting longer than the threshold is treated as an answering machine. -->
        <if cond="duration &gt; human_threshold">
          <assign name="IsHuman" expr="false"/>
        </if>
        <goto next="#message"/>
      </filled>
    </record>
  </form>
  <form id="message">
    <block>
      <prompt>The recording duration was <value expr="duration"/> milliseconds</prompt>
      <prompt>Is Human equals <value expr="IsHuman"/></prompt>
    </block>
  </form>
  <catch event="connection.disconnect.hangup">
    <log> event connection.disconnect.hangup fired</log>
    <exit/>
  </catch>
</vxml>

Figure 1.0 - This VXML snippet was written to run on a Voice Genie Voice XML platform (now Genesys).

This example also handles the following scenarios:

• If there is silence for the first 3 seconds of the call, a noinput event is thrown. (beginsilence)
• If the answering party’s greeting is shorter than 250 ms, a noinput event is thrown. (mintime)
• When the answering party stops speaking for more than 400 ms, the recording ends and the filled block runs. (finalsilence)
• If the answering party speaks for longer than 45 seconds, the recording is cut off and the filled block runs. (maxtime)
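
To reason about these four cases without placing a call, here is a simplified JavaScript model of them. It is only a sketch of the rules listed above, not of how the platform actually processes audio. The function name recordOutcome, its two inputs, and the reason labels are hypothetical, while the timing values are copied from the record tag in Figure 1.0.

```javascript
// Timing values from the record tag in Figure 1.0, converted to milliseconds.
const RECORD_SETTINGS = { beginSilenceMs: 3000, minTimeMs: 250, maxTimeMs: 45000 };

// Model which handler runs for a given call.
// leadingSilenceMs: silence before the answering party starts speaking
// speechMs:         how long the party speaks before pausing past finalsilence
function recordOutcome(leadingSilenceMs, speechMs, settings = RECORD_SETTINGS) {
  if (leadingSilenceMs >= settings.beginSilenceMs) {
    return { event: 'noinput', reason: 'beginsilence' };
  }
  if (speechMs < settings.minTimeMs) {
    return { event: 'noinput', reason: 'mintime' };
  }
  if (speechMs > settings.maxTimeMs) {
    // The recording is cut off at maxtime, so that is the reported duration.
    return { event: 'filled', reason: 'maxtime', durationMs: settings.maxTimeMs };
  }
  return { event: 'filled', reason: 'finalsilence', durationMs: speechMs };
}

console.log(recordOutcome(500, 900));    // short greeting, ended by a pause
console.log(recordOutcome(4000, 0));     // nobody spoke in the first 3 seconds
console.log(recordOutcome(500, 60000));  // greeting ran past the 45 second cap
```

Walking a few imagined calls through a model like this is a cheap way to check that a new set of attribute values still produces the events you expect.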

This VXML snippet was written to run on a Voice Genie Voice XML platform (now Genesys). Because of this, the following non-standard VXML attributes are used in the record tag.

beginsilence – “The time to wait, if no speech occurs, before throwing a noinput event.”
mintime – “If the duration of the recording is less than this attribute, then the recording is assumed to be empty and a noinput is thrown.”

I would love to know if anyone else has found a better or different solution to this problem.

Disclaimer: The information, ideas, and opinions expressed in this blog are mine alone, and do not necessarily reflect those of Message Technologies, Inc.
