Big Brother is Listening To You?

  Posted by Laura Chumley on August 20, 2009

A new patent awarded to Charles Humble of the National Institute for Truth Verification (NITV) assigns numerical values to the stress levels a speaker experiences when lying, even in recorded speech. Read the entire article here.

The NITV markets the Computer Voice Stress Analyzer (CVSA I and CVSA II), which purports to be 96%-98% accurate at discerning truth from fiction. So far, it has been marketed primarily to law enforcement and military intelligence agencies. It uses an algorithm to analyze and graph frequency modulations in unstructured speech. These graphs then display “positively” whether the person has lied in response to a question.
“Micro tremors are tiny frequency modulations in the human voice. When a test subject is lying, the automatic, or involuntary nervous system, causes an inaudible increase in the Micro tremor’s frequency. The CVSA® detects, measures, and displays changes in the voice frequency.” From the web site: (http://www.cvsa1.com/CVSA.htm)

I remember when my apparently prescient mother used vibrations to test my own veracity. She would have me put my index finger in a bowl of water and answer her questions. If the water vibrated, I was lying. She swore by it, but my independent observations were that it was about 50/50—and easily manipulated. Ahem.

Other interesting ways of teasing out the truth include one near and dear to my heart—the magic donkey. (Why? See my portrait on the first blog–Jan 2008.)

…circa 500 B.C. in India. A priest put lampblack on the tail of a donkey in a dark room, and all suspects were to pull the magic donkey’s tail. They were told that when the thief pulled the tail, the donkey would speak and be heard throughout the temple. The person who did not pull the tail had clean hands and was pronounced the thief and punished.

As if all of this weren’t frightening enough to the average “little white liar”, a South Korean company claims to be able to identify real vs. fake emotion. An article in Cellular News dated 09/26/2006 says, “Nemesysco’s leading technology is also powering KTF’s new ‘Love Detector’ service, which tells the caller the ‘love level’ of the person on the other end of the line every 10 seconds - so that subscribers can tell whether their loved ones share their feelings all through the conversation. Once the call is completed, the subscriber also receives a message ranking the overall level of affection, plus graphs that measure various attributes such as level of interest, attention, expectation, and embarrassment.”

Gives a whole new sensation of terror to the question “Does this dress make me look fat?”, doesn’t it? I wonder if they caught the irony in their company name…Nemesys aka Nemesis. I did!

Supposedly, there are no known countermeasures to the CVSA truth verification methodology. I’d be interested in knowing whether a skilled character actor could deliver lines convincingly enough to fool the system. I hope so. As much as I want the truth to win out over lies, there are situations where a half truth, a kind fiction, is a far, far better response than the cold, clinical, absolute truth. And I am sure that the marketing groups, political organizations and pundits of all flavors would agree wholeheartedly, eh?

The power of speech is unmistakable, inescapable. Its power for good and harm is real. Have we reached a place where we are orchestrating a version of Brave New World in which the privacy of our own mind and our heartfelt intentions are lost? What do you think? Speak up. And remember…Big Brother is listening…

Enhanced Call Progress Analysis

  Posted by Justin Simkavitz on October 14, 2008

I’m not too big on regurgitating press releases or articles but I just wanted to write a quick note and let you know about our new enhanced Call Progress Analysis (eCPA) solution for outbound IVR campaigns. The ability to accurately determine whether an outbound call was answered by a live person or machine is vital to the success of any outbound IVR initiative. Due to the ease of use and highly reliable results our eCPA feature offers, many of our customers are already leveraging this technology to increase their CPA accuracy.

Below is a short blurb from a recent article about MTI’s eCPA:

The eCPA feature from MTI utilizes sophisticated algorithms, the power of the SIP protocol, and recent SIP gateway advances in order to provide customers with a simple way to integrate highly accurate CPA capabilities into their Outbound IVR applications.

The IVR capabilities that MTI offers, including eCPA, are easily accessible through Web services, and they leverage MTI’s standards-based, feature-rich, hosted Speech IVR platform. After initiating an Outbound IVR call, analysis is performed at the platform level and the result of the call progress is returned to the application for further processing.

In the past, customers placing outbound calls in a SIP-based environment were faced with a less-than-satisfactory solution that involved placing code within their own applications. The eCPA feature ensures that the platform performs the analysis for the customer which results in greatly increased accuracy rates and decreased time to market.

Read the full article here: MTI eCPA Feature Article PDF

.NET or Java in Speech

  Posted by Lowell Clark on March 11, 2008

Recently, I had the pleasure of running into someone who called .NET “archaic” and “craptastic”. It was apparent from the rest of the conversation that this wordsmith had a bias towards Java and open-source. For the rest of this blog we will call this person “John Smith”.

Disclaimer: I have limited knowledge of Java and its environments, but from what I have seen, I prefer the .NET languages and environment over Java.

So, I decided to do a little research to find out why someone would have chosen these words to describe a language and environment that I enjoy developing in. During my limited research I came across the following two blogs.

http://blogs.zdnet.com/ITFacts/?p=5890
http://rwatsh.blogspot.com/2006/12/java-vs-net.html

I am sure you all know that you cannot believe everything that you read on the Internet, so I decided to hunt down one of my previous co-workers who has developed in both Java and .NET. This previous co-worker indicated that he preferred developing in the .NET environment.

So far, I have concluded that both .NET and Java are very good and mature environments and have their own strengths and weaknesses. I also believe that the verbiage used to describe the .NET technology by John Smith is not backed by any factual information and was most likely an emotional outburst because of stress from an external source.

I am still intrigued by this topic and would like to do more research on it, but for now, I must let it go. I would like to hear the opinions from the speech community on this topic. Do you prefer .NET or Java, and why?

Disclaimer: The information, ideas, and opinions expressed in this blog are mine alone, and do not necessarily reflect those of Message Technologies, Inc.


Outbound Calls in a SIP / VOIP Environment

  Posted by Lowell Clark on February 13, 2008

When making an outbound call using an automated system, it is likely that you will want to know the status of the call. Was the line busy? Was the call answered? Was it a person or an answering machine that answered the phone?

The term Call Progress Analysis (CPA) encompasses the answers to all of these questions. For a more formal definition of CPA, see the Executive Summary section of “Call Progress Analysis: Global Call API Usage and Protocol Configuration”.

In past years, it was common for automated systems to use a piece of hardware to communicate with the telephony network. When an automated system placed an outbound call, the hardware would normally handle the CPA processing and return the result to the software application. Dialogic was and still is a major player in this market.

In today’s SIP/VOIP and VXML environments, things have changed a bit. It is no longer necessary to have a piece of hardware in your computer to communicate with the telephony network. Most Voice XML platform vendors provide pre-connection CPA but not post-connection CPA. You can expect to see pre-connection CPA results similar to the following:

• Busy
• Ring No Answer (RNA)
• Special Information Tones (SIT)

These results are great for creating business logic around call attempts and call back times, but what about after the call is answered?
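As a rough illustration of that kind of business logic, here is a minimal Python sketch that maps a pre-connection CPA result to a next step. Everything in it is an assumption made for the example: the result labels, the retry delays, the attempt limit, and the choice to treat every SIT result as unreachable (some SIT conditions are temporary, so a real campaign may want to handle them differently).

```python
from datetime import timedelta

# Hypothetical retry policy: these delays and the attempt limit are
# illustrative assumptions, not values from any platform.
RETRY_DELAYS = {
    "busy": timedelta(minutes=15),  # line was busy, try again soon
    "rna": timedelta(hours=2),      # ring no answer, try again later
}
MAX_ATTEMPTS = 3


def next_action(cpa_result, attempts_so_far):
    """Return (action, delay) for a pre-connection CPA result."""
    result = cpa_result.lower()
    if result == "sit":
        # Simplification: treat any Special Information Tone as unreachable.
        return ("remove_from_list", None)
    if result in RETRY_DELAYS and attempts_so_far < MAX_ATTEMPTS:
        return ("retry", RETRY_DELAYS[result])
    # Unknown result or too many attempts: stop calling this number.
    return ("give_up", None)


if __name__ == "__main__":
    print(next_action("Busy", 1))
    print(next_action("RNA", 3))
    print(next_action("SIT", 0))
```

A lookup table keeps the call-back times in one place, so they can be tuned per campaign without touching the decision logic.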

Let’s say that I want to create an outbound campaign which targets people while they are at home and runs an interactive Voice XML speech application when the call is answered by a human. However, if the call is answered by an answering machine, I just want to leave a message. How will I know what answered the phone if the Voice XML platform does not provide my Voice XML application with this information?

Well, determining the length of the greeting when the phone is answered is one way to identify the answering party. Since this example campaign will be calling people at their homes, you can expect that a human will answer the phone very briefly. For example: “Hello” or “Smith residence”. If an answering machine were to answer the phone then you would expect a longer greeting. For example: “Thank you for calling the Smith residence. We are currently unavailable, but if you leave your name and number we will get back with you as soon as possible”. As you can see, the answering machine is quite wordy and we can use this to our advantage.

Here is some sample code that can be used to determine if the answering party is a human or not based on the length of the greeting. I do not make any claims that this solution will provide 100% accurate results, but from my experience, neither did the hardware solutions.

The solution starts by recording the answering party’s greeting. When the greeting is complete, the length of the recording is analyzed to determine if a human answered the phone.

This specific example indicates that if the answering party’s greeting is longer than 3.5 seconds (human_threshold), then the answering party is assumed to be an answering machine. All of the attributes for the record tag, as well as the human_threshold value, can be adjusted to change the experience.


<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- VoiceGenie platform property -->
  <property name="COM.VOICEGENIE.USECONNECTIONEVENT" value="TRUE"/>
  <var name="IsHuman" expr="'true'"/>
  <var name="duration" expr="0"/>
  <!-- recording$.duration is reported in milliseconds, so 3.5 seconds is 3500 -->
  <var name="human_threshold" expr="3500"/>
  <form id="CPA">
    <!-- Record the answering party's greeting; beginsilence and mintime are VoiceGenie extensions -->
    <record name="recording" beep="false" beginsilence="3s" finalsilence="400ms" mintime="250ms" maxtime="45s">
      <noinput>
        <!-- Silence or a too-short greeting: not treated as a human -->
        <prompt>No Input</prompt>
        <assign name="IsHuman" expr="'false'"/>
        <goto next="#message"/>
      </noinput>
      <filled>
        <assign name="duration" expr="recording$.duration"/>
        <!-- A greeting longer than the threshold is treated as an answering machine -->
        <if cond="duration &gt; human_threshold">
          <assign name="IsHuman" expr="'false'"/>
        </if>
        <goto next="#message"/>
      </filled>
    </record>
  </form>
  <form id="message">
    <block>
      <!-- Report the measured duration and the decision -->
      <prompt>The recording duration was <value expr="duration"/></prompt>
      <prompt>Is Human equals <value expr="IsHuman"/></prompt>
    </block>
  </form>
  <catch event="connection.disconnect.hangup">
    <log>event connection.disconnect.hangup fired</log>
    <exit/>
  </catch>
</vxml>

Figure 1.0 - This VXML snippet was written to run on a Voice Genie Voice XML platform (now Genesys).

This example also handles the following scenarios:

• If there is silence for the first 3 seconds of the call, a noinput event is thrown. (beginsilence)
• If the answering party’s greeting is shorter than 250ms, a noinput event is thrown. (mintime)
• When the answering party stops speaking for more than 400ms, the recording ends and the filled block runs. (finalsilence)
• If the answering party speaks for longer than 45 seconds, the recording is cut off and the filled block runs. (maxtime)
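To make these scenarios easier to reason about, here is a small Python model of the bullet points above combined with the greeting-length check. It is a simplified sketch, not platform behavior: the function names are hypothetical, the timings are passed in as plain millisecond values rather than measured from audio, and finalsilence is assumed to have already determined where the greeting ends.

```python
def simulate_record(leading_silence_ms, speech_ms,
                    beginsilence_ms=3000, mintime_ms=250, maxtime_ms=45000):
    """Return (event, recorded_ms) for one answered call.

    Simplified model: speech_ms is the length of the greeting up to the
    point where the party went quiet long enough to end the recording.
    """
    if leading_silence_ms >= beginsilence_ms:
        return ("noinput", 0)   # nobody spoke at the start of the call
    if speech_ms < mintime_ms:
        return ("noinput", 0)   # too short to count as a greeting
    # Long greetings are cut off at maxtime.
    return ("filled", min(speech_ms, maxtime_ms))


def is_human(event, recorded_ms, human_threshold_ms=3500):
    """Apply the greeting-length rule: short greeting means human."""
    if event == "noinput":
        return False            # the sample treats noinput as not human
    return recorded_ms <= human_threshold_ms


if __name__ == "__main__":
    for silence, speech in [(200, 800), (200, 9000), (3000, 0), (0, 60000)]:
        event, recorded = simulate_record(silence, speech)
        print(silence, speech, event, recorded, is_human(event, recorded))
```

Running a handful of made-up timings through a model like this is a cheap way to see how changing human_threshold or the record attributes shifts calls between the two outcomes before trying them on live traffic.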

This VXML snippet was written to run on a Voice Genie Voice XML platform (now Genesys). Because of this, the following non-standard VXML attributes are used in the record tag.

beginsilence – “The time to wait, if no speech occurs, before throwing a noinput event.”
mintime – “If the duration of the recording is less than this attribute, then the recording is assumed to be empty and a noinput is thrown.”

I would love to know if anyone else has found a better or different solution to this problem.

Disclaimer: The information, ideas, and opinions expressed in this blog are mine alone, and do not necessarily reflect those of Message Technologies, Inc.