GetHuman loses its founder, and its focus
I see that Paul English, of GetHuman.com fame (or perhaps notoriety is the better term), is throwing in the towel. Remember him? Only a few years ago he led the angry mob in a fight against the gnarly evils of telephone automation, providing a list of ways to get around the IVR and reach a human agent. Today he is apparently too busy to continue championing the movement he started, and has turned it over to Walt Tetschner, a self-styled ASR specialist and industry curmudgeon. Walt publishes an online newsletter with slightly whimsical pans and plugs of IVR applications, as well as well-researched articles on events and trends in the speech technology arena. I first ran across him when I was reeling from an encounter of the hideous kind with a Social Security Administration self-service application. Hapless me, I just wanted to find out how to change the name on my Social Security card to my married name. After 20 minutes of fumbled, repeated attempts, I gave up and drove 45 minutes to the nearest office. It was a waste of my time and energy. And such frustration! I am well schooled in my IVR responses; they are crisp and free of disfluencies. But I was stuck in a revolving nightmare of broken steps, recursive paths, illogical phrasing and overwhelming bureaucratic traps. Walt gave it a less stinging review than I would have, but overall he had the same negative perception of the experience that I did. Since then I have read Walt’s posts in various forums. It will be interesting to see what he does with GetHuman, and whether his vinegar-rather-than-honey approach invigorates or alienates the VUI design standard movement.
300 Knot Club
Well, yesterday it happened. I was cruising from Dallas to Atlanta at 17,000 feet, west of Atlanta near the Vulcan (VUZ) VORTAC, when I picked up a hefty tailwind and managed to squeak out a 300-knot ground speed. That’s about 345 mph, a first for this plane! The picture shows the ground speed on the Garmin G1000 MFD; it’s a little out of focus because of some light chop. Fast is fun. ‘Nuff said.
Dewey Decimal in MMVIII (that’s Latin for 2008)
Take a moment and picture a librarian. Did you picture a young, hip person with a cell phone in one hand and an iPod in the other? Probably not. Traditionally, libraries have been slow adopters of new technology. The Dewey Decimal system is still widely used in libraries today despite being more than 130 years old, and a recent trip to my local library convinced me that you need the help of a librarian if you want to find a book in less than an hour. Things in the library world are finally starting to change as public pressure to update antiquated technology grows, and libraries are now taking some practical steps to improve customer service. Many public libraries have moved at glacial speed when it comes to updating technology, but glaciers are melting much faster nowadays. Libraries are no different from any other business that adopts a new process or technology: growing pains are inevitable. My trip to the library reminded me of an article I read last year. As I recall, a library had recently deployed an automated IVR system that placed outbound telephone calls to remind people when books were past due. The gentleman in the article received a call from the local library, and the message went something like “Hello, this is the Bumble County Public Library. Judy the 1000th Dixon, our records indicate that Gone With the Wind is past due….” It is important to note that Mr. Dixon’s wife’s full name is Judy Melissa Dixon. In the library database, her name is probably stored as Judy M Dixon. Just in case you are not from ancient Rome, recall that M is the Roman numeral for 1000. One of four things is going on here:
Text-to-speech (TTS) technology has improved significantly in recent years, with companies like Nuance providing cutting-edge technology that greatly improves the user experience. In the library example, using TTS to read the name is completely justified: the system is reading dynamic text, and it probably is not feasible to have every name recorded by professional voice talent. The Roman numeral / middle initial problem has many resolutions:
When designing applications that use TTS technology, it is important to know how your TTS engine will behave in different scenarios. Often the difference between a correct rendering of the text and a “bug” is a period or a space. One TTS engine may read Judy M. Dixon as “Judy M Dixon” while another will read the same text as “Judy the 1000th Dixon”. Until the system is fixed, anyone in Bumble County with a middle initial of I, V, X, L, C, D or M may want to avoid checking out books.
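One way to take the guesswork out of a lone initial is to tell the engine explicitly how to read it instead of relying on punctuation. The Python sketch below is only an illustration, not the library's system: it assumes the prompt is rendered through SSML (as VoiceXML prompts are) and that your engine honors `<say-as interpret-as="characters">`, a value defined in the W3C say-as Note rather than the core SSML specification, so support varies by engine and should be tested. The function name `name_to_ssml` is hypothetical; it spells out any single-letter token between the first and last names.

```python
from xml.sax.saxutils import escape


def name_to_ssml(full_name: str) -> str:
    """Return an SSML fragment for a name, spelling out lone middle initials.

    Hypothetical helper: any one-letter middle token (with or without a
    trailing period) is wrapped in <say-as interpret-as="characters"> so the
    engine says the letter "M" rather than guessing "the 1000th".
    """
    parts = full_name.split()
    rendered = []
    for i, part in enumerate(parts):
        letter = part.rstrip(".")
        # Only tokens strictly between the first and last name count as middle initials.
        is_middle = 0 < i < len(parts) - 1
        if is_middle and len(letter) == 1 and letter.isalpha():
            rendered.append(f'<say-as interpret-as="characters">{letter}</say-as>')
        else:
            # Escape &, < and > so text from the database cannot break the SSML markup.
            rendered.append(escape(part))
    return " ".join(rendered)


print(name_to_ssml("Judy M Dixon"))
# Judy <say-as interpret-as="characters">M</say-as> Dixon
```

With or without the period, "Judy M. Dixon" and "Judy M Dixon" produce the same fragment, which could then be placed inside a VoiceXML `<prompt>`.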
The opinions expressed in this blog are purely my own, Justin’s; they are not the official views of Message Technologies.
My Intro
Welcome to my first post! I’m Mark Abramson, CEO/CTO and co-founder of Message Technologies, Inc. (MTI). My goal is to discuss what’s happening in my professional life, which spans about 37 years (unbelievable) and includes more than 35 years of practical experience with speech recognition, text-to-speech, and touch-tone systems. Yes, the technology has been around quite a while, and I’ve seen and done a lot with it. I want to share observations and perhaps a few “pearls” I’ve discovered over the years of working with this technology and the people who develop, deploy and tweak it. I guess I am officially a serial entrepreneur, although I wouldn’t call what I do “rapid succession.” I’ve been involved in six or so startups, made money on a few and broken even on a few. Overall, my track record has been good, and my instincts have proven more right than wrong. My blog may occasionally include some aviation items, since I am an avid private pilot flying a 2007 Columbia (now Cessna) 400SX. I’ve logged about 1,100 hours so far. For those who like alphabet soup, I am officially a PP, ASEL, AMEL, IA. I have high-performance, complex and tailwheel endorsements, and I’ve had a little experience flying aerobatics. All this means I can normally fly single- and multi-engine land-based aircraft in the clouds. One day I hope to get my seaplane rating. So I have two passions: work and flying. Happy blogging. P.S. I guess I need to deliver the standard disclaimer: the opinions in my blog are my own and not those of my company. I assume full responsibility for the content of my blog, and if you take issue with anything I write, please take it up with me.
What Lies Beneath
New and old VUI designers alike are always looking for tips on how to improve their scripting. As in any other field of endeavor, there are conflicting opinions; dissonance and debate abound. In my experience, we are passionate in our arguments and intense in our rationalizations. Design isn’t just a dry, analytical laying out of prompts; it is an emotional interweaving of technique and form, nuance and balance. I like being a part of such a group: artists working their magic, taking words and sound and crafting a personal interaction with the caller, like Giuseppe creating Pinocchio, a real boy. This, of course, requires two things: that the customer allows free and open discussion and implementation of the information given and paths chosen, and that we as designers keep our hearts and minds open to the evolving sophistication and needs of our target population. On the customer end, there has to be give and take between the demands of marketing, branding, business requirements and usability. I’ll say it outright: there needs to be far more giving and far less taking than usually happens. VUI designers are often brought in after the initial requirements gathering has happened. Well-meaning folks with expertise in other specialties within the customer’s company have already laid out scripting rules and language based on experience gleaned from bad or banal interfaces in the past, ensuring that more such experiences follow for the rest of us. Like lemmings, we are forced to continue that flight over the cliff of bad decisions into the sea of bad design. And honestly, we ourselves have gotten into the ill-conceived habit of using the same tired gambits over and over. Though we know so much better, we are the worst offenders: whether by sin of commission or omission, we let ourselves be drawn down the paths of convenience, conformity, laziness and acquiescence. Bruce Balentine of EIG posted the following in the Yahoo VUID group on 02/04/2008.
He points the finger directly, and appropriately, at us. “…I ascribe it to the somewhat small population of companies and people doing the implementation work. Since everyone used to work for someone else and ‘this is the way we did it then,’ these kinds of ideas get inbred and then become dogma. It’s a kind of ‘convergence to a local minimum’ like in neural networks or quantum systems. It takes energy to tunnel back out once we’ve converged. I think the same thing is true of those unhelpful recovery techniques that continue to persist: ‘I didn’t recognize that, I didn’t get that, I didn’t hear you;’ and the exclamatory grounding expressions, ‘Got it!’ and ‘Great!’ What happens is that everyone’s ear becomes accustomed to the sound of a given solution, and in the absence of any rigorous debate or viable alternative, it becomes ‘comfortable’ and subsequently ‘invisible’ to the design team’s ears. ‘This is just how these things sound, and we used to work for XYZ so we know best by definition, and these other proposed solutions sound a little weird or offputting; they couldn’t possibly be an improvement.’ So our designs converge to a local minimum and it’s very hard to tunnel out…”
That same Yahoo VUID group has been grousing over these issues of late. I have been guilty of some of them myself, shamefully. We have compiled a list of phrases never to be heard in modern, professional interfaces again. Let’s band together and make it happen, I say!
1. Please listen carefully as our menu options have changed.
2. For more information, please see our website at www.whatever.com.
3. Your call may be recorded for quality assurance purposes.
4. My name is Beth, your virtual agent.
5. Press 1 for English.
6. You can speak or press your answers to each question.
7. Sales pitches.
8. Menu options that go on and on.
9. Lengthy legal disclaimers.
10. It’s my fault, I’m sorry.
And I am sure you can think of more offenders.
When you are writing your script, eliminate the non-informational bits that interfere with the caller’s primary aim: to accomplish her task as quickly and easily as she can. The virtue of automation is tarnished by embellishment. Self-service, like the drive-in window, should be fast, efficient and painless.
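If you want to hold your own scripts to the VUID group’s list, even a crude automated check can help during review. This is a hypothetical Python helper, not a tool from the group: it does a case-insensitive substring search for fragments of the literal phrases on the list, so it will miss paraphrases and cannot judge the non-phrase items such as sales pitches, long menus or legal disclaimers.

```python
# Hypothetical review helper: flags stock phrases from the VUID group's list.
# Fragments are lowercase and trimmed so small wording changes around them still match.
STOCK_PHRASES = (
    "please listen carefully as our menu options have changed",
    "for more information, please see our website",
    "your call may be recorded",
    "your virtual agent",
    "press 1 for english",
    "you can speak or press your answers",
    "it's my fault",
)


def flag_stock_phrases(prompt: str) -> list[str]:
    """Return the stock phrases that appear in a prompt, in list order."""
    # Normalize case, collapse runs of whitespace, and map curly apostrophes to straight ones.
    text = " ".join(prompt.lower().split()).replace("\u2019", "'")
    return [phrase for phrase in STOCK_PHRASES if phrase in text]


script = [
    "Please listen carefully as our menu options have changed.",
    "What is your account number?",
    "It’s my fault, I’m sorry. Let’s try that again.",
]
for prompt in script:
    hits = flag_stock_phrases(prompt)
    if hits:
        print(f"{prompt!r} -> {hits}")
```

A substring match is deliberately simple; it is meant as a reminder during script review, not a substitute for a designer’s ear.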
Outbound Calls in a SIP / VOIP Environment
When making an outbound call using an automated system, you will likely want to know the status of the call. Was the line busy? Was the call answered? Was it a person or an answering machine that answered the phone? The term Call Progress Analysis (CPA) covers the answers to all of these questions. For a more formal definition of CPA, see the Executive Summary section of “Call Progress Analysis: Global Call API Usage and Protocol Configuration”. In the past, it was common for automated systems to use a piece of hardware to communicate with the telephony network. When an automated system placed an outbound call, the hardware would normally handle the CPA processing and return the result to the software application. Dialogic was, and still is, a major player in this market. In today’s SIP/VOIP and VXML environments, things have changed a bit: you no longer need a piece of hardware in your computer to communicate with the telephony network. Most Voice XML platform vendors provide pre-connection CPA (the outcome of the call attempt before anyone answers) but not post-connection CPA (identifying who or what answered). You can expect to see pre-connection CPA results similar to the following:
These results are great for creating business logic around call attempts and callback times, but what about after the call is answered? Let’s say that I want to create an outbound campaign that targets people at home and runs an interactive Voice XML speech application when a human answers the call. However, if an answering machine answers, I just want to leave a message on it. How will I know what answered the phone if the Voice XML platform does not give my Voice XML application this information? Measuring the length of the greeting when the phone is answered is one way to identify the answering party. Since this example campaign will be calling people at their homes, you can expect a human to answer the phone very briefly, for example “Hello” or “Smith residence”. If an answering machine answers, you would expect a longer greeting, for example “Thank you for calling the Smith residence. We are currently unavailable, but if you leave your name and number we will get back with you as soon as possible”. As you can see, the answering machine is quite wordy, and we can use this to our advantage. Here is some sample code that can be used to determine whether the answering party is a human based on the length of the greeting. I do not claim that this solution will provide 100% accurate results, but in my experience, neither did the hardware solutions. The solution starts by recording the answering party’s greeting. When the greeting is complete, the length of the recording is analyzed to determine whether a human answered the phone. In this specific example, if the answering party’s greeting is longer than 3.5 seconds (human_threshold), the answering party is treated as an answering machine. Both the attributes of the record tag and the human_threshold value can be adjusted to change the behavior.
Figure 1.0 - This VXML snippet was written to run on a Voice Genie Voice XML platform (now Genesys). This example also handles the following scenarios:
Because it targets the Voice Genie platform, the snippet uses the following non-standard VXML attributes in the record tag.
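The greeting-length rule itself is easy to express outside VoiceXML, for instance if your application posts the recording to a server for analysis. The Python sketch below is not the Figure 1.0 snippet, and the names `wav_duration_ms`, `classify_greeting` and `HUMAN_THRESHOLD_MS` are hypothetical. It assumes either that you already have the duration in milliseconds (standard VoiceXML exposes it through the record item’s `name$.duration` shadow variable) or that the greeting arrives as a WAV file, whose length is its frame count divided by its sample rate.

```python
import io
import wave

# Mirrors the post's human_threshold of 3.5 seconds, expressed in milliseconds.
HUMAN_THRESHOLD_MS = 3500


def wav_duration_ms(wav_bytes: bytes) -> int:
    """Length of a WAV recording in milliseconds, computed from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as recording:
        return round(recording.getnframes() * 1000 / recording.getframerate())


def classify_greeting(duration_ms: int, human_threshold_ms: int = HUMAN_THRESHOLD_MS) -> str:
    """Return "machine" for greetings longer than the threshold, else "human".

    This is a heuristic, as the post notes: a terse machine greeting or a
    chatty human will be misclassified.
    """
    if duration_ms < 0:
        raise ValueError("duration_ms must be non-negative")
    return "machine" if duration_ms > human_threshold_ms else "human"


for greeting_ms in (900, 3500, 6200):
    print(greeting_ms, classify_greeting(greeting_ms))
# 900 human
# 3500 human
# 6200 machine
```

Using a strict "greater than" matches the post’s wording (longer than 3.5 seconds); how to treat empty recordings or greetings cut off by a maximum recording time is left to the application.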
I would love to know if anyone else has found a better or different solution to this problem. Disclaimer: The information, ideas, and opinions expressed in this blog are mine alone, and do not necessarily reflect those of Message Technologies, Inc.
Dawn of the WUI
Recently I read an article speculating that we would soon identify the elements of canine speech. Yes, the secrets of doggie language are about to be revealed to us mere mortals. As the proud owner of three good-looking pups of above-average intelligence, I began to think about how we could now communicate, and what that might mean for our household. For example, should Mr. Buck notice that Milky Way’s cough has returned, he can immediately call the vet for a prednisone refill. Imagine… a WUI (Woof User Interface)… Virtual Vet: Hello, thanks for calling Cherokee Animal Hospital. To continue in Canine, just say woof. And with a stunning 43% accuracy rate, the woof recognition software would be comparable to speech recognition only a few years ago. How far we have come in such a short time! We started dabbling with speech recognition in the 1990s. It was dreadful. But soooo intriguing. Dragon NaturallySpeaking 1.0 required hours of training, not just for the program to learn your voice, but for you to learn how to speak in a way it would recognize, for you to learn how to behave. Quirky. Unpredictable. Inaccurate. Slow. Today Dragon NaturallySpeaking 9.0 boasts 99% accuracy. 99%! Now our esteemed CEO often uses it for casual email as well as contracts. Just kidding, Mark. Just email and white papers. And we have built a thriving VXML hosting business with enterprise-level Genesys servers whose recognition capabilities will knock your socks off! A 2007 presentation at the Radiological Society of North America reported research finding that ASR (automatic speech recognition) programs had exceeded the accuracy of human transcription. Yes, a machine recognized and interpreted human speech better than a human did. The best is yet to come. And we are ready! All right, you enterprising entrepreneurs out there… who is going to hire me to write the first WUI?
One man’s hair is another man’s harrow.
My mother spoke like Scarlett O’Hara, with an elegant, deep Southern drawl. She was exceedingly proud to be a fourth-generation Atlantan, and her life was steeped in that tradition: charm and drama, drama and charm. While she was not exactly a Luddite, she eschewed most things that smacked of modern technology. She accepted ballpoint pens only grudgingly, preferring the smooth ink spread of the fountain pen. She dreamed of debutante balls and ladies’ club meetings, magnolia-perfumed encounters and genteel discourse. So of course, she bore a changeling: a redneck geek. We were taught to speak precisely even as small children, with good grammar and crisp diction. So when I told my new nephew-in-law what I do for a living (script design for speech-enabled applications), I was taken aback when he said “Oh, that is why you talk so funny!” and then blushed, stammering, “I mean, all proper sounding.” I talk funny? Man, the hillbilly family I recently married into is the one that talks funny! I am learning a whole new language these days, using immersion techniques: do or die! Sure, they grew up only 50 miles from where I did, but believe me, the difference between my urban dialect and their rural one is so distinct that we might as well have lived in different countries on opposite sides of the world! Most of the time I can now understand my husband without asking him to repeat himself, at least not too often. But the other day, when we were removing the bush hog implement from the John Deere tractor, he told me to get the cutting hairs and put it on. OK, I figured I would find something that looked like a tangle of wires. I looked and I looked. Nothing fit the bill. Nor did I understand what value something like that would have for the garden, but hey, he is the farmer, so I tried. “Hon? I don’t see it.” I looked over there. Nope. Lots of different attachable things, but nothing hair-like.
“Ummmmm.” He looked over at me with some impatience and indicated a long bar with large scallop-shaped discs at closely spaced intervals. I smiled politely and dragged the thing over to the tractor. “And this is called…what?” “Cuttin’ hare.” Then he realized what was wrong, and once again started to laugh at me. “Tractor harrow. We call it a hare ‘round heah.” Ah, another illuminating moment. One man’s hair is another man’s harrow.