Voice Applications with VoiceXML
This section shows how to deliver VoiceXML Web applications from a
multiclient XML/XSL-based architecture. The section includes information about
the following:
Voice portals and VoiceXML
VoiceXML application architecture
Advantages and limitations of voice access to Web applications
An example of the phonebook business service used to illustrate service delivery with VoiceXML
Voice Portals and VoiceXML
Voice portals, specifically portals that support VoiceXML, contain the
hardware and software required to interface the public telecommunications
network to VoiceXML services on the Internet.
VoiceXML Application Input
Voice portals accept input from telephones in the form of voice and
touchtones. In order to use voice input, voice portals need to be able to
perform speech recognition. Application software can then act on the recognized
input. Speech recognition is dependent on application grammars that tell the
portal which sounds represent valid input. Because most VoiceXML applications
need to be speaker independent, they are usually more accurate with smaller grammars.
Voice portals contain the software and hardware needed to recognize
touchtone input. Touchtone input is useful for login and other input that must
be very accurately recognized. It is also a more robust alternative to voice
in noisy caller environments that can confuse speech recognition software.
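As a rough illustration of touchtone input for login, the following fragment sketches a form that collects a PIN through the built-in digits field type. The form id, the field name, the prompt wording, and the /servlet/Login target are hypothetical and are not part of the phonebook example. A given voice portal may also need additional settings to restrict the field to touchtone input only.

```xml
<!-- Hypothetical login form: the names and the URL are illustrative assumptions -->
<form id="Login">
  <!-- The built-in "digits" type accepts a string of digits, keyed or spoken -->
  <field name="pin" type="digits">
    <prompt>Please enter your PIN using the keys on your phone.</prompt>
    <filled>
      <!-- Send the collected digits to the (assumed) login servlet -->
      <submit next="/servlet/Login" namelist="pin"/>
    </filled>
  </field>
</form>
```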
VoiceXML Application Output
Voice portals deliver two kinds of output: synthesized speech and audio
playback.
Speech synthesis, also known as TTS (text-to-speech), is the process of
producing automated speech from words in text format. TTS is useful for
services that output dynamic results. TTS is also useful while developing,
testing, and refining a voice Web service because it may be changed rapidly
at low cost.
Audio playback involves simply playing back a prerecorded audio file
over the telephone. This mode of output has a lower computational cost and is
therefore more suitable for static content and content that needs to be
delivered in a more natural-sounding voice.
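The two kinds of output can be sketched with the audio element as follows. This is a minimal sketch: the file path /audio/welcome.wav and the first message are assumptions made for illustration, and fallback behavior for a missing audio file should be confirmed against the voice portal being used.

```xml
<block>
  <!-- Dynamic result: the text child is rendered with text-to-speech -->
  <audio>You have 2 new contacts.</audio>
  <!-- Static content: plays a prerecorded file (hypothetical path); the
       enclosed text is intended as a spoken fallback if the file is unavailable -->
  <audio src="/audio/welcome.wav">Welcome to the X Y Z corporation phone book.</audio>
</block>
```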
A VoiceXML Application Architecture
The various components of the voice application architecture are
illustrated in Figure
and include the following:
Mobile user. This is a user who wants to access voice Web applications.
Web phone. This is a mobile phone being used to make telephone calls to voice Web applications.
Base station. The cellular base station in the wireless network interacts with the Web phone via wireless network protocols.
Phone user. This is a user with a standard landline phone accessing voice Web applications delivered via the voice portal.
Phone. This is a standard landline phone connected to the telecommunications infrastructure.
Telecommunications infrastructure. This is the global telephone network that enables any telephone to access voice Web applications via the voice portal.
Voice portal. This is the hardware and software gateway through which users can access voice Web applications.
Multiclient pull architecture. This is the XML/XSL-based architecture capable of delivering Web applications to multiple types of clients, including VoiceXML. This component is a pull architecture because clients make requests and then wait for a server response.
Voice Portal Architecture
The components of the voice portal architecture are illustrated in
Figure 21.8 and include the following:
Communications interface hardware. Specialized boards that interface the voice portal with the telephone system and the Internet.
VoiceXML interpreter and controller. The VoiceXML “browser” component that interfaces the telephone user and the VoiceXML application. It retrieves VoiceXML pages from the business Web site, interprets them, and executes them to control the voice Web service dialog.
Text-to-speech. Converts text to speech and delivers it to the telephone user via the telephony hardware.
Audio playback. Plays prerecorded audio for the telephone user via the telephony hardware.
DTMF (touchtone). Receives and interprets touchtone signals from the telephone user.
Speech recognition. This is the input module responsible for interpreting speech input received from the telephone user.
Audio recording. This is the input module that receives audio from the telephone user and records it.
Advantages and Limitations of VoiceXML Applications
To understand what kinds of Web applications may be effectively
delivered via voice over telephones, it is useful to review the advantages and
limitations of this mode of access. In addition to the advantages outlined
previously in this chapter for mobile access in general, here are some specific
advantages of voice access:
Low cost. Users already have telephones and service plans, so there is typically no extra initial or sustained cost associated with accessing voice Web applications.
High availability through pervasive coverage. The global telephone network is the most pervasive network there is. Leveraging this network to deliver voice Web applications maximizes service access and availability.
Eyes- and hands-free operation. Through appropriate hands-free headsets, telephones enable eyes- and hands-free operation, a requirement for many consumer and business situations.
Telephones are familiar tools. Telephones are familiar to users, so there is less of an intimidation factor involved in using voice Web applications.
Here are some of the limitations of VoiceXML applications that you
should consider when planning and designing voice Web applications:
Audio only. Voice Web applications may deliver audio only. This makes voice Web applications unsuitable for services that require visual output.
New user interface paradigm. Voice Web applications are a relatively new mode of access to the Internet. This makes the design and development as well as the use of these new services more challenging. As a result, usability testing and personalization are important to VoiceXML service development.
IVR stigma. Many new users equate voice Web applications with rigid prompt/response Interactive Voice Response (IVR) systems. VoiceXML applications must overcome this perception before they will be accepted into mainstream use.
The Profile of a Successful VoiceXML Application
A few characteristics of a successful VoiceXML application are outlined
in the following list to give you some insight into the types of business
services that lend themselves well to this mode of access:
Concise input (voice or touchtone only)
Concise audio output
High-value urgent information that is required as soon as it is available
Information required outside of business hours or the office
Services required at multiple locations
Services required where users need to have eyes- and hands-free operation
Example: A Voice Phonebook Service with VoiceXML
In this example, a phonebook business service is used to illustrate
VoiceXML access to Web applications.
The goal of this sample service is for the user to retrieve a telephone
number for a contact and place a call to that contact.
Usage Scenario
This usage scenario outlines the chronological sequence of steps
required to realize the goal:
Access the phonebook service to get a list of contact groups.
Select a group to view a list of its contacts.
Select a contact to view the details of that contact.
Select a phone number and call the contact.
Collaboration
Each of the first three steps in the usage scenario results in a request
from the voice portal, sent over the Internet using HTTP, to the multiclient
architecture running on the business Web site.
The last step, on the other hand, simply results in the voice portal
transferring the user’s call to the phone number of the contact he has selected
to call.
The phonebook service retrieves data, selects an XSL style sheet, and
transforms the XML to VoiceXML for delivery to the voice portal. The voice
portal then interprets and executes the VoiceXML in order to conduct the voice
phonebook Web service dialog with the end user over his telephone.
Developing the Content
This section reviews the content required to drive the multiclient
XML/XSL-based architecture to deliver the voice phonebook Web service, as
discussed in the previous usage scenario.
Accessing the Service to Get a List of Contact Groups
Users access VoiceXML applications by dialing a telephone number that
connects them to the voice portal. For this example, we will assume a user has
dialed the telephone number the voice portal associates with the phonebook
service.
When a user calls the number, the voice portal loads the VoiceXML from
the URL associated with the phone number:
https://www.MyDomain.com/Phonebook.vxml
In this case, www.MyDomain.com is the domain of the business Web site providing the phonebook service.
The VoiceXML that is loaded from the initial URL is shown in Listing 21.10.
LISTING 21.10 Phonebook.vxml—The Initial VoiceXML for the Phonebook Service
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">
<vxml application="Phonebook.vxml">
  <form id="Introduction">
    <block>
      <audio>Welcome to the X Y Z corporation phone book.</audio>
      <goto next="/servlet/Phonebook?mode=selectGroup"/>
    </block>
  </form>
</vxml>
The first line of the document indicates that the VoiceXML is XML 1.0
compliant and has a UTF-8 character encoding:
<?xml version="1.0" encoding="UTF-8"?>
The next line indicates the document type (recall that in this example
the VoiceXML is hosted by the Tellme Networks voice portal):
<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">
The root element of this document is the vxml element, which has an attribute
named application that specifies that this VoiceXML document belongs to the
Phonebook.vxml application:
<vxml application="Phonebook.vxml">
Different VoiceXML documents that belong to the same voice application
specify the same value for this attribute. This is known as the application scope of the VoiceXML
service and is the highest-level scope in a hierarchy of scopes possible in a
VoiceXML service. VoiceXML documents within the same scope may share the same
grammars.
The form element has an id attribute with the value Introduction:
<form id="Introduction">
VoiceXML documents may contain multiple forms. The id attribute of any given form may
be used to navigate to that form from either within the same VoiceXML document
or from another VoiceXML document. This form element, in turn, contains a single child block element.
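Navigation by form id can be sketched with the goto element, using a # fragment that names the target form. The two statements below are illustrative only; each would appear inside executable content such as a block element.

```xml
<!-- Jump to the Introduction form from within the same document -->
<goto next="#Introduction"/>

<!-- Jump to the same form from another document in the application -->
<goto next="Phonebook.vxml#Introduction"/>
```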
The block element contains an audio element that is converted to speech as an
introduction to the phonebook service:
<block>
  <audio>Welcome to the X Y Z corporation phone book.</audio>
  <goto next="/servlet/Phonebook?mode=selectGroup"/>
</block>
After this introduction, the voice portal executes the other child element,
goto. This element instructs the voice portal to navigate to the next VoiceXML
document in the service, which is loaded from the URL
/servlet/Phonebook?mode=selectGroup.
The goto element causes the voice portal to send a request for a dynamically
generated VoiceXML document to the multiclient architecture. The phonebook
data component responds to this request by generating the same XML response as
in the case of the WML Web phone client, as shown in Listing 21.1 earlier in
this chapter.
The phonebook view component identifies the client as a VoiceXML browser
and loads the XSL style sheet shown in Listing 21.11.
LISTING 21.11 GetListOfContactGroups_VXML.xsl—The XSL Used by the Phonebook
View to Transform XML into
a VoiceXML List of Contact Groups
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="servlet" select="'undefined'"/>
  <xsl:template match="/">
    <xsl:text disable-output-escaping="yes"><![CDATA[<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN" "http://resources.tellme.com/toolbox/vxml-tellme.dtd">]]></xsl:text>
    <vxml application="Phonebook.vxml">
      <menu id="SelectGroup">
        <prompt>Please select the key on your phone with the initial of the last name of the person to call.</prompt>
        <choice dtmf="1"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=All</xsl:attribute>all</choice>
        <choice dtmf="2"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[A-C]</xsl:attribute>(a to c)</choice>
        <choice dtmf="3"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[D-F]</xsl:attribute>(d to f)</choice>
        <choice dtmf="4"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[G-I]</xsl:attribute>(g to i)</choice>
        <choice dtmf="5"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[J-L]</xsl:attribute>(j to l)</choice>
        <choice dtmf="6"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[M-O]</xsl:attribute>(m to o)</choice>
        <choice dtmf="7"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[P-S]</xsl:attribute>(p to s)</choice>
        <choice dtmf="8"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[T-V]</xsl:attribute>(t to v)</choice>
        <choice dtmf="9"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[W-Z]</xsl:attribute>(w to z)</choice>
        <catch event="nomatch noinput help">
          <reprompt/>
        </catch>
      </menu>
    </vxml>
  </xsl:template>
</xsl:stylesheet>
The VoiceXML that is generated by transforming the XML in Listing 21.1
using the XSL in Listing 21.11 appears in Listing 21.12.
LISTING 21.12 ListOfContactGroups.vxml—The VoiceXML Response for
a List of Contact Groups
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">
<vxml application="Phonebook.vxml">
  <menu id="SelectGroup">
    <prompt>Please select the key on your phone with the initial of the last name of the person to call.</prompt>
    <choice dtmf="1" next="/servlet/Phonebook?mode=selectContact&amp;group=All">all</choice>
    <choice dtmf="2" next="/servlet/Phonebook?mode=selectContact&amp;group=[A-C]">(a to c)</choice>
    <choice dtmf="3" next="/servlet/Phonebook?mode=selectContact&amp;group=[D-F]">(d to f)</choice>
    <choice dtmf="4" next="/servlet/Phonebook?mode=selectContact&amp;group=[G-I]">(g to i)</choice>
    <choice dtmf="5" next="/servlet/Phonebook?mode=selectContact&amp;group=[J-L]">(j to l)</choice>
    <choice dtmf="6" next="/servlet/Phonebook?mode=selectContact&amp;group=[M-O]">(m to o)</choice>
    <choice dtmf="7" next="/servlet/Phonebook?mode=selectContact&amp;group=[P-S]">(p to s)</choice>
    <choice dtmf="8" next="/servlet/Phonebook?mode=selectContact&amp;group=[T-V]">(t to v)</choice>
    <choice dtmf="9" next="/servlet/Phonebook?mode=selectContact&amp;group=[W-Z]">(w to z)</choice>
    <catch event="nomatch noinput help">
      <reprompt/>
    </catch>
  </menu>
</vxml>
This VoiceXML is similar to the VoiceXML discussed for the previous
step, with some differences discussed here. The vxml root element contains one child menu element that prompts the user for some input and then interprets the
response. The id attribute of the menu element has the value SelectGroup:
<menu id="SelectGroup">
VoiceXML documents may contain multiple menu elements. This id
attribute may be used to navigate to menus either within the same VoiceXML
document or in a different VoiceXML document.
The prompt child element contains text that is converted into speech by the
text-to-speech output module of the voice portal; this indicates to the user
what input is required to proceed to the next step in the dialog:
<prompt>Please select the key on your phone with the initial of the last name of the person to call.</prompt>
After the prompt element is a range of choice elements, each one representing
a valid option in the user’s response to the previous prompt for input. In the
following VoiceXML snippet, the dtmf attribute of the choice element specifies
that this option may be selected by pressing the touchtone key labeled “2” on
the phone:
<choice dtmf="2" next="/servlet/Phonebook?mode=selectContact&amp;group=[A-C]">(a to c)</choice>
Alternatively, the text child of the choice element—in this case, with the value (a to c)—indicates
that the user may say “a to c” to select this option. The next attribute of this element indicates the URL
that the voice portal should navigate to when the user selects this option. In
this case, the URL is the phonebook servlet with the HTTP GET argument mode with the value selectContact, indicating that the response
should enable the user to select a particular contact from the contact group
named [A-C], as specified by the other HTTP GET argument, named group.
The catch child element of the menu element indicates to the voice portal that certain events should be
caught and handled as specified in the content of this element. The event attribute specifies that the
events for nomatch, noinput, or help should be caught when the
user provides invalid input, no input, or asks for “help,” respectively.
The reprompt child element of the catch element indicates to the voice portal that when any of these events are
caught, the action taken should be to prompt the user for the input again, as
described previously, and then wait for another input selection:
<catch event="nomatch noinput help">
  <reprompt/>
</catch>
In a production application, these events would typically be handled
separately and in a more user-friendly manner.
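One way to handle the events separately is to replace the single catch element with the noinput, nomatch, and help shorthand elements, each with its own message. The following is a sketch only: the wording of the messages is an assumption, and the choice elements are omitted because they are unchanged from Listing 21.12.

```xml
<menu id="SelectGroup">
  <prompt>Please select the key on your phone with the initial of the last name of the person to call.</prompt>
  <!-- choice elements as in Listing 21.12 -->

  <!-- The caller said nothing and pressed no key -->
  <noinput>
    <audio>Sorry, I did not hear anything.</audio>
    <reprompt/>
  </noinput>
  <!-- The caller's input did not match any choice -->
  <nomatch>
    <audio>Sorry, I did not understand that selection.</audio>
    <reprompt/>
  </nomatch>
  <!-- The caller asked for help -->
  <help>
    <audio>Press 1 for all contacts, or press 2 through 9 to choose a range of last-name initials.</audio>
    <reprompt/>
  </help>
</menu>
```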
Selecting a Group to View a List of Its Contacts
Next, the user selects option number 2, corresponding to contacts with
last names having initials in the range [A-C]. The XML generated by the phonebook data component in response to this
request is the same as for the WML Web phone client (refer back to Listing
21.4).
The phonebook view uses the style sheet shown in Listing 21.13 to
transform the results.
LISTING 21.13 GetListOfContacts_VXML.xsl—The XSL Used by the
Phonebook View to Transform the XML
into a VoiceXML List of Contacts in a Group
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="servlet" select="'undefined'"/>
  <xsl:param name="group" select="'undefined'"/>
  <xsl:param name="lcletters" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:param name="ucletters" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:template match="/">
    <xsl:text disable-output-escaping="yes"><![CDATA[<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN" "http://resources.tellme.com/toolbox/vxml-tellme.dtd">]]></xsl:text>
    <vxml application="Phonebook.vxml">
      <menu id="SelectContact">
        <prompt>Got <xsl:value-of select="count(phonebook/contact)"/> contacts. Please say the name of the contact you wish to call.</prompt>
        <xsl:for-each select="phonebook/contact">
          <choice><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectNumber&amp;group=<xsl:value-of select="$group"/>&amp;contact=<xsl:value-of select="@id"/></xsl:attribute>(<xsl:value-of select="translate(name/firstname,$ucletters,$lcletters)"/><xsl:text> </xsl:text><xsl:value-of select="translate(name/lastname,$ucletters,$lcletters)"/>)</choice>
        </xsl:for-each>
        <catch event="nomatch noinput help">
          <reprompt/>
        </catch>
      </menu>
    </vxml>
  </xsl:template>
</xsl:stylesheet>
The VoiceXML that results from this transformation is shown in Listing
21.14.
LISTING 21.14 ListOfContacts.vxml—The VoiceXML Response for
a List of Contacts in a Group
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">
<vxml application="Phonebook.vxml">
  <menu id="SelectContact">
    <prompt>Got 2 contacts. Please say the name of the contact you wish to call.</prompt>
    <choice next="/servlet/Phonebook?mode=selectNumber&amp;group=[A-C]&amp;contact=e5678">(joe ashworth)</choice>
    <choice next="/servlet/Phonebook?mode=selectNumber&amp;group=[A-C]&amp;contact=e9921">(bill currie)</choice>
    <catch event="nomatch noinput help">
      <reprompt/>
    </catch>
  </menu>
</vxml>
In this case DTMF options for the choice elements are not available. The effect of this is that the user is
required to say the name of the contact to select it. It is possible to enable
DTMF selection if the service requires it.
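As a sketch of how DTMF selection might be enabled, the for-each loop of Listing 21.13 could add a dtmf attribute to each choice element, numbered with the XSLT position() function. This variant is an assumption rather than part of the phonebook service: it presumes that a group holds at most nine contacts, so that each one maps to a single key, and the prompt would also need to announce the key for each name.

```xml
<xsl:for-each select="phonebook/contact">
  <choice>
    <!-- Assumption: at most nine contacts, so position() yields the keys 1 to 9 -->
    <xsl:attribute name="dtmf"><xsl:value-of select="position()"/></xsl:attribute>
    <xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectNumber&amp;group=<xsl:value-of select="$group"/>&amp;contact=<xsl:value-of select="@id"/></xsl:attribute>(<xsl:value-of select="translate(name/firstname,$ucletters,$lcletters)"/><xsl:text> </xsl:text><xsl:value-of select="translate(name/lastname,$ucletters,$lcletters)"/>)</choice>
</xsl:for-each>
```

With this change the caller could either say the contact's name or press the corresponding key.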
Each choice element’s next attribute points to the URL the voice portal should load and execute if
the user selects it. The URL is composed of the location of the phonebook view
component followed by mode, group, and contact arguments in HTTP GET syntax.
The contact argument is assigned the value of the id attribute of the
associated contact element in the source XML being transformed. The lowercase
name that forms the text child of the choice element is effectively the
grammar that tells the voice portal what the user will say to select this
option.
The choice element illustrates a few characteristics of grammars:
VoiceXML grammars are required to be in lowercase.
The first and last names are separated by a space to indicate to the voice portal that the name is two words rather than one. This has bearing on the sounds the voice portal will expect when the user speaks this option.
The phrase is enclosed in parentheses to indicate to the voice portal that the user needs to speak the first name followed by the last name for this option to be selected.
Selecting a Contact to View the Details of That Contact
We will assume that the user has selected Bill Currie in the previous
step. In this step, the user gets Bill Currie’s telephone numbers and selects
one to call him. The XML generated by the phonebook data component in response
to this request is the same as in the case of the WML Web phone, as shown
previously in Listing 21.7. The XSL used to transform this XML into VoiceXML is
shown in Listing 21.15.
LISTING 21.15 GetContactDetails_VXML.xsl—The XSL Used by the
Phonebook View to Transform the XML
into VoiceXML Contact Details
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="servlet" select="'undefined'"/>
  <xsl:param name="group" select="'undefined'"/>
  <xsl:param name="lcletters" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:param name="ucletters" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:template match="/">
    <xsl:text disable-output-escaping="yes"><![CDATA[<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN" "http://resources.tellme.com/toolbox/vxml-tellme.dtd">]]></xsl:text>
    <vxml application="Phonebook.vxml">
      <xsl:for-each select="phonebook/contact[1]">
        <menu id="SelectNumber">
          <prompt>There are <xsl:value-of select="count(phone)"/> phone numbers for <xsl:value-of select="name/firstname"/><xsl:text> </xsl:text><xsl:value-of select="name/lastname"/>. Please select from the following options:
            <xsl:for-each select="phone">
              <xsl:value-of select="translate(@type,$ucletters,$lcletters)"/><xsl:text> </xsl:text>
            </xsl:for-each>.
          </prompt>
          <xsl:for-each select="phone">
            <choice><xsl:attribute name="next">#Call<xsl:value-of select="@type"/></xsl:attribute><xsl:value-of select="translate(@type,$ucletters,$lcletters)"/></choice>
          </xsl:for-each>
          <catch event="nomatch noinput help">
            <reprompt/>
          </catch>
        </menu>
        <xsl:for-each select="phone">
          <!-- The form id is built from the phone type, for example CallWork -->
          <form>
            <xsl:attribute name="id">Call<xsl:value-of select="@type"/></xsl:attribute>
            <block>
              <!-- The contact name is read from the parent contact element -->
              <audio>Transferring call to <xsl:value-of select="../name/firstname"/><xsl:text> </xsl:text><xsl:value-of select="../name/lastname"/> at the <xsl:value-of select="@type"/> phone number.</audio>
            </block>
            <transfer>
              <xsl:attribute name="dest"><xsl:value-of select="areacode"/><xsl:value-of select="number"/></xsl:attribute>
            </transfer>
          </form>
        </xsl:for-each>
      </xsl:for-each>
    </vxml>
  </xsl:template>
</xsl:stylesheet>
The VoiceXML that results from this transformation is shown in Listing
21.16.
LISTING 21.16 ContactDetails.vxml—The VoiceXML Response for Contact Details
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">
<vxml application="Phonebook.vxml">
  <menu id="SelectNumber">
    <prompt>There are 2 phone numbers for Bill Currie. Please select from the following options: work mobile.</prompt>
    <choice next="#CallWork">work</choice>
    <choice next="#CallMobile">mobile</choice>
    <catch event="nomatch noinput help">
      <reprompt/>
    </catch>
  </menu>
  <form id="CallWork">
    <block>
      <audio>Transferring call to Bill Currie at the Work phone number.</audio>
    </block>
    <transfer dest="8132367856"/>
  </form>
  <form id="CallMobile">
    <block>
      <audio>Transferring call to Bill Currie at the Mobile phone number.</audio>
    </block>
    <transfer dest="8139835646"/>
  </form>
</vxml>
This VoiceXML has one menu element, which enables the user to select the phone
number to call, and two form elements, one for each phone number for the given
contact. Each of the form elements serves to transfer the caller to the
associated number. The values of the next attributes of the choice elements in
the menu element are local URLs, each one pointing to a form element in the
same VoiceXML document whose id attribute matches the part of the URL after
the # character. The key elements used in these forms are the transfer
elements that, when executed by the voice portal, cause it to transfer the
caller to the given number. Here’s the code:
<transfer dest="8132367856"/>
Selecting a Phone Number to Call That Contact
When the user hears the output of Listing 21.16, he says “work” in order
to call Bill Currie at his work phone number. Control passes to the form in
the same VoiceXML document that has an id attribute with the value CallWork.
The user is notified that his call is being transferred, and the call is then
transferred to the number 813-236-7856. The user is connected with Bill Currie
at his work number when Bill picks up the phone.
VoiceXML Structure and Elements
This section briefly reviews the key elements of VoiceXML, including all
the elements used in the preceding example. For a complete detailed VoiceXML
specification, see the VoiceXML Forum (www.voicexml.org).
Figure 21.9 shows a high-level graphical view of the main structure of a
VoiceXML document. This view was derived from the VoiceXML 1.0 DTD (www.voicexml.org/voicexml1-0.dtd).
Table 21.2 provides descriptions of the elements shown in this figure.
TABLE 21.2 VoiceXML Elements with
Descriptions
Element : Description
assign : Assigns a value to a variable that exists in the state
maintained by the voice portal for the caller’s session.
audio : Outputs some audio. The output can be either a prerecorded audio
clip (for example, in the form of a WAV file) or in the form of synthesized
speech generated from the text child of this element.
block : A container of procedural statements executed in sequence from
first to last.
break : Inserts a pause in the speech output of a duration in
milliseconds specified using an attribute.
catch : Catches an event either always or on some specified condition.
choice : Defines a menu item, including both the touchtone or speech input
that may be used to select the choice and the URL to transfer control to upon
selection of the choice.
clear : Resets one or more form item variables by setting their values
to undefined.
disconnect : Disconnects a session, causing the voice portal to hang up
the call from the user.
div : Specifies that the enclosed text is of a particular type (for
example, a sentence or paragraph).
dtmf : Specifies a touchtone key grammar that serves as a set of valid
phone key input options.
else : Used optionally in combination with if elements in conditional
logic that may depend, for example, on the value of a variable.
elseif : Used optionally in combination with if elements in conditional
logic that may depend, for example, on the value of a variable.
emp : Indicates that the enclosed text should be spoken with emphasis.
enumerate : Shorthand for automatically enumerating the choices
available in a menu.
error : Catches an error event. Shorthand for a specific type of catch
element that catches events of the error type.
exit : Exits a session by terminating all loaded VoiceXML documents and
returning control to the interpreter.
field : Declares an input field in a form to get a user selection.
filled : An action executed when a user provides recognized input for a field.
form : A dialog for presenting information and collecting data from user
input.
goto : Transfers execution to another form, dialog, or document.
grammar : Encloses a speech-recognition grammar that consists of a set
of valid spoken inputs and the associated values that describe each option.
help : Catches a help event. Shorthand for a specific type of catch
element that catches events of the help type.
if : Encloses conditional logic that may be executed, depending on the
value of a variable, for example.
initial : Declares initial logic upon entry into a (mixed-initiative)
form. In a mixed-initiative form, both the caller and the voice portal direct
the conversation.
link : Specifies a transition common to all dialogs in the link’s scope.
menu : A dialog for prompting the user and enabling her to select from a
range of choices.
meta : Enables specification of data about the document.
noinput : Catches a noinput event (an event that occurs when no response
is received from the user when expected).
nomatch : Catches a nomatch event (an event that occurs when a response
is received from the user but is not recognized as valid).
object : Invokes a platform-specific object with parameters (for
example, a speaker verification object).
option : Specifies an option in a field, including the DTMF and/or
speech required for the user to select the option as well as the value to
assign to the field variable when the selection is made. Similar to the choice
element for menus.
param : Used to specify name/value parameter pairs that are passed into
an object or subdialog.
prompt : Outputs synthesized speech or prerecorded audio to the user and
then waits for a user response.
property : Sets the value of a property that controls the platform
behavior (for example, timeouts).
pros : Specifies prosodic information about the enclosed text.
record : Records an audio sample and stores it in a field item variable.
reprompt : Plays a field prompt again (for example, when a field is
revisited after a nomatch or noinput event).
return : Returns from a subdialog. This is similar in concept to a
return from a function call in procedural logic.
sayas : Specifies how a word or phrase should be spoken. This enables
finer control over the text-to-speech output.
script : Specifies a block of ECMAScript client-side scripting logic
that will run on the voice portal.
subdialog : Invokes another dialog as a subdialog of the current one.
This is similar in concept to a function call in procedural logic. It returns
an ECMAScript object as the result of the subdialog.
submit : Submits values to the business Web site providing the voice Web
applications.
throw : Throws an event that may be either a predefined event or an
application-specific event.
transfer : Transfers the caller to another telephone number.
value : Inserts the value of an expression in a prompt (for example, a
variable value).
var : Declares a variable and optionally assigns it a value.
vxml : The top-level root element in each VoiceXML document.
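To show how several of the elements in Table 21.2 fit together, the following fragment sketches a confirmation dialog built from form, var, field, prompt, value, filled, if, else, and goto. It is a hypothetical addition, not part of the phonebook service: the form id ConfirmCall, the variable contactName, and the prompt wording are assumptions, and the goto targets reuse the SelectNumber and CallWork ids from Listing 21.16.

```xml
<form id="ConfirmCall">
  <!-- var: declares a variable; the name and value are illustrative -->
  <var name="contactName" expr="'Bill Currie'"/>
  <!-- field: collects a yes/no answer using the built-in boolean type -->
  <field name="confirm" type="boolean">
    <!-- value: inserts the variable's value into the spoken prompt -->
    <prompt>Do you want to call <value expr="contactName"/>?</prompt>
    <!-- filled: runs once the caller's answer has been recognized -->
    <filled>
      <if cond="confirm">
        <goto next="#CallWork"/>
      <else/>
        <goto next="#SelectNumber"/>
      </if>
    </filled>
  </field>
</form>
```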
Development Primer
This primer provides important strategies for designing voice Web
applications with VoiceXML.
Tips and Pitfalls for VoiceXML Development
The following subsections cover a few common tips and pitfalls
concerning the development of VoiceXML services.
Usability Testing and Setting Expectations
Voice portals driven by VoiceXML represent a new paradigm in delivering
Web applications to users. In order to win user acceptance and be successful
in meeting business needs, voice Web applications must be thoroughly tested not
only for functionality but also from a usability standpoint. Engaging end users
early and often for usability testing also helps set their expectations for the
final service, thus easing their acceptance of the deployed result.
Voice Service Robustness
Voice Web service interfaces are limited to audio interaction with the
user. The user’s ability to detect and correct problems with a voice Web
service is therefore relatively limited when compared, for example, to a visual
interface such as a Web browser.
Consequently, in order to ensure that voice Web applications are robust
enough to meet the needs of mission-critical enterprise systems, they must be
able to gracefully handle a range of exceptions and error conditions. This
includes, in particular, missing or invalid user input, help requests from
users, and various system errors (for example, problems with the network
connectivity between the voice portal and the business Web site delivering the
voice Web applications).
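The kinds of handling described above can be sketched with the standard VoiceXML event elements. The following is a minimal VoiceXML 1.0 sketch under stated assumptions: the login form, the pin field, the prompt wording, and the URL www.example.com are hypothetical, and a production service would likely need handlers for more events than are shown here.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- System error: the fetch from the business Web site failed -->
  <catch event="error.badfetch">
    <prompt>
      Sorry, the service is temporarily unavailable.
      Please try again later.
    </prompt>
    <exit/>
  </catch>

  <form id="login">
    <field name="pin" type="digits">
      <prompt>Please say or enter your PIN.</prompt>

      <!-- Missing input: the caller said nothing before the timeout -->
      <noinput>
        <prompt>Sorry, I did not hear anything.</prompt>
        <reprompt/>
      </noinput>

      <!-- Invalid input: first and second failed recognitions -->
      <nomatch count="1">
        <prompt>Sorry, I did not understand.</prompt>
        <reprompt/>
      </nomatch>

      <!-- Invalid input: give up on the third failed recognition -->
      <nomatch count="3">
        <prompt>Sorry, I am unable to help you. Goodbye.</prompt>
        <exit/>
      </nomatch>

      <!-- Help request from the caller -->
      <help>
        <prompt>
          Your PIN is the number you chose when you registered.
          You can speak it or use the telephone keypad.
        </prompt>
        <reprompt/>
      </help>
    </field>

    <block>
      <!-- Hypothetical URL on the business Web site -->
      <submit next="http://www.example.com/login" namelist="pin"/>
    </block>
  </form>
</vxml>
```

Placing the error.badfetch handler at the document level lets it cover every dialog in the document, while the noinput, nomatch, and help handlers stay next to the field whose prompts they repeat.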
Getting Started
A growing number of voice portals on the Web provide excellent services
for developing, testing, and hosting deployed voice Web applications. These
include but are not limited to the following:
Tellme Networks (www.tellme.com)
BeVocal (www.bevocal.com)
VoiceGenie (voicegenie.com)
Voxeo (www.voxeo.com)
Future VoiceXML Developments
Any system intended to live for more than a few years should be designed
with sufficient flexibility to accommodate future changes. Although not all
changes can be anticipated, some can. These expected future developments should be
used to stress-test any design to ensure that it can adapt to meet future
changes.
Mainstream Use of Voice-Over-IP (VoIP)
In addition to telephones, Voice-over-IP (VoIP) can also be used to
access voice Web applications via voice portals. In this case, the client side
is a PC with speakers and a microphone, for example, and is connected to the
voice portal over the Internet. As VoIP becomes a more popular method of
communication, voice portals will seamlessly adapt to this new method of
accessing voice Web applications. From the multiclient architecture
standpoint, there will be no apparent difference between telephone clients
and Voice-over-IP clients, except perhaps in the lack of a caller’s phone
number in the voice portal session.
Multimode Voice and Data Services
With new wireless networks that enable concurrent voice and data, new
services will emerge that present hybrid voice and data interfaces: for
example, interfaces that enable users to ask for directions and have the
directions returned in a list that is cached on the client so that users
can refer to it step by step. SMIL (Synchronized Multimedia Integration
Language), an XML markup language standardized by the W3C, promises to
coordinate such multimedia interfaces. SMIL may be easily generated from
the multiclient XML/XSL-based architecture, so the architecture can adapt
seamlessly to deliver these new hybrid multimode services when they appear.
Advanced Voice Processing on the Client Side
As telephone and other types of clients gain more computational power,
there will be a shift as more of the voice-processing capability goes to the
client side. With this trend, we can expect to see such clients start accepting
content that drives their voice capabilities, just as VoiceXML drives voice
portals today. This is good in that it reduces the load on the voice portal
while improving the client response time. More voice handling on the client
side also enables greater client privacy for certain applications because the
audio does not have to propagate over a network to be interpreted. This trend
can already be seen in the new advanced voice command functionality that is
appearing in some higher-end mobile phones as well as in navigation systems
appearing in cars.