Chapter: XML and Web Services : Applied XML : Delivering Wireless and Voice Services with XML

Voice Applications with VoiceXML

This section shows how to deliver VoiceXML Web applications from a multiclient XML/XSL-based architecture. The section includes information about the following:

Voice portals and VoiceXML

VoiceXML application architecture

Advantages and limitations of voice access to Web applications

An example of the phonebook business service used to illustrate service delivery with VoiceXML

Voice Portals and VoiceXML

Voice portals, specifically portals that support VoiceXML, contain the hardware and software required to interface the public telecommunications network to VoiceXML services on the Internet.

VoiceXML Application Input

Voice portals accept input from telephones in the form of voice and touchtones. To use voice input, a voice portal must perform speech recognition so that application software can act on the recognized input. Speech recognition depends on application grammars that tell the portal which sounds represent valid input. Because most VoiceXML applications need to be speaker independent, recognition is usually more accurate with smaller grammars.

Voice portals contain the software and hardware needed to recognize touchtone input. Touchtone input is useful for login and other input that must be very accurately recognized. It is also a more robust alternative to voice in noisy caller environments that can confuse speech recognition software.

VoiceXML Application Output

Voice portals deliver two kinds of output: synthesized speech and audio playback.

Speech synthesis, also known as TTS (text-to-speech), is the process of producing automated speech from words in text format. TTS is useful for services that output dynamic results. TTS is also useful while developing, testing, and refining a voice Web service because it may be changed rapidly at low cost.

Audio playback involves simply playing back a prerecorded audio file over the telephone. This mode of output has a lower computational cost and is therefore more suitable for static content and content that needs to be delivered in a more natural-sounding voice.

A VoiceXML Application Architecture

The various components of the voice application architecture are illustrated in Figure

and include the following:

Mobile user. This is a user who wants to access voice Web applications.

Web phone. This is a mobile phone being used to make telephone calls to voice Web applications.

Base station. The cellular base station in the wireless network interacts with the Web phone via wireless network protocols.

Phone user. This is a user with a standard landline phone accessing voice Web applications delivered via the voice portal.

Phone. This is a standard landline phone connected to the telecommunications infrastructure.

Telecommunications infrastructure. This is the global telephone network that enables any telephone to access voice Web applications via the voice portal.

Voice portal. This is the hardware and software gateway through which users can access voice Web applications.

Multiclient pull architecture. This is the XML/XSL-based architecture capable of delivering Web applications to multiple types of clients, including VoiceXML. This component is a pull architecture because clients make requests and then wait for a server response.

Voice Portal Architecture

The components of the voice portal architecture are illustrated in Figure 21.8 and include the following:

Communications interface hardware. Specialized boards that interface the voice portal with the telephone system and the Internet.

VoiceXML interpreter and controller. The VoiceXML “browser” component that interfaces the telephone user and the VoiceXML application. It retrieves VoiceXML pages from the business Web site, interprets them, and executes them to control the voice Web service dialog.

Text-to-speech. Converts text to speech and delivers it to the telephone user via the telephony hardware.

Audio playback. Plays prerecorded audio for the telephone user via the telephony hardware.

DTMF (touchtone). Receives and interprets touchtone signals from the telephone user.

Speech recognition. This is the input module responsible for interpreting speech input received from the telephone user.

Audio recording. This is the input module that receives audio from the telephone user and records it.

Advantages and Limitations of VoiceXML Applications

To understand what kinds of Web applications may be effectively delivered via voice over telephones, it is useful to review the advantages and limitations of this mode of access. In addition to the advantages outlined previously in this chapter for mobile access in general, here are some specific advantages of voice access:

Low cost. Users already have telephones and service plans, so there is typically no extra initial or sustained cost associated with accessing voice Web applications.

High availability through pervasive coverage. The global telephone network is the most pervasive network there is. Leveraging this network to deliver voice Web applications maximizes service access and availability.

Eyes- and hands-free operation. Through appropriate hands-free headsets, telephones enable eyes- and hands-free operation, a requirement for many consumer and business situations.

Telephones are familiar tools. Telephones are familiar to users, so there is less of an intimidation factor involved in using voice Web applications.

Here are some of the limitations of VoiceXML applications that you should consider when planning and designing voice Web applications:

Audio only. Voice Web applications may deliver audio only. This makes voice Web applications unsuitable for services that require visual output.

New user interface paradigm. Voice Web applications are a relatively new mode of access to the Internet. This makes the design and development as well as the use of these new services more challenging. As a result, usability testing and personalization are important to VoiceXML service development.

IVR stigma. Many new users equate voice Web applications with rigid prompt/response Interactive Voice Response (IVR) systems. VoiceXML applications must overcome this perception before they will be accepted into mainstream use.

The Profile of a Successful VoiceXML Application

A few characteristics of a successful VoiceXML application are outlined in the following list to give you some insight into the types of business services that lend themselves well to this mode of access:

Concise input (voice or touchtone only)

Concise audio output

High-value urgent information that is required as soon as it is available

Information required outside of business hours or the office

Services required at multiple locations

Services required where users need to have eyes- and hands-free operation

Example: A Voice Phonebook Service with VoiceXML

In this example, a phonebook business service is used to illustrate VoiceXML access to Web applications.

The goal of this sample service is for the user to retrieve a telephone number for a contact and place a call to that contact.

Usage Scenario

This usage scenario outlines the chronological sequence of steps required to realize the goal:

Access the phonebook service to get a list of contact groups.

Select a group to view a list of its contacts.

Select a contact to view the details of that contact.

Select a phone number and call the contact.

Collaboration

Each of the first three steps in the usage scenario results in a request from the voice portal over the Internet using HTTP to the multiclient architecture running on the business Web site.

The last step, on the other hand, simply results in the voice portal transferring the user’s call to the phone number of the contact he has selected to call.

The phonebook service retrieves data, selects an XSL style sheet, and transforms the XML to VoiceXML for delivery to the voice portal. The voice portal then interprets and executes the VoiceXML in order to conduct the voice phonebook Web service dialog with the end user over his telephone.
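The style sheet selection step can be pictured with a small sketch. The Python below is illustrative only: the function names and the User-Agent heuristic are assumptions, not part of the phonebook service described in this chapter. Only the three _VXML.xsl file names are taken from the chapter's listings.

```python
# Hypothetical lookup table: (mode argument, client type) -> XSL style sheet.
# The file names match Listings 21.11, 21.13, and 21.15; the table itself is assumed.
STYLESHEETS = {
    ("selectGroup", "vxml"): "GetListOfContactGroups_VXML.xsl",
    ("selectContact", "vxml"): "GetListOfContacts_VXML.xsl",
    ("selectNumber", "vxml"): "GetContactDetails_VXML.xsl",
}


def detect_client(user_agent: str) -> str:
    """Guess the client type from a User-Agent header (assumed heuristic)."""
    ua = user_agent.lower()
    if "vxml" in ua or "voicexml" in ua:
        return "vxml"
    if "wap" in ua or "wml" in ua:
        return "wml"
    return "html"


def select_stylesheet(mode: str, user_agent: str) -> str:
    """Return the style sheet for this request, or raise LookupError."""
    key = (mode, detect_client(user_agent))
    if key not in STYLESHEETS:
        raise LookupError(f"no style sheet registered for {key}")
    return STYLESHEETS[key]


if __name__ == "__main__":
    print(select_stylesheet("selectGroup", "Example VoiceXML Browser/1.0"))
```

Keeping the choice of style sheet in a table keyed by client type is one way to let the same XML data feed several kinds of clients without touching the data component.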

Developing the Content

This section reviews the content required to drive the multiclient XML/XSL-based architecture to deliver the voice phonebook Web service, as discussed in the previous usage scenario.

Accessing the Service to Get a List of Contact Groups

Users access VoiceXML applications by dialing a telephone number that connects them to the voice portal. For this example, we will assume a user has dialed the telephone number the voice portal associates with the phonebook service.

When a user calls the number, the voice portal loads the VoiceXML from the URL associated with the phone number:

https://www.MyDomain.com/Phonebook.vxml

In this case, www.MyDomain.com is the domain of the business Web site providing the phonebook service. The VoiceXML that is loaded from the initial URL is shown in Listing 21.10.

LISTING 21.10 Phonebook.vxml—The Initial VoiceXML for the Phonebook Service

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">
<vxml application="Phonebook.vxml">
  <form id="Introduction">
    <block>
      <audio>Welcome to the X Y Z corporation phone book.</audio>
      <goto next="/servlet/Phonebook?mode=selectGroup"/>
    </block>
  </form>
</vxml>

The first line of the document indicates that the VoiceXML is XML 1.0 compliant and has a UTF-8 character encoding:

<?xml version="1.0" encoding="UTF-8"?>

The next line indicates the document type (recall that in this example the VoiceXML is hosted by the Tellme Networks voice portal):

<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">

The root element of this document is the vxml element, which has an attribute named application that specifies that this VoiceXML document belongs to the Phonebook.vxml application:

<vxml application="Phonebook.vxml">

Different VoiceXML documents that belong to the same voice application specify the same value for this attribute. This is known as the application scope of the VoiceXML service and is the highest-level scope in a hierarchy of scopes possible in a VoiceXML service. VoiceXML documents within the same scope may share the same grammars.

The form element has an id attribute with the value Introduction:

<form id="Introduction">

VoiceXML documents may contain multiple forms. The id attribute of any given form may be used to navigate to that form from either within the same VoiceXML document or from another VoiceXML document. This form element, in turn, contains a single child block element.

The block element contains an audio element that is converted to speech as an introduction to the phonebook service:

<block>
  <audio>Welcome to the X Y Z corporation phone book.</audio>
  <goto next="/servlet/Phonebook?mode=selectGroup"/>
</block>

After this introduction, the voice portal executes the other child goto element. This element instructs the voice portal to navigate to the next VoiceXML document in the service that may be loaded from the URL /servlet/Phonebook?mode=selectGroup.
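Because the next attribute holds a relative URL, the portal has to resolve it against the URL of the document it is currently executing. How a particular voice portal does this is not described here; the sketch below simply assumes standard relative URL resolution, and the function name resolve_goto is made up for illustration.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def resolve_goto(document_url: str, next_attr: str) -> str:
    """Resolve a goto/choice 'next' value against the current document URL."""
    return urljoin(document_url, next_attr)


if __name__ == "__main__":
    target = resolve_goto(
        "https://www.MyDomain.com/Phonebook.vxml",
        "/servlet/Phonebook?mode=selectGroup",
    )
    parts = urlsplit(target)
    # The path and the mode argument are what the phonebook servlet sees.
    print(parts.path, parse_qs(parts.query))
```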

The goto element causes the voice portal to send a request for a dynamically generated VoiceXML document from the multiclient architecture. The phonebook data component responds to this request by generating the same XML response as in the case of the WML Web phone client, as shown in Listing 21.1 earlier in this chapter.

The phonebook view component identifies the client as a VoiceXML browser and loads the XSL style sheet shown in Listing 21.11.

LISTING 21.11 GetListOfContactGroups_VXML.xsl—The XSL Used by the Phonebook

View to Transform XML into a VoiceXML List of Contact Groups

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="servlet" select="'undefined'"/>
  <xsl:template match="/">
    <xsl:text disable-output-escaping="yes">
      <![CDATA[<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN" "http://resources.tellme.com/toolbox/vxml-tellme.dtd">]]>
    </xsl:text>
    <vxml application="Phonebook.vxml">
      <menu id="SelectGroup">
        <prompt>Please select the key on your phone with the initial of the last name of the person to call.</prompt>
        <choice dtmf="1"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=All</xsl:attribute>all</choice>
        <choice dtmf="2"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[A-C]</xsl:attribute>(a to c)</choice>
        <choice dtmf="3"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[D-F]</xsl:attribute>(d to f)</choice>
        <choice dtmf="4"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[G-I]</xsl:attribute>(g to i)</choice>
        <choice dtmf="5"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[J-L]</xsl:attribute>(j to l)</choice>
        <choice dtmf="6"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[M-O]</xsl:attribute>(m to o)</choice>
        <choice dtmf="7"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[P-S]</xsl:attribute>(p to s)</choice>
        <choice dtmf="8"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[T-V]</xsl:attribute>(t to v)</choice>
        <choice dtmf="9"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[W-Z]</xsl:attribute>(w to z)</choice>
        <catch event="nomatch noinput help">
          <reprompt/>
        </catch>
      </menu>
    </vxml>
  </xsl:template>
</xsl:stylesheet>

The VoiceXML that is generated by transforming the XML in Listing 21.1 using the XSL in Listing 21.11 appears in Listing 21.12.

LISTING 21.12 ListOfContactGroups.vxml—The VoiceXML Response for a List of Contact Groups

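<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">
<vxml application="Phonebook.vxml">
  <menu id="SelectGroup">
    <prompt>Please select the key on your phone with the initial of the last name of the person to call.</prompt>
    <choice dtmf="1" next="/servlet/Phonebook?mode=selectContact&amp;group=All">all</choice>
    <choice dtmf="2" next="/servlet/Phonebook?mode=selectContact&amp;group=[A-C]">(a to c)</choice>
    <choice dtmf="3" next="/servlet/Phonebook?mode=selectContact&amp;group=[D-F]">(d to f)</choice>
    <choice dtmf="4" next="/servlet/Phonebook?mode=selectContact&amp;group=[G-I]">(g to i)</choice>
    <choice dtmf="5" next="/servlet/Phonebook?mode=selectContact&amp;group=[J-L]">(j to l)</choice>
    <choice dtmf="6" next="/servlet/Phonebook?mode=selectContact&amp;group=[M-O]">(m to o)</choice>
    <choice dtmf="7" next="/servlet/Phonebook?mode=selectContact&amp;group=[P-S]">(p to s)</choice>
    <choice dtmf="8" next="/servlet/Phonebook?mode=selectContact&amp;group=[T-V]">(t to v)</choice>
    <choice dtmf="9" next="/servlet/Phonebook?mode=selectContact&amp;group=[W-Z]">(w to z)</choice>
    <catch event="nomatch noinput help">
      <reprompt/>
    </catch>
  </menu>
</vxml>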

This VoiceXML is similar to the VoiceXML discussed for the previous step, with some differences discussed here. The vxml root element contains one child menu element that prompts the user for some input and then interprets the response. The id attribute of the menu element has the value SelectGroup:

<menu id="SelectGroup">

VoiceXML documents may contain multiple menu elements. This id attribute may be used to navigate to menus either within the same VoiceXML document or in a different VoiceXML document.

The prompt child element contains text that is converted into speech by the text-to-speech output module of the voice portal; this indicates to the user what input is required to proceed to the next step in the dialog:

<prompt>Please select the key on your phone with the initial of the last name of the person to call.</prompt>

After the prompt element is a range of choice elements, each one representing a valid option in the user's response to the previous prompt for input. In the following XSL snippet, the dtmf attribute of the choice element specifies that this option may be selected by pressing the touchtone key labeled "2" on the phone:

<choice dtmf="2"><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectContact&amp;group=[A-C]</xsl:attribute>(a to c)</choice>

Alternatively, the text child of the choice element—in this case, with the value (a to c)—indicates that the user may say “a to c” to select this option. The next attribute of this element indicates the URL that the voice portal should navigate to when the user selects this option. In this case, the URL is the phonebook servlet with the HTTP GET argument mode with the value selectContact, indicating that the response should enable the user to select a particular contact from the contact group named [A-C], as specified by the other HTTP GET argument, named group.
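The relationship between touchtone keys, group arguments, and spoken phrases can also be expressed as data. The following Python sketch builds an equivalent menu with the standard library instead of XSL; it is not how the phonebook view works (that uses the style sheet in Listing 21.11), and the names KEY_GROUPS and build_group_menu are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Touchtone key -> (group argument, spoken phrase), copied from Listing 21.11.
KEY_GROUPS = [
    ("1", "All", "all"),
    ("2", "[A-C]", "(a to c)"),
    ("3", "[D-F]", "(d to f)"),
    ("4", "[G-I]", "(g to i)"),
    ("5", "[J-L]", "(j to l)"),
    ("6", "[M-O]", "(m to o)"),
    ("7", "[P-S]", "(p to s)"),
    ("8", "[T-V]", "(t to v)"),
    ("9", "[W-Z]", "(w to z)"),
]


def build_group_menu(servlet: str) -> ET.Element:
    """Build a menu element with one choice per touchtone key."""
    menu = ET.Element("menu", {"id": "SelectGroup"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = ("Please select the key on your phone with the initial "
                   "of the last name of the person to call.")
    for key, group, phrase in KEY_GROUPS:
        url = f"{servlet}?mode=selectContact&group={group}"
        choice = ET.SubElement(menu, "choice", {"dtmf": key, "next": url})
        choice.text = phrase
    return menu


if __name__ == "__main__":
    menu = build_group_menu("/servlet/Phonebook")
    # Serialization escapes the ampersand in each next attribute as &amp;.
    print(ET.tostring(menu, encoding="unicode"))
```

Note that the serializer writes the ampersand between HTTP GET arguments as &amp;, which is what a well-formed VoiceXML document requires inside an attribute value.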

The catch child element of the menu element indicates to the voice portal that certain events should be caught and handled as specified in the content of this element. The event attribute specifies that the events for nomatch, noinput, or help should be caught when the user provides invalid input, no input, or asks for “help,” respectively.

The reprompt child element of the catch element indicates to the voice portal that when any of these events are caught, the action taken should be to prompt the user for the input again, as described previously, and then wait for another input selection:

<catch event="nomatch noinput help">
  <reprompt/>
</catch>

In a production application, these events would typically be handled separately and in a more user-friendly manner.

Selecting a Group to View a List of Its Contacts

Next, the user selects option number 2, corresponding to contacts with last names having initials in the range [A-C]. The XML generated by the phonebook data component in response to this request is the same as for the WML Web phone client (refer back to Listing 21.4).

The phonebook view uses the style sheet shown in Listing 21.13 to transform the results.

LISTING 21.13 GetListOfContacts_VXML.xsl—The XSL Used by the Phonebook View to Transform the XML into a VoiceXML List of Contacts in a Group

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="servlet" select="'undefined'"/>
  <xsl:param name="group" select="'undefined'"/>
  <xsl:param name="lcletters" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:param name="ucletters" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:template match="/">
    <xsl:text disable-output-escaping="yes">
      <![CDATA[<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN" "http://resources.tellme.com/toolbox/vxml-tellme.dtd">]]>
    </xsl:text>
    <vxml application="Phonebook.vxml">
      <!-- The menu id, the count expression in the prompt, and the body of the catch
           element are missing from the source; the forms shown here are assumed. -->
      <menu>
        <prompt>Got <xsl:value-of select="count(phonebook/contact)"/> contacts. Please say the name of the contact you wish to call.</prompt>
        <xsl:for-each select="phonebook/contact">
          <choice><xsl:attribute name="next"><xsl:value-of select="$servlet"/>?mode=selectNumber&amp;group=<xsl:value-of select="$group"/>&amp;contact=<xsl:value-of select="@id"/></xsl:attribute>(<xsl:value-of select="translate(name/firstname,$ucletters,$lcletters)"/><xsl:text> </xsl:text><xsl:value-of select="translate(name/lastname,$ucletters,$lcletters)"/>)</choice>
        </xsl:for-each>
        <catch event="nomatch noinput help">
          <reprompt/>
        </catch>
      </menu>
    </vxml>
  </xsl:template>
</xsl:stylesheet>

The VoiceXML that results from this transformation is shown in Listing 21.14.

LISTING 21.14 ListOfContacts.vxml—The VoiceXML Response for a List of Contacts in a Group

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">
<vxml application="Phonebook.vxml">
  <!-- The menu id and the body of the catch element are missing from the source;
       reprompt is assumed, as in Listing 21.12. -->
  <menu>
    <prompt>Got 2 contacts. Please say the name of the contact you wish to call.</prompt>
    <choice next="/servlet/Phonebook?mode=selectNumber&amp;group=[A-C]&amp;contact=e5678">(joe ashworth)</choice>
    <choice next="/servlet/Phonebook?mode=selectNumber&amp;group=[A-C]&amp;contact=e9921">(bill currie)</choice>
    <catch event="nomatch noinput help">
      <reprompt/>
    </catch>
  </menu>
</vxml>

In this case DTMF options for the choice elements are not available. The effect of this is that the user is required to say the name of the contact to select it. It is possible to enable DTMF selection if the service requires it.

Each choice element’s next attribute points to the URL the voice portal should load and execute if the user selects it. The URL is composed of the location of the phonebook view component followed by mode, group, and contact arguments in HTTP GET syntax. The contact argument is assigned the value of the id attribute of the associated contact element in the source XML being transformed. This lowercase name that is the text child value of the choice element is effectively the grammar that indicates to the voice portal what the user will say to select this option.
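As a rough illustration of how such a URL is put together and later taken apart on the server side, here is a small Python sketch. The helper name contact_url is hypothetical. Note one difference from the listings: the standard library percent-encodes the square brackets in the group name, whereas the listings leave them as literal characters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def contact_url(servlet: str, group: str, contact_id: str) -> str:
    """Compose the URL a choice element points to for one contact."""
    query = urlencode({
        "mode": "selectNumber",
        "group": group,
        "contact": contact_id,
    })
    return f"{servlet}?{query}"


if __name__ == "__main__":
    url = contact_url("/servlet/Phonebook", "[A-C]", "e5678")
    print(url)
    # Decoding the query recovers the original argument values.
    print(parse_qs(urlsplit(url).query))
```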

The choice element illustrates a few characteristics of grammars:

VoiceXML grammars are required to be in lowercase.

The first and last names are separated by a space to indicate to the voice portal that the name is two words rather than one. This has bearing on the sounds the voice portal will expect when the user speaks this option.

The phrase is enclosed in parentheses to indicate to the voice portal that the user needs to speak the first name followed by the last name for this option to be selected.
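The three characteristics above amount to a small normalization rule. The Python sketch below applies it to a contact's name; the function name grammar_phrase is an invention for this example, and the style sheet in Listing 21.13 achieves the same effect with translate(), which only maps the letters A to Z.

```python
def grammar_phrase(first_name: str, last_name: str) -> str:
    """Lowercase the name, separate words with single spaces, and parenthesize."""
    words = f"{first_name} {last_name}".lower().split()
    return "(" + " ".join(words) + ")"


if __name__ == "__main__":
    print(grammar_phrase("Bill", "Currie"))
```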

Selecting a Contact to View the Details of That Contact

We will assume that the user has selected Bill Currie in the previous step. In this step, the user gets Bill Currie's telephone numbers and selects one to call him. The XML generated by the phonebook data component in response to this request is the same as in the case of the WML Web phone, as shown previously in Listing 21.7. The XSL used to transform this XML into VoiceXML is shown in Listing 21.15.

LISTING 21.15 GetContactDetails_VXML.xsl—The XSL Used by the Phonebook View to Transform the XML into VoiceXML Contact Details

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="servlet" select="'undefined'"/>
  <xsl:param name="group" select="'undefined'"/>
  <xsl:param name="lcletters" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:param name="ucletters" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:template match="/">
    <xsl:text disable-output-escaping="yes">
      <![CDATA[<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN" "http://resources.tellme.com/toolbox/vxml-tellme.dtd">]]>
    </xsl:text>
    <vxml application="Phonebook.vxml">
      <!-- The opening tag of this outer loop, the menu id, and the body of the catch
           element are missing from the source; the forms shown here are assumed. -->
      <xsl:for-each select="phonebook/contact">
        <menu>
          <prompt>There are <xsl:value-of select="count(phone)"/> phone numbers for <xsl:value-of select="name/firstname"/><xsl:text> </xsl:text><xsl:value-of select="name/lastname"/>. Please select from the following options:
            <xsl:for-each select="phone">
              <xsl:value-of select="translate(@type,$ucletters,$lcletters)"/><xsl:text> </xsl:text>
            </xsl:for-each>
          </prompt>
          <xsl:for-each select="phone">
            <choice><xsl:attribute name="next">#Call<xsl:value-of select="@type"/></xsl:attribute><xsl:value-of select="translate(@type,$ucletters,$lcletters)"/></choice>
          </xsl:for-each>
          <catch event="nomatch noinput help">
            <reprompt/>
          </catch>
        </menu>
        <xsl:for-each select="phone">
          <form id="CallHome">
            <xsl:attribute name="id">Call<xsl:value-of select="@type"/></xsl:attribute>
            <block>
              <audio>Transferring call to Bill Currie at the <xsl:value-of select="@type"/> phone number.</audio>
            </block>
            <transfer>
              <xsl:attribute name="dest"><xsl:value-of select="areacode"/><xsl:value-of select="number"/></xsl:attribute>
            </transfer>
          </form>
        </xsl:for-each>
      </xsl:for-each>
    </vxml>
  </xsl:template>
</xsl:stylesheet>

The VoiceXML that results from this transformation is shown in Listing 21.16.

LISTING 21.16 ContactDetails.vxml—The VoiceXML Response for Contact Details

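<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">
<vxml application="Phonebook.vxml">
  <!-- The menu id and the body of the catch element are missing from the source;
       reprompt is assumed, as in Listing 21.12. -->
  <menu>
    <prompt>There are 2 phone numbers for Bill Currie. Please select from the following options: work mobile.</prompt>
    <choice next="#CallWork">work</choice>
    <choice next="#CallMobile">mobile</choice>
    <catch event="nomatch noinput help">
      <reprompt/>
    </catch>
  </menu>
  <form id="CallWork">
    <block>
      <audio>Transferring call to Bill Currie at the Work phone number.</audio>
    </block>
    <!-- The transfer element for this form is missing from the source. -->
  </form>
  <form id="CallMobile">
    <block>
      <audio>Transferring call to Bill Currie at the Mobile phone number.</audio>
    </block>
    <!-- The transfer element for this form is missing from the source. -->
  </form>
</vxml>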

This VoiceXML has one menu element, which enables the user to select the phone number to call, and two form elements, one for each phone number of the given contact. Each form element serves to transfer the caller to the associated number. The values of the next attributes of the choice elements in the menu are local URLs, each pointing to the form element in the same VoiceXML document whose id attribute matches the part of the URL after the # character. The key element in each form is the transfer element, which, when executed by the voice portal, causes it to transfer the caller to the number given in its dest attribute.
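To make the local URL lookup concrete, the following Python sketch resolves a value such as #CallWork to the matching form in a small sample document. It only mimics what an interpreter does conceptually; the function name find_dialog is invented, and the dest values in the sample are placeholder numbers, not the numbers from the phonebook data.

```python
import xml.etree.ElementTree as ET

# Cut-down sample document; the dest values are hypothetical placeholders.
SAMPLE = """<vxml application="Phonebook.vxml">
  <menu>
    <choice next="#CallWork">work</choice>
    <choice next="#CallMobile">mobile</choice>
  </menu>
  <form id="CallWork"><transfer dest="5550100"/></form>
  <form id="CallMobile"><transfer dest="5550101"/></form>
</vxml>"""


def find_dialog(root: ET.Element, next_attr: str):
    """Return the form or menu whose id matches a local '#id' URL, or None."""
    if not next_attr.startswith("#"):
        raise ValueError("not a local URL: " + next_attr)
    target = next_attr[1:]
    for dialog in root:
        if dialog.tag in ("form", "menu") and dialog.get("id") == target:
            return dialog
    return None


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    form = find_dialog(root, "#CallWork")
    print(form.get("id"), form.find("transfer").get("dest"))
```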

Selecting a Phone Number to Call That Contact

When the user hears the output of Listing 21.16, he says “work” in order to call Bill Currie at his work phone number. Control passes to the form in the same VoiceXML that has an id attribute with the value CallWork. The user is then notified that his call is being transferred. The call is then transferred to the number 813-236-7856. The user hears the call being transferred and then makes a connection with Bill Currie at his work number when he picks up the phone.

VoiceXML Structure and Elements

This section briefly reviews the key elements of VoiceXML, including all the elements used in the preceding example. For a complete detailed VoiceXML specification, see the VoiceXML Forum (www.voicexml.org).

Figure 21.9 shows a high-level graphical view of the main structure of a VoiceXML document. This view was derived from the VoiceXML 1.0 DTD (www.voicexml.org/voicexml1-0.dtd).

Table 21.2 provides descriptions of the elements shown in this figure.

TABLE 21.2 VoiceXML Elements with Descriptions

Element : Description

assign : Assigns a value to a variable that exists in the state maintained by the voice portal for the caller’s session.

audio : Outputs some audio. The output can be either a prerecorded audio clip (for example, in the form of a WAV file) or in the form of synthesized speech generated from the text child of this element.

block : A container of procedural statements executed in sequence from first to last.

break : Inserts a pause in the speech output of a duration in milliseconds specified using an attribute.

catch : Catches an event either always or on some specified condition.

choice : Defines a menu item, including the touchtone or speech input that may be used to select the choice and the URL to transfer control to upon selection of the choice.

clear : Resets one or more form item variables by setting their values to undefined.

disconnect : Disconnects a session, causing the voice portal to hang up the call from the user.

div : Specifies that the enclosed text is of a particular type (for example, a sentence or paragraph).

dtmf : Specifies a touchtone key grammar that serves as a set of valid phone key input options.

else : Used optionally in combination with if elements in conditional logic that may depend, for example, on the value of a variable.

elseif : Used optionally in combination with if elements in conditional logic that may depend, for example, on the value of a variable.

emp : Indicates that the enclosed text should be spoken with emphasis.

enumerate : Shorthand for automatically enumerating the choices available in a menu.

error : Catches an error event. Shorthand for a specific type of catch element that catches events of the error type.

exit : Exits a session by terminating all loaded VoiceXML documents and returning control to the interpreter.

field : Declares an input field in a form to get a user selection.

filled : An action executed when a user provides recognized input for a field.

form : A dialog for presenting information and collecting data from user input.

goto : Transfers execution to another form, dialog, or document.

grammar : Encloses a speech-recognition grammar that consists of a set of valid spoken inputs and the associated values that describe each option.

help : Catches a help event. Shorthand for a specific type of catch element that catches events of the help type.

if : Encloses conditional logic that may be executed, depending on the value of a variable, for example.

initial : Declares initial logic upon entry into a (mixed-initiative) form. In a mixed-initiative form, both the caller and the voice portal direct the conversation.

link : Specifies a transition common to all dialogs in the link’s scope.

menu : A dialog for prompting the user and enabling her to select from a range of choices.

meta : Enables specification of data about the document.

noinput : Catches a noinput event (an event that occurs when no response is received from the user when expected).

nomatch : Catches a nomatch event (an event that occurs when a response is received from the user but is not recognized as valid).

object : Invokes a platform-specific object with parameters (for example, a speaker verification object).

option : Specifies an option in a field, including the DTMF and/or speech required for the user to select the option as well as the value to assign to the field variable when the selection is made. Similar to the choice element for menus.

param : Used to specify name/value parameter pairs that are passed into object or subdialog.

prompt : Outputs synthesized speech or prerecorded audio to the user and then waits for a user response.

property : Sets the value of a property that controls the platform behavior (for example, timeouts).

pros : Specifies prosodic information about the enclosed text.

record : Records an audio sample and stores it in a field item variable.

reprompt : Plays a field prompt again (for example, when a field is revisited after a nomatch or noinput event).

return : Returns from a subdialog. This is similar in concept to a return from a function call in procedural logic.

sayas : Specifies how a word or phrase should be spoken. This enables finer control over the text-to-speech output.

script : Specifies a block of ECMAScript client-side scripting logic that will run on the voice portal.

subdialog : Invokes another dialog as a subdialog of the current one. This is similar in concept to a function call in procedural logic. It returns an ECMAScript object as the result of the subdialog.

submit : Submits values to the business Web site providing the voice Web applications.

throw : Throws an event that may be either a predefined event or an application-specific event.

transfer : Transfers the caller to another telephone number.

value : Inserts the value of an expression in a prompt (for example, a variable value).

var : Declares a variable and optionally assigns it a value.

vxml : The top-level root element in each VoiceXML document.

Development Primer

This primer provides important strategies for designing voice Web applications with VoiceXML.

Tips and Pitfalls for VoiceXML Development

The following subsections cover a few common tips and pitfalls concerning the development of VoiceXML services.

Usability Testing and Setting Expectations

Voice portals driven by VoiceXML represent a new paradigm in delivering Web applications to users. In order to win user acceptance and be successful in meeting business needs, voice Web applications must be thoroughly tested not only for functionality but also from a usability standpoint. Engaging end users early and often for usability testing also helps set their expectations for the final service, thus easing their acceptance of the deployed result.

Voice Service Robustness

Voice Web service interfaces are limited to audio interaction with the user. The user’s ability to detect and correct problems with a voice Web service is therefore relatively limited when compared, for example, to a visual interface such as a Web browser.

Consequently, in order to ensure that voice Web applications are robust enough to meet the needs of mission-critical enterprise systems, they must be able to gracefully handle a range of exceptions and error conditions. This includes, in particular, missing or invalid user input, help requests from users, and various system errors (for example, problems with the network connectivity between the voice portal and the business Web site delivering the voice Web applications).
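One way to think about handling these conditions separately is as a bounded retry loop in which each kind of event gets its own response. The Python simulation below is only a sketch of that idea: the function name, the limit of three attempts, and the message wording are all assumptions, and a real service would express this logic with VoiceXML catch handlers rather than Python.

```python
from typing import Iterable, List, Optional, Set, Tuple


def run_menu(responses: Iterable[Optional[str]], valid: Set[str],
             max_attempts: int = 3) -> Tuple[Optional[str], List[str]]:
    """Simulate a menu dialog; None in responses stands for caller silence.

    Returns the selection (or None if the caller never succeeds) together
    with a log of the messages that would have been played.
    """
    log: List[str] = []
    answers = iter(responses)
    for _ in range(max_attempts):
        heard = next(answers, None)
        if heard is None:
            log.append("noinput: Sorry, I did not hear anything.")
        elif heard == "help":
            log.append("help: You can say " + ", ".join(sorted(valid)) + ".")
        elif heard in valid:
            return heard, log
        else:
            log.append(f"nomatch: Sorry, I did not understand '{heard}'.")
    log.append("giving up: ending the dialog after repeated failures.")
    return None, log


if __name__ == "__main__":
    selection, messages = run_menu([None, "hlep", "work"], {"work", "mobile"})
    print(selection)
    for message in messages:
        print(message)
```

Bounding the number of attempts matters in an audio-only interface, because the caller has no other way to escape a dialog that keeps repeating the same prompt.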

Getting Started

A growing number of voice portals on the Web provide excellent services for developing, testing, and hosting deployed voice Web applications. These include but are not limited to the following:

Tellme Networks (www.tellme.com)

BeVocal (www.bevocal.com)

VoiceGenie (voicegenie.com)

Voxeo (www.voxeo.com)

Future VoiceXML Developments

Any system intended to live for more than a few years should be designed with sufficient flexibility to accommodate future changes. Although all changes cannot be anticipated, some can. These expected future developments should be used to stress-test any design to ensure that it can adapt to meet future changes.

Mainstream Use of Voice-Over-IP (VoIP)

In addition to telephones, Voice-over-IP (VoIP) can also be used to access voice Web applications via voice portals. In this case, the client side is a PC with speakers and a microphone, for example, and is connected to the voice portal over the Internet. As VoIP becomes a more popular method of communication, voice portals will seamlessly adapt to this new method of accessing voice Web applications. From the multiclient architecture standpoint, there will be no apparent difference between telephone clients and Voice-over-IP clients, except perhaps in the lack of a caller's phone number in the voice portal session.

Multimode Voice and Data Services

With new wireless networks that enable concurrent voice and data, new services will emerge that present hybrid voice and data interfaces: for example, interfaces that enable users to ask for directions and have the directions returned in a list that is cached on the client so that users can refer to it step by step. SMIL, an XML markup language standardized by the W3C, promises to coordinate such multimedia interfaces. SMIL may be easily generated from the multiclient XML/XSL-based architecture, enabling it to seamlessly adapt to deliver these new hybrid multimode services when they appear.

Advanced Voice Processing on the Client Side

As telephone and other types of clients gain more computational power, there will be a shift as more of the voice-processing capability goes to the client side. With this trend, we can expect to see such clients start accepting content that drives their voice capabilities, just as VoiceXML drives voice portals today. This is good in that it reduces the load on the voice portal while improving the client response time. More voice handling on the client side also enables greater client privacy for certain applications because the audio does not have to propagate over a network to be interpreted. This trend can already be seen in the new advanced voice command functionality that is appearing in some higher-end mobile phones as well as in navigation systems appearing in cars.
