Chapter: Internet & World Wide Web HOW TO PROGRAM - Multimedia: Audio, Video, Speech Synthesis and Recognition

| Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail |

Microsoft Agent Control



Microsoft Agent is an exciting technology for interactive animated characters in a Windows application or World Wide Web page. The Microsoft Agent control provides access to Agent characters such as Peedy (a parrot), Genie, Merlin (a wizard) and Robby (a robot), as well as those created by third-party developers. These Agent characters allow users to interact with the application using more natural human communication techniques. The control accepts both mouse and keyboard interactions, speaks (if a compatible text-to-speech engine is installed) and also supports speech recognition (if a compatible speech recognition engine is installed). With these capabilities, Web pages can speak to users and actually respond to their voice commands. Users can create new characters with the help of the Microsoft Agent Character Editor and the Microsoft Linguistic Sound Editing Tool (both downloadable from the Microsoft Agent Web site). In this section, we introduce the Microsoft Agent control.


The software for Microsoft Agent is on the CD-ROM that accompanies this book and may be downloaded from Microsoft’s Web site en-us/dnagent/html/agentdevdl.asp. This page also provides links to download the Lernout and Hauspie TruVoice text-to-speech (TTS) engine and the Microsoft Speech Recognition engine, ActiveX controls that power voice integration with Microsoft Agent. [Note: The Lernout and Hauspie TruVoice text-to-speech (TTS) engine is a 6 MB download, which may take some time from the Microsoft Web site. It is advisable to install this component directly from the CD-ROM included with this book.]


Figure 33.6 demonstrates the Microsoft Agent ActiveX control and the Lernout and Hauspie TruVoice text-to-speech engine (also an ActiveX control). This XHTML document embeds each of these ActiveX controls into a Web page that acts as a tutorial for the various types of programming tips presented in this text. Peedy the Parrot displays and speaks text that describes each of the programming tips. When the user clicks the icon for a programming tip, Peedy jumps to that tip and recites the appropriate text.


To run this example, install the Microsoft Agent character Peedy from the accompanying CD. Locate the Peedy.acs file on your computer, and change the path passed to agent.Characters.Load in function loadAgent ("C:\\WINNT\\msagent\\chars\\Peedy.acs" in the listing) to reflect the physical path to the file on your computer. [Note: Make sure all backslashes are preceded by a second backslash.] If you would like to run this example from the Internet, change the same path to the URL of the character file.









The first screen capture illustrates Peedy finishing his introduction. The second screen capture shows Peedy jumping toward the Common Programming Error icon. The last screen capture shows Peedy finishing his discussion of Common Programming Errors.

Before Microsoft Agent or the Lernout and Hauspie TruVoice TTS engine can be used in the Web page, both must be loaded into the page via object elements. Lines 13–16 embed an instance of the Microsoft Agent ActiveX control into the Web page and give it the scripting name agent via the id attribute. Similarly, lines 19–22 embed an instance of the Lernout and Hauspie TruVoice TTS engine into the Web page. This object is not scripted directly by the Web page; Microsoft Agent uses the TTS engine control to speak the text that it displays. If either of these controls is not already installed on the computer browsing the Web page, the browser attempts to download that control from the Microsoft Web site. The codebase attribute (lines 15 and 21) specifies the URL from which to download the required version of the software (Version 2 for the Microsoft Agent control and Version 6 for the Lernout and Hauspie TruVoice TTS engine). The Microsoft Agent documentation discusses how to place these controls on a server for clients to download. [Note: Placing these controls on your own server requires a license from Microsoft.]


The body of the document (lines 198–250) defines a table containing the seven programming tip icons. Each tip icon is given a scripting name via its img element’s id attribute. The script uses the img element to change its background color when users click it to receive an explanation of that tip type. Each img element’s onclick handler calls function imageSelectTip, defined at line 138. Each img element passes itself (i.e., this) to function imageSelectTip so the function can determine which image the user selected.


The XHTML document contains four separate script elements. The script element at lines 30–168 defines global variables used in all the script elements and defines functions loadAgent (called in response to the body element’s onload event), imageSelectTip (called when users click an img element) and tellMeAboutIt (called by imageSelectTip to speak a few sentences about a tip).


Function loadAgent is particularly important because it loads the Microsoft Agent character that is used in this example. Lines 97–98 use the Microsoft Agent control’s Characters collection to load the character information for Peedy. Method Load of the Characters collection takes two arguments. The first argument specifies a name for the character that can be used later to interact with that character, and the second argument specifies the location of the character’s data file (Peedy.acs in this example).
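The two arguments of Load, and the doubled backslashes in the path, can be tried outside the browser with a minimal sketch. The ActiveX control exists only in Internet Explorer with Microsoft Agent installed, so the sketch uses a hypothetical stand-in object, mockAgent, which merely records what it is given and does not imitate the real control's behavior.

```javascript
// Hypothetical stand-in for the Agent control's Characters collection.
// It only records its arguments; the real control loads character data.
var mockAgent = {
   Characters: {
      loaded: {},
      Load: function( name, path ) { this.loaded[ name ] = path; },
      Character: function( name ) {
         return { name: name, path: this.loaded[ name ] };
      }
   }
};

// In a JavaScript string literal, each \\ produces a single backslash.
var peedyPath = "C:\\WINNT\\msagent\\chars\\Peedy.acs";

// First argument: the name used later to refer to the character.
// Second argument: the location of the character's data file.
mockAgent.Characters.Load( "Peedy", peedyPath );
var actor = mockAgent.Characters.Character( "Peedy" );

console.log( actor.path ); // C:\WINNT\msagent\chars\Peedy.acs
```

The name given to Load ("Peedy") is the same key that is later passed to Character, which is why the two strings must match.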


<?xml version = "1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- Fig. 33.6: tutorial.html -->
<!-- Microsoft Agent Control  -->

<html xmlns = "http://www.w3.org/1999/xhtml">
   <head>
      <title>Speech Recognition</title>

      <!-- Microsoft Agent ActiveX Control -->
      <object id = "agent" width = "0" height = "0"
         classid = "CLSID:D45FD31B-5C6E-11D1-9EC1-00C04FD7081F"
         codebase = "#VERSION = 2, 0, 0, 0">
      </object>

      <!-- Lernout & Hauspie TruVoice text to speech engine -->
      <object width = "0" height = "0"
         classid = "CLSID:B8F2846E-CE36-11D0-AC83-00C04FD97575"
         codebase = "#VERSION = 6, 0, 0, 0">
      </object>

      <!-- Microsoft Speech Recognition Engine -->
      <object width = "0" height = "0"
         classid = "CLSID:161FA781-A52C-11d0-8D7C-00A0C9034A7E"
         codebase = "#VERSION = 4, 0, 0, 0">
      </object>

      <script type = "text/javascript">
         <!--
         var currentImage = null;
         var tips =
            [ "gpp", "seo", "perf", "port",
              "gui", "dbt", "cpe" ];
         var tipNames = [
            "Good Programming Practice",
            "Software Engineering Observation",
            "Performance Tip", "Portability Tip",
            "Look-and-Feel Observation",
            "Testing and Debugging Tip",
            "Common Programming Error" ];
         var voiceTips = [
            "Good [Programming Practice]",
            "Software [Engineering Observation]",
            "Performance [Tip]",
            "Portability [Tip]",
            "Look-and-Feel [Observation]",
            "Testing [and Debugging Tip]",
            "Common [Programming Error]" ];
         var explanations = [
            // Good Programming Practice text
            "Good Programming Practices highlight " +
            "techniques for writing programs that are " +
            "clearer, more understandable, more " +
            "debuggable, and more maintainable.",

            // Software Engineering Observation text
            "Software Engineering Observations highlight " +
            "architectural and design issues that affect " +
            "the construction of complex software systems.",

            // Performance Tip text
            "Performance Tips highlight opportunities for " +
            "improving program performance.",

            // Portability Tip text
            "Portability Tips help students write portable " +
            "code that can execute in different Web browsers.",

            // Look-and-Feel Observation text
            "Look-and-Feel Observations highlight graphical " +
            "user interface conventions. These observations " +
            "help students design their own graphical user " +
            "interfaces in conformance with industry " +
            "norms.",

            // Testing and Debugging Tip text
            "Testing and Debugging Tips tell people how to " +
            "test and debug their programs. Many of the " +
            "tips also describe aspects of creating Web " +
            "pages and scripts that reduce the likelihood " +
            "of 'bugs' and thus simplify the testing and " +
            "debugging process.",

            // Common Programming Error text
            "Common Programming Errors focus the students' " +
            "attention on errors commonly made by beginning " +
            "programmers. This helps students avoid making " +
            "the same errors. It also helps reduce the long " +
            "lines outside instructors' offices during " +
            "office hours!" ];

         function loadAgent()
         {
            agent.Characters.Load( "Peedy",
               "C:\\WINNT\\msagent\\chars\\Peedy.acs" );

            // actor is created here as a global variable
            actor = agent.Characters.Character( "Peedy" );
            actor.LanguageID = 0x0409; // sometimes needed

            // get states from server
            actor.Get( "state", "Showing" );
            actor.Get( "state", "Speaking" );
            actor.Get( "state", "Hiding" );

            // get Greet animation and do Peedy introduction
            actor.Get( "animation", "Greet" );
            actor.MoveTo( screenLeft, screenTop - 100 );
            actor.Show();
            actor.Play( "Greet" );
            actor.Speak( "Hello. " +
               "If you would like me to tell you about a " +
               "programming tip, click its icon, or, press " +
               "the 'Scroll Lock' key, and speak the name " +
               "of the tip, into your microphone." );

            // get other animations
            actor.Get( "animation", "Idling" );
            actor.Get( "animation", "MoveDown" );
            actor.Get( "animation", "MoveUp" );
            actor.Get( "animation", "MoveLeft" );
            actor.Get( "animation", "MoveRight" );
            actor.Get( "animation", "GetAttention" );
            actor.Get( "animation", "GetAttentionReturn" );

            // set up voice commands
            for ( var i = 0; i < tips.length; ++i )
               actor.Commands.Add( tips[ i ],
                  tipNames[ i ], voiceTips[ i ], true, true );

            actor.Commands.Caption = "Programming Tips";
            actor.Commands.Voice = "Programming Tips";
            actor.Commands.Visible = true;
         }

         function imageSelectTip( tip )
         {
            // find the index of the clicked image
            for ( var i = 0; i < document.images.length; ++i )
               if ( document.images( i ) == tip )
                  tellMeAboutIt( i );
         }

         function voiceSelectTip( cmd )
         {
            var found = false;

            // find the index of the spoken command's name
            for ( var i = 0; i < tips.length; ++i )
               if ( cmd.Name == tips[ i ] ) {
                  found = true;
                  break;
               }

            if ( found )
               tellMeAboutIt( i );
         }

         function tellMeAboutIt( element )
         {
            currentImage = document.images( element );
            currentImage.style.background = "red";
            actor.MoveTo( currentImage.offsetParent.offsetLeft,
               currentImage.offsetParent.offsetTop + 30 );
            actor.Speak( explanations[ element ] );
         }
         // -->
      </script>

      <script type = "text/javascript" for = "agent"
         event = "Command( cmd )">
         <!--
         voiceSelectTip( cmd );
         // -->
      </script>

      <script type = "text/javascript" for = "agent"
         event = "BalloonHide">
         <!--
         if ( currentImage != null ) {
            currentImage.style.background = "lemonchiffon";
            currentImage = null;
         }
         // -->
      </script>

      <script type = "text/javascript" for = "agent"
         event = "Click">
         <!--
         actor.Play( "GetAttention" );
         actor.Speak( "Stop poking me with that pointer!" );
         actor.Play( "GetAttentionReturn" );
         // -->
      </script>
   </head>

   <body style = "background-color: lemonchiffon"
      onload = "loadAgent()">
      <table border = "0">
         <tr>
            <th colspan = "4">
               <h1 style = "color: blue">
                  Deitel Programming Tips
               </h1>
            </th>
         </tr>
         <tr>
            <td align = "center" valign = "top" width = "120">
               <img id = "gpp" src = "GPP_100h.gif"
                  alt = "Good Programming Practice" border =
                  "0" onclick = "imageSelectTip( this )" />
               <br />Good Programming Practices</td>
            <td align = "center" valign = "top" width = "120">
               <img id = "seo" src = "SEO_100h.gif"
                  alt = "Software Engineering Observation"
                  border = "0"
                  onclick = "imageSelectTip( this )" />
               <br />Software Engineering Observations</td>
            <td align = "center" valign = "top" width = "120">
               <img id = "perf" src = "PERF_100h.gif"
                  alt = "Performance Tip" border = "0"
                  onclick = "imageSelectTip( this )" />
               <br />Performance Tips</td>
            <td align = "center" valign = "top" width = "120">
               <img id = "port" src = "PORT_100h.gif"
                  alt = "Portability Tip" border = "0"
                  onclick = "imageSelectTip( this )" />
               <br />Portability Tips</td>
         </tr>
         <tr>
            <td align = "center" valign = "top" width = "120">
               <img id = "gui" src = "GUI_100h.gif"
                  alt = "Look-and-Feel Observation" border =
                  "0" onclick = "imageSelectTip( this )" />
               <br />Look-and-Feel Observations</td>
            <td align = "center" valign = "top" width = "120">
               <img id = "dbt" src = "DBT_100h.gif"
                  alt = "Testing and Debugging Tip" border =
                  "0" onclick = "imageSelectTip( this )" />
               <br />Testing and Debugging Tips</td>
            <td align = "center" valign = "top" width = "120">
               <img id = "cpe" src = "CPE_100h.gif"
                  alt = "Common Programming Error" border =
                  "0" onclick = "imageSelectTip( this )" />
               <br />Common Programming Errors</td>
         </tr>
      </table>

      <img src = "agent_button.gif" style = "position: absolute;
         bottom: 10px; right: 10px" />
   </body>
</html>

Fig. 33.6 Demonstrating Microsoft Agent and the Lernout and Hauspie TruVoice text-to-speech (TTS) engine


Line 99 assigns to global variable actor a reference to the Peedy Character object. Method Character of the Characters collection receives as its argument the name that was used to download the character data in lines 97–98. Line 100 sets the Character’s LanguageID property to 0x0409 (English). Microsoft Agent can be used with several different languages (see the documentation for more information).


Lines 103–105 use the Character object’s Get method to download the Showing, Speaking and Hiding states for the character. The method takes two arguments: the type of information to download (in this case, state information) and the name of the corresponding element (e.g., Showing). Each state has animations associated with it. When the character is displayed (i.e., the Showing state), its associated animation plays (Peedy flies onto the screen). Downloading the Speaking state provides a default animation that makes the character appear to be speaking. When the character hides (i.e., goes into the Hiding state), the animations that make the character disappear are played (Peedy flies away).


Line 108 calls Character method Get to load an animation (Greet, in this example). Lines 109–116 use a variety of Character methods to interact with Peedy. Line 109 invokes the MoveTo method to specify Peedy’s position on the screen. Line 110 calls method Show to display the character. When this occurs, the character plays the animation assigned to the Showing state (Peedy flies onto the screen). Line 111 calls method Play to play the Greet animation (see the first screen capture). Lines 112–116 invoke method Speak to make the character speak its string argument. If there is a compatible TTS engine installed, the character displays a bubble containing the text and speaks the text as well. The Microsoft Agent Web site contains complete lists of animations available for each character (some are standard to all characters, others are specific to each character).


Lines 119–125 load several other animations. Line 119 loads the set of Idling animations that Microsoft Agent uses when users are not interacting with the character. When running this example, be sure to leave Peedy alone for a while to see some of these animations. Lines 120–123 load the animations for moving the character up, down, left and right (MoveUp, MoveDown, MoveLeft and MoveRight, respectively).
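Because every one of these requests has the same shape, the seven Get calls could also be driven from an array of names. The following is only a sketch of that alternative, not the approach taken in Fig. 33.6. It runs outside the browser against a hypothetical stand-in, mockActor, that records each request instead of downloading anything.

```javascript
// Hypothetical stand-in for the Character object; it records each
// Get request instead of downloading animation data.
var requests = [];
var mockActor = {
   Get: function( type, name ) { requests.push( type + ":" + name ); }
};

// Animation names taken from function loadAgent in Fig. 33.6.
var animationNames = [ "Idling", "MoveDown", "MoveUp", "MoveLeft",
   "MoveRight", "GetAttention", "GetAttentionReturn" ];

// One Get call per animation, equivalent to seven separate calls.
for ( var i = 0; i < animationNames.length; ++i )
   mockActor.Get( "animation", animationNames[ i ] );

console.log( requests.length ); // 7
console.log( requests[ 0 ] );   // animation:Idling
```

Keeping the names in one array makes it harder to load an animation under one name and play it under another.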


Clicking an image calls function imageSelectTip (lines 137–142). The for structure at lines 139–141 determines which image the user clicked. The condition in line 140 compares each element of the document object’s images collection (document.images( i )) with the img element that was passed to the function. When the two match, function tellMeAboutIt (lines 158–166) is called with the index i as its argument.


Line 160 of function tellMeAboutIt assigns global variable currentImage a reference to the clicked img element. Line 161 highlights that image on the screen by changing its background color to red. Line 162 invokes Character method MoveTo to position Peedy at the clicked image. When this statement executes, Peedy flies to the image. The currentImage’s offsetParent property determines the parent element that contains the image (in this example, the table cell in which the image appears). The offsetLeft and offsetTop properties of the table cell give the location of the cell with respect to the upper-left corner of its containing element. The Character object’s Speak method (line 165) speaks the text that is stored in the array explanations for the selected tip.
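The coordinate arithmetic in tellMeAboutIt can be examined on its own with a small sketch. The object mockImage and the helper moveTargetFor are hypothetical names introduced for illustration, and the pixel values are made up. In a browser, the DOM supplies offsetParent, offsetLeft and offsetTop.

```javascript
// Hypothetical stand-in for a clicked img element inside a table cell.
// The numbers are invented; a browser would compute them from layout.
var mockImage = {
   offsetParent: { offsetLeft: 240, offsetTop: 90 }
};

// Returns the coordinates that tellMeAboutIt passes to MoveTo:
// the cell's left edge, and its top edge plus 30 pixels.
function moveTargetFor( image )
{
   return {
      x: image.offsetParent.offsetLeft,
      y: image.offsetParent.offsetTop + 30
   };
}

var target = moveTargetFor( mockImage );
console.log( target.x + ", " + target.y ); // 240, 120
```

Separating the calculation from the MoveTo call makes the 30-pixel vertical offset easy to adjust when the icons or the character change size.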

The script for the agent control at lines 177–188 is invoked in response to the hiding of the text balloon (the BalloonHide event). If currentImage is not null, the background color of the image is changed to lemonchiffon (the document’s background color) and variable currentImage is reset to null.


The script for the agent control at lines 187–194 is invoked in response to the user’s clicking the character. When this occurs, line 190 plays the GetAttention animation, line 191 causes Peedy to speak the text “Stop poking me with that pointer!” and line 192 plays the last frame of the GetAttention animation by specifying GetAttentionReturn.


Microsoft provides complete lists of animations, as well as recommended standard animation sets for its Agent characters, on the Microsoft Agent Web site.



Voice recognition is also included in this example to enable the Agent character to receive voice commands. The first screen capture illustrates Peedy finishing his introduction (Fig. 33.7). The second screen capture shows Peedy after the user presses the Scroll Lock key to start issuing voice commands, which initializes the voice-recognition engine (Fig. 33.8). The third screen capture (Fig. 33.9) shows Peedy after receiving a voice command (i.e., “Good Programming Practice”, which causes a Command event for the agent control). The last screen capture shows Peedy discussing Good Programming Practices (Fig. 33.10).


To enable Microsoft Agent to recognize voice commands, a compatible voice-recognition engine must be installed. Lines 25–28 use an object element to embed an instance of the Microsoft Speech Recognition engine control in the Web page.


Next, the voice commands used to interact with Peedy must be registered in the Character object’s Commands collection. The for structure at lines 128–130 uses the Commands collection’s Add method to register each voice command. The method receives five arguments. The first argument is a string representing the command name (typically used in scripts that respond to voice commands). The second argument is a string that appears in a pop-up menu in response to a right-click on the character. The third argument is a string representing the words or phrase users can speak for this command (stored in array voiceTips at lines 44–51). Optional words or phrases are enclosed in square brackets ([]). The last two arguments are boolean values indicating whether the command is currently enabled (i.e., whether users can speak the command) and whether the command is currently visible in the pop-up menu and Voice Commands window for the character.
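The five arguments of Add and the bracket notation for optional words can be illustrated without the speech engine. In the sketch below, mockCommands is a hypothetical stand-in that simply stores what Add receives, and phrasesFor is a hypothetical helper that lists the shortest and longest phrase described by a string with one optional bracketed group. Actual matching is performed by the speech recognition engine, not by code like this.

```javascript
// Arrays copied from Fig. 33.6 (first three entries only).
var tips = [ "gpp", "seo", "perf" ];
var tipNames = [ "Good Programming Practice",
   "Software Engineering Observation", "Performance Tip" ];
var voiceTips = [ "Good [Programming Practice]",
   "Software [Engineering Observation]", "Performance [Tip]" ];

// Hypothetical stand-in for the Commands collection; it stores the
// five arguments of each Add call rather than registering anything.
var mockCommands = {
   items: [],
   Add: function( name, caption, voice, enabled, visible ) {
      this.items.push( { name: name, caption: caption, voice: voice,
         enabled: enabled, visible: visible } );
   }
};

// The three arrays are parallel: index i describes one command.
for ( var i = 0; i < tips.length; ++i )
   mockCommands.Add( tips[ i ], tipNames[ i ], voiceTips[ i ],
      true, true );

// Illustration only: the phrase without the optional [bracketed]
// words, followed by the phrase with them.
function phrasesFor( voice )
{
   var shortest = voice.replace( /\s*\[[^\]]*\]/, "" );
   var longest = voice.replace( /[\[\]]/g, "" );
   return [ shortest, longest ];
}

console.log( phrasesFor( mockCommands.items[ 0 ].voice ) );
```

For the first command, the helper reports that both "Good" and "Good Programming Practice" fit the registered phrase.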


Lines 132–134 set the Caption, Voice and Visible properties of the Commands object. The Caption property specifies text that describes the voice command set. This text appears in the small rectangular area that appears below the character when users press the Scroll Lock key. The Voice property is similar to the Caption property except that the specified text appears in the Voice Commands window with the set of voice commands the user can speak below it. The Visible property is a boolean value that specifies whether the commands of this Commands object should appear in the pop-up menu.


After a voice command is received, the agent control’s Command event handler (lines 170–175) executes. This script calls function voiceSelectTip and passes it the object that describes the received command. Function voiceSelectTip (lines 144–156) compares that object’s Name property with each element of array tips in the for structure (lines 148–152) to determine the index of the command. This value is then passed to function tellMeAboutIt (line 158), which causes Peedy to move to the specified tip and discuss that tip.
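The lookup performed by voiceSelectTip can also be written as a helper that returns the index directly. This is a sketch of an alternative to the found flag, not the code in Fig. 33.6. The names findTipIndex and mockCmd are hypothetical, and mockCmd stands in for the object delivered with the Command event, of which only the Name property is used.

```javascript
var tips = [ "gpp", "seo", "perf", "port", "gui", "dbt", "cpe" ];

// Returns the index of commandName in tips, or -1 if it is absent.
function findTipIndex( commandName )
{
   for ( var i = 0; i < tips.length; ++i )
      if ( tips[ i ] == commandName )
         return i;

   return -1;
}

// Hypothetical stand-in for the object passed to the Command event
// handler; only its Name property is used here.
var mockCmd = { Name: "port" };

console.log( findTipIndex( mockCmd.Name ) ); // 3
console.log( findTipIndex( "unknown" ) );    // -1
```

Returning -1 for an unknown name lets the caller skip tellMeAboutIt when the recognized command is not one of the registered tips.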


This example has covered only the basic features and functionality of Microsoft Agent. Many more features are available. Figure 33.11 lists several other Microsoft Agent events.


Figure 33.12 shows some other properties and methods of the Character object. Remember that the Character object represents the character that is displayed on the screen and enables interaction with that character. For a complete listing of properties and methods, see the Microsoft Agent Web site.


Figure 33.13 shows some speech output tags that can customize speech output properties. These tags are inserted into the text string passed to the Speak method; the character does not pronounce them but adjusts its speech accordingly. Speech output tags generally remain in effect from the time at which they are encountered until the end of the current Speak method call.


Event        Description

BalloonHide  Called when the text balloon for a character is hidden.
BalloonShow  Called when the text balloon for a character is shown.
Hide         Called when a character is hidden.
Move         Called when a character is moved on the screen.
Show         Called when a character is displayed on the screen.
Size         Called when a character’s size is changed.

Fig. 33.11  Other events for the Microsoft Agent control.



Property or method  Description

Properties
Height     The height of the character in pixels.
Left       The left edge of the character in pixels from the left of the screen.
Name       The default name for the character.
Speed      The speed of the character’s speech.
Top        The top edge of the character in pixels from the top of the screen.
Width      The width of the character in pixels.

Methods
Activate   Sets the currently active character when multiple characters appear on the screen.
GestureAt  Specifies that the character should gesture toward a location on the screen that is specified in pixel coordinates from the upper-left corner of the screen.
Interrupt  Interrupts the current animation. The next animation in the queue of animations for this character is then displayed.
StopAll    Stops all animations of a specified type for the character.

Fig. 33.12  Other properties and methods for the Character object.



Tag           Description

\Chr=string\  Specifies the tone of the voice. Possible values for string are Normal (the default) for a normal tone of voice, Monotone for a monotone voice or Whisper for a whispered voice.
\Emp\         Emphasizes the next spoken word.
\Lst\         Repeats the last statement spoken by the character. This tag must be the only content of the string in the Speak method call.
\Pau=number\  Pauses speech for number milliseconds.
\Pit=number\  Changes the pitch of the character’s voice. This value must be within the range 50 to 400 hertz for the Microsoft Agent speech engine.
\Spd=number\  Changes the speech speed to a value in the range 50 to 250.
\Vol=number\  Changes the volume to a value in the range 0 (silent) to 65,535 (maximum volume).

Fig. 33.13  Speech output tags.
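Because the tags are delimited by backslashes, each backslash must be doubled inside a JavaScript string literal, just as in the Peedy.acs path. The sketch below builds a tagged string with a hypothetical helper, speechTag, and hands it to a hypothetical stand-in for the Character object, mockActor, which only records the text. It shows how the string is assembled, not how the speech engine interprets it.

```javascript
// Wraps a value in the \Tag=value\ form shown in Fig. 33.13.
// "\\" in a JavaScript string literal is a single backslash.
function speechTag( tag, value )
{
   return "\\" + tag + "=" + value + "\\";
}

// Hypothetical stand-in for the Character object's Speak method;
// it stores the text instead of speaking it.
var spoken = [];
var mockActor = { Speak: function( text ) { spoken.push( text ); } };

// Set the speed, pause for half a second, then emphasize a word.
var text = speechTag( "Spd", 120 ) + "Performance Tips" +
   speechTag( "Pau", 500 ) + "highlight \\Emp\\opportunities.";

mockActor.Speak( text );
console.log( spoken[ 0 ] );
```

The values 120 and 500 are arbitrary choices within the ranges listed in Fig. 33.13.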


Copyright © 2018-2020; All Rights Reserved. Developed by Therithal info, Chennai.