Audio Processing:
The two principal human senses are vision and hearing. Correspondingly, much of DSP is
related to image and audio processing. People listen to both music and speech. DSP has made revolutionary changes in both these areas.
1. Music Sound processing:
The path leading from the musician's microphone to the audiophile's speaker is
remarkably long. Digital data representation is important to prevent the
degradation commonly associated with analog storage and manipulation. This is
very familiar to anyone who has compared the musical quality of cassette tapes
with compact disks. In a typical scenario, a musical piece is recorded in a
sound studio on multiple channels or tracks. In some cases, this even involves
recording individual instruments and singers separately. This is done to give
the sound engineer greater flexibility in creating the final product. The
complex process of combining the individual tracks into a final product is called
mix down. DSP can provide several
important functions during mix down, including filtering, signal addition and
subtraction, and signal editing. One of the most interesting DSP applications
in music preparation is artificial
reverberation. If the individual channels are simply added together, the
resulting piece sounds frail and
diluted, much as if the musicians were playing outdoors. This is because
listeners are greatly influenced by the echo or reverberation content of the
music, which is usually minimized in the sound studio. DSP allows artificial
echoes and reverberation to be added during mix down to simulate various ideal
listening environments. Echoes with delays of a few hundred milliseconds give
the impression of cathedral-like locations. Adding echoes with delays of 10-20
milliseconds provides the perception of more modestly sized listening rooms.
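As a rough illustration of two of the mix-down operations mentioned above, the Python sketch below sums tracks with per-track gains (signal addition) and adds a decaying echo train using a single feedback comb filter, y[n] = x[n] + g * y[n - D], where D is the echo delay in samples. The function names `mix_down` and `add_echo` and all parameter values are hypothetical. A lone comb filter only produces evenly spaced echoes; practical artificial reverberators typically combine several delay lines (for example, Schroeder's arrangement of parallel comb filters followed by all-pass filters).

```python
def mix_down(tracks, gains):
    """Signal addition: sum equal-length tracks sample by sample, each scaled by its gain."""
    length = len(tracks[0])
    if any(len(t) != length for t in tracks):
        raise ValueError("all tracks must have the same length")
    return [sum(g * t[n] for t, g in zip(tracks, gains)) for n in range(length)]


def add_echo(signal, sample_rate, delay_ms, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - D],
    where D is the echo delay converted from milliseconds to samples."""
    if not abs(feedback) < 1:
        raise ValueError("|feedback| must be below 1 so the echoes die away")
    delay = round(sample_rate * delay_ms / 1000)
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    out = []
    for n, x in enumerate(signal):
        # Each output sample is fed back D samples later, producing a train
        # of echoes whose amplitude is multiplied by `feedback` each time.
        echo = feedback * out[n - delay] if n >= delay else 0.0
        out.append(x + echo)
    return out
```

With `sample_rate=44100`, a `delay_ms` of about 15 corresponds to the room-sized delays described above, while a few hundred milliseconds moves toward the cathedral-like effect. In both cases the delay in samples is D = sample_rate * delay_ms / 1000, rounded to the nearest integer.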
2. Speech generation:
Speech generation and recognition are used to communicate between humans and machines.
Rather than using your hands and eyes, you use your mouth and ears. This is
very convenient when your hands and eyes should be doing something else, such
as: driving a car, performing surgery, or (unfortunately) firing your weapons
at the enemy. Two approaches are used for computer-generated speech: digital recording and vocal tract
simulation. In digital recording, the voice of a human speaker is digitized and stored, usually in a
compressed form. During playback, the stored data are uncompressed and
converted back into an analog signal. An entire hour of recorded speech
requires only about three megabytes of storage, well within the capabilities
of even small computer systems. This is the most common method of digital
speech generation used today. Vocal tract simulators are more complicated, trying
to mimic the physical mechanisms by which humans create speech. The human vocal
tract is an acoustic cavity with resonant frequencies determined by the size
and shape of the chambers. Sound originates in the vocal tract in one of two
basic ways, called voiced and fricative sounds. With voiced sounds,
vocal cord vibration produces nearly periodic pulses of air into the vocal
cavities. In comparison, fricative sounds originate from the noisy air
turbulence at narrow constrictions, such as the teeth and lips. Vocal tract
simulators operate by generating digital signals that resemble these two types
of excitation. The characteristics of the resonant chamber are simulated by
passing the excitation signal through a digital filter with similar resonances.
This approach was used in one of the very early DSP success stories, the Speak & Spell, a widely sold
electronic learning aid for children.
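The vocal tract simulator described above has a source-filter structure: an excitation signal is shaped by resonances. The Python sketch below assumes a unit pulse train for voiced sounds, uniform white noise for fricatives, and a cascade of two-pole digital resonators standing in for the resonant chambers. The function names, the 8 kHz default sample rate, and the formant frequencies and bandwidths in the usage note are illustrative assumptions, not a reconstruction of any particular product.

```python
import math
import random


def excitation(kind, n_samples, sample_rate, pitch_hz=100.0, seed=0):
    """Source: a periodic pulse train for voiced sounds, white noise for fricatives."""
    if kind == "voiced":
        period = max(1, round(sample_rate / pitch_hz))  # samples between glottal pulses
        return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    if kind == "fricative":
        rng = random.Random(seed)  # seeded so the noise is reproducible
        return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    raise ValueError(f"unknown excitation kind: {kind!r}")


def resonator(signal, sample_rate, freq_hz, bandwidth_hz):
    """Two-pole digital resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2].
    The poles sit at radius r and angle theta, giving a peak near freq_hz
    whose width grows with bandwidth_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2 * math.pi * freq_hz / sample_rate
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0  # the previous two output samples
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize(kind, formants, n_samples, sample_rate=8000, pitch_hz=100.0):
    """Pass the excitation through one resonator per (frequency, bandwidth) pair."""
    signal = excitation(kind, n_samples, sample_rate, pitch_hz)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, sample_rate, freq_hz, bandwidth_hz)
    return signal
```

For example, `synthesize("voiced", [(700, 100), (1200, 100)], 8000)` produces one second of a buzzy, roughly vowel-like sound with resonances near 700 Hz and 1200 Hz (values chosen only for illustration), and replacing `"voiced"` with `"fricative"` gives a hiss shaped by the same resonances. The output is not gain-normalized, so it would need scaling before playback.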
3. Speech recognition:
The automated recognition of human speech is immensely more difficult than speech
generation. Speech recognition is a classic example of things that the human
brain does well, but digital computers do poorly. Digital computers can store
and recall vast amounts of data, perform mathematical calculations at blazing
speeds, and do repetitive tasks without becoming bored or inefficient.
Unfortunately, present-day computers perform very poorly when faced with raw
sensory data. Teaching a computer to send you a monthly electric bill is easy.
Teaching the same computer to understand your voice is a major undertaking.
Digital Signal Processing generally approaches the problem of voice recognition
in two steps: feature extraction
followed by feature matching. Each
word in the incoming audio signal is isolated and then analyzed to identify the
type of excitation and resonant frequencies. These parameters are then compared
with previous examples of spoken words to identify the closest match. Often,
these systems are limited to only a few hundred words; can only accept speech
with distinct pauses between words; and must be retrained for each individual
speaker. While this is adequate for many commercial applications, these
limitations are humbling when compared to the abilities of human hearing. There
is a great deal of work to be done in this area, with tremendous financial
rewards for those who produce successful commercial products.
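The two-step approach of feature extraction followed by feature matching can be sketched with deliberately crude features. The Python example below assumes each word has already been isolated, describes it by log energy and zero-crossing rate in a few equal segments (a high zero-crossing rate loosely indicates noisy, fricative-like excitation), and returns the label of the nearest stored template by Euclidean distance. The function names and feature choice are hypothetical simplifications; extracting the resonant frequencies described above would require spectral analysis that this sketch omits. `math.dist` requires Python 3.8 or later.

```python
import math


def extract_features(word, n_segments=4):
    """Feature extraction: split an isolated word into n_segments equal parts
    and describe each part by (log10 mean energy, zero-crossing rate)."""
    seg_len = len(word) // n_segments
    if seg_len < 2:
        raise ValueError("word too short for the requested number of segments")
    features = []
    for i in range(n_segments):
        seg = word[i * seg_len:(i + 1) * seg_len]
        energy = sum(s * s for s in seg) / seg_len
        # Count sign changes between neighbouring samples.
        crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a >= 0) != (b >= 0))
        features.append(math.log10(energy + 1e-12))  # small offset avoids log(0)
        features.append(crossings / (seg_len - 1))
    return features


def match(features, templates):
    """Feature matching: return the label whose stored feature vector is closest."""
    return min(templates, key=lambda label: math.dist(features, templates[label]))
```

A template set would be built from example recordings, e.g. `{"yes": extract_features(yes_recording), ...}` with `yes_recording` a hypothetical list of samples. Because the templates capture one voice, they would typically need to be rebuilt for each speaker, which mirrors the retraining limitation noted above.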
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.