NETWORK MODELS OF PERCEPTION
The last few sections have
created a new question for us. We’ve been focusing on the interpretive nature
of perception. In all cases, people don’t just “pick up” and record the stimuli
that reach the eye, the way a camera or videorecorder might. Instead, they
organize and shape the input. When they encounter ambiguity—and they often
do—they make choices about how the ambiguity should be resolved.
But how exactly does this
interpretation take place? This question needs to be pursued at two levels.
First, we can try to describe the sequence of events in functional terms—first,
this is analyzed; then that is analyzed—laying out in detail
the steps needed to accomplish the task. Second, we can specify the neural
processes that actually support the analysis and carry out the processing.
Let’s look at both types of explanation, starting with the functional
approach.
Earlier, we noted some complications for any theorizing that involves features. Even with these complications, though, it’s clear that feature detection plays a central role in object recognition. We saw that the visual system does analyze the input in terms of features: Specialized cells (feature detectors) respond to lines at various angles, curves at various positions, and the like. More evidence for the importance of features comes from behavioral studies that use a visual search procedure. In this task, a research participant is shown an array of visual forms and asked to indicate as quickly as she can whether a particular target is present: whether a vertical line is visible among the forms shown, perhaps, or a red circle is visible amid a field of squares. This task is easy if the target can be distinguished from the field by just one salient feature, for example, when searching for a vertical among a field of horizontals, or for a green target amid a group of red distracters. In such cases, the target “pops out” from the distracter elements, and search time is virtually independent of the number of items in the display, so people can search through four items, say, as fast as they can search through two, or eight as fast as they can search through four. These results make it clear that features have priority in our visual perception. We can detect them swiftly, easily, and presumably at an early stage in the sequence of events required to recognize an object.
But how do we use this feature
information? And how, knowing about the complications we’ve discussed, can we
use this information to build a full model of object recognition? One option is
to set up a hierarchy of detectors
with detectors in each layer serving as the triggers for detectors in the next
layer (Figure 5.14). In the figure, we’ve illustrated this idea with a
hierarchy for recognizing words; the
idea would be the same with one for recognizing objects. At the lowest level of the hierarchy would be the feature
detectors we’ve already described—those responsive to horizontals, verticals,
and so forth. At the next level of the hierarchy would be detectors that
respond to combinations of these simple features. Detectors at this second
level would not have to survey the visual world directly. Instead, they’d be
triggered by activity at the initial level. Thus, there might be an “L”
detector in the second layer of detectors that fires only when triggered by
both the vertical- and horizontal-line detectors at the first level.
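The layering just described can be sketched in a few lines of Python. This is only a toy illustration, not a model from the research literature: the feature names and the table linking letters to features are assumptions made up for the example. Each second-level detector fires only when every first-level detector that triggers it is active.

```python
# Toy feature net: a letter detector fires only when all of its
# triggering feature detectors are active.
# The feature names and this table are illustrative assumptions.
LETTER_TRIGGERS = {
    "L": {"vertical", "horizontal"},
    "T": {"vertical", "horizontal_top"},
    "V": {"diagonal_left", "diagonal_right"},
}


def active_letters(active_features):
    """Return the letter detectors whose required features are all active."""
    features = set(active_features)
    return sorted(
        letter
        for letter, triggers in LETTER_TRIGGERS.items()
        if triggers <= features  # every trigger must be present
    )


print(active_letters({"vertical", "horizontal"}))  # ['L']
print(active_letters({"vertical"}))                # []
```

Notice that the letter detectors never inspect the stimulus itself. They look only at which first-level detectors are active, just as the second-level detectors in the text are triggered by activity at the initial level.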
Hierarchical models like the one
just described are known as feature nets
because they involve a network of detectors that has feature detectors at its
bottom level. In the earliest feature nets proposed, activation flowed only
from the bottom up—from feature detectors to more complex detectors and so on
through a series of larger and larger units (see, for example, Selfridge,
1959). Said differently, the input pushes the process forward, and so we can
think of these processes as “data driven.” More recent models, however, have
also included a provision for “top-down” or “knowledge-driven”
processes—processes that are guided by the ideas and expectations that the
perceiver brings to the situation.
To see how top-down and bottom-up
processes interact, consider a problem in word recognition. Suppose you’re
shown a three-letter word in dim light. In this setting, your visual system
might register the fact that the word’s last two letters are AT; but at least initially, the system
has no information about the first letter. How, then, would you choose among MAT, CAT, and RAT? Suppose that, as part of the same experiment, you’ve just been
shown a series of words including several names of animals (dog, mouse, canary). This experience will
activate your detectors for these words, and the activation is likely to
spread out to the memory neighbors of these detectors—including (probably) the
detectors for CAT and RAT. Activation of the CAT or RAT detector, in turn, will cause a top-down, knowledge-driven
activation of the detectors for the letters in these words, including C and R (Figure 5.15).
While all this is going on, the
data-driven analysis continues; by now, your visual system has likely detected
that the left edge of the target letter is curved (Figure 5.15B). This
bottom-up effect alone might not be enough to activate the detector for the
letter C; but notice that this
detector is also receiving some (top-down) stimulation (Figure 5.15A). As a
result, the C detector is now
receiving stimulation from two sources—from below (the feature detector) and
above (from CAT), and this
combination of inputs will probably be enough to activate the C detector. Then, once this detector is
activated, it will feed back to the CAT
detector, activating it still further. (For an example of models that work in
this way, see McClelland, Rumelhart, & Hinton, 1986; also Grainger, Rey,
& Dufau, 2008.)
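A small sketch may help make the CAT example concrete. It is not the model from the papers cited above. The threshold and all the activation values are hypothetical numbers, chosen only so that neither source of stimulation is strong enough on its own.

```python
THRESHOLD = 1.0  # hypothetical firing threshold for a letter detector


def fired_letters(bottom_up, top_down, threshold=THRESHOLD):
    """Return the letters whose combined stimulation reaches the threshold."""
    letters = set(bottom_up) | set(top_down)
    return sorted(
        letter
        for letter in letters
        if bottom_up.get(letter, 0.0) + top_down.get(letter, 0.0) >= threshold
    )


# Top-down input: the animal words have primed CAT and RAT, which in turn
# stimulate the detectors for C and R (values are assumptions).
top_down = {"C": 0.5, "R": 0.5, "M": 0.0}

# Bottom-up input: a curved left edge has been detected, which supports C.
bottom_up = {"C": 0.6, "R": 0.0, "M": 0.0}

print(fired_letters(bottom_up, {}))        # []    data alone: too weak
print(fired_letters({}, top_down))         # []    expectation alone: too weak
print(fired_letters(bottom_up, top_down))  # ['C'] both together
```

The sketch leaves out the final step in the text, in which the activated C detector feeds back to the CAT detector and strengthens it further.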
It’s important, though, that we
can describe all of these steps in two different ways. If we look at the actual
mechanics of the process, we see that detectors are activating (or inhibiting)
other detectors; that’s the only thing going on here. At the same time, we can also
describe the process in broader terms: Basically, the initial activation of CAT functions as a knowledge-driven
“hypothesis” about the stimulus, and that hypothesis makes the visual system
more receptive to the relevant “data” coming from the feature detectors. In
this example, the arriving data confirm the hypothesis, thus leading to the
exclusion of alternative hypotheses.
With these points in view, let’s return to the question we asked earlier: We’ve discussed how the perceptual system interprets the input, and we’ve emphasized that the interpretation is guided by rules. But what processes in our mind actually do the “interpreting”? We can now see that the interpretive process is carried out by a network of detectors, and the interpretive “rules” are built into the way the network functions. For example, how do we ensure that the perceptual interpretation is compatible with all the information in the input? This point is guaranteed by the fact that the feature detectors help shape the network’s output, and this simple fact makes it certain that the output will be constrained by information in the stimulus. How do we ensure that our perception contains no contradiction (e.g., perceiving a surface to be both opaque and transparent)? This is guaranteed by the fact that detectors within the network can inhibit other (incompatible) detectors. With mechanisms like these in place, the network’s output is sure to satisfy all of our rules, or at least to provide the best compromise possible among the various rules.
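To see how inhibition can rule out a contradictory percept, consider the following toy sketch. The update rule, the inhibition strength, and the starting activations are all assumptions chosen for illustration and do not come from any published model. On each step, every detector loses activation in proportion to the activation of its rivals.

```python
def settle(activations, inhibition=0.2, steps=20):
    """Let mutually incompatible detectors inhibit one another.

    On each step, every detector is reduced by `inhibition` times the
    summed activation of its rivals; values are clipped to [0, 1].
    """
    current = dict(activations)
    for _ in range(steps):
        total = sum(current.values())
        current = {
            name: min(1.0, max(0.0, value - inhibition * (total - value)))
            for name, value in current.items()
        }
    return current


# Two incompatible interpretations of the same surface (hypothetical values).
result = settle({"opaque": 0.6, "transparent": 0.5})
print(result)
```

With these starting values, the weaker “transparent” detector is driven to zero after a few steps, while the “opaque” detector remains active. The network therefore settles on a single interpretation rather than a contradictory one.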
The network we’ve described so
far can easily recognize targets as simple as squares and circles, letters and
numerals. But what about the endless variety of three-dimensional objects that
surround us? For these, theorists believe we can still rely on a network of
detectors; but we need to add some intermediate levels of analysis.
A model proposed by Irving
Biederman, for example, relies on some 30 geometric components that he calls geons (short for “geometric ions”).
These are three-dimensional figures such as cubes, cylinders, pyramids, and the
like; nearly all objects can be broken down perceptually into some number of
these geons. To recognize an object, therefore, we first identify its features
and then use these to identify the component geons and their relationships. We
then consult our visual memory to see if there’s an object that matches up with
what we’ve detected (Biederman, 1987; Figure 5.16).
In Biederman’s system, we might
describe a lamp, say, as being a certain geon (number 4 in Figure 5.16) on top
of another (number 3). This combination of geons gives us a complete
description of the lamp’s geometry. But this isn’t the final step in object
recognition, because we still need to assign some meaning to this geometry. We
need to know that the shape is something we call a lamp—that it’s an object that casts light and can be switched on
and off.
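The two steps described here (building a structural description and then giving it a meaning) can be sketched as follows. This is a hypothetical illustration rather than Biederman’s actual model. The geon numbers follow the lamp example in the text, while the data structure, the relation label, and the one-entry “visual memory” are assumptions.

```python
from typing import NamedTuple, Optional


class StructuralDescription(NamedTuple):
    """A purely geometric description: which geon stands in which relation."""
    upper_geon: int
    relation: str
    lower_geon: int


# Hypothetical visual memory linking a structural description to a meaning.
VISUAL_MEMORY = {
    StructuralDescription(4, "on top of", 3): "lamp",
}


def recognize(description: StructuralDescription) -> Optional[str]:
    """Look up the meaning of a structural description, if one is stored."""
    return VISUAL_MEMORY.get(description)


seen = StructuralDescription(upper_geon=4, relation="on top of", lower_geon=3)
print(recognize(seen))                                       # lamp
print(recognize(StructuralDescription(3, "on top of", 4)))   # None
```

The sketch separates the geometry (the `StructuralDescription`) from its meaning (the lookup in `VISUAL_MEMORY`). A complete geometric description is of no use unless the second step also succeeds.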
As with most other aspects of
perception, those further steps usually seem effortless. We see a lamp (or a
chair, or a pickup truck) and immediately know what it is and what it is for.
But as easy as these steps seem, they’re far from trivial. Remarkably, we can
find cases in which the visual system successfully produces an accurate
structural description but fails in these last steps of endowing the perceived
object with meaning. The cases involve patients who have suffered certain brain
lesions leading to visual agnosia (Farah, 1990). Patients with this disorder
can see, but they can’t recognize what they see. Some patients can perceive
objects well enough to draw recognizable pictures of them; but they’re unable
to identify either the objects or their own drawings. One patient, for example,
produced the drawings shown in Figure 5.17. When asked to say what he had
drawn, he couldn’t name the key and said the bird was a tree stump. He
evidently had formed adequate structural descriptions of these objects, but his
ability to process what he saw stopped there; his perceptions were stripped of
their meaning (Farah, 1990, 2004).