Chapter: Psychology: Research Methods

Establishing Cause and Effect: the Power of Experiments

Experimental Groups versus Control Groups • Random Assignment • Within-Subject Comparisons • Internal Validity • Beyond the Single Experiment

In ordinary conversation, people use the word experiment when referring to almost any sort of test. (“Would a bit of oregano make this stew taste better? Let’s experiment and find out!”) In the sciences, though, experiment has a more specific meaning: It is a test in which the investigators manipulate some variable in order to set up a specific comparison. Let’s look at what this means and why it’s so important—including why experiments allow us to make cause-and-effect claims.

Experimental Groups versus Control Groups

In an observational study, the researcher simply records what she finds in the world. In a scientific experiment, in contrast, the researcher deliberately changes something. She might change the nature of the test being given, or the circumstances, or the instructions. This change is usually referred to as the experimental manipulation, and the point of an experiment is to ask what results from this change. To see how this plays out, let’s consider a new example.

Many companies sell audio recordings that contain subliminal messages embedded in background music. The message might be an instruction to give up smoking or curb overeating, or it might be designed to build self-esteem or overcome shyness. The message is played so softly that you can’t consciously detect it when listening to the recording; still, it’s alleged to provide important benefits.

Anecdotal evidence (reports from various people announcing, “Hey, I tried the tapes, and they really worked for me!”) sometimes suggests that these subliminal messages can be quite effective. However, we’ve already discussed the problems with relying on such anecdotes; and so, if we want a persuasive test of these messages, it would be best to set up an experiment. Our experimental manipulation would be the presentation of the subliminal message, and this would define our study’s independent variable: message presented versus message not presented.

What about the dependent variable? Suppose we’re testing a tape advertised as helping people give up cigarette smoking. In that case, our dependent variable might be the number of cigarettes smoked in, say, the 24-hour period after hearing the tape. In our study, we might ask 20 students, all longtime smokers, to listen to the tape; then we’d count up how many cigarettes they each consume in the next 24 hours. However, this procedure by itself tells us nothing. If the students smoke an average of 18 cigarettes in the 24-hour test period, is that less than they would have smoked without the tape? We have no way to tell from the procedure described so far, and so there’s no way to interpret the result.

What’s missing is a basis for comparison. One way to arrange for this is to use two groups of participants. The experimental group will experience the experimental manipulation: their tape contains the subliminal message. The control group will not experience the manipulation. So, by comparing the control group’s cigarette consumption to that of the experimental group, we can assess the message’s effectiveness.

But exactly what procedure should we use for the control group? One possibility is for these participants to hear no recording at all, while those in the experimental group hear the tape containing the subliminal message embedded in music. This setup, however, once again creates problems: If we detect a contrast between the two groups, then the subliminal message might be having the predicted effect. But, on the other hand, notice that the subliminal message is embedded in music. So is the experimental group being influenced by the music rather than the message? (Perhaps the participants find it relaxing to listen to music and then smoke less because they’re more relaxed.) In this case, it helps to listen to the recording; but the result would be the same if there had been no subliminal message at all.

To avoid this ambiguity, the procedures used for the control group and the experimental group must match in every way except for the experimental manipulation. If the experimental group hears music containing the subliminal message, the control group must hear the identical music without any subliminal message. If the procedure for the experimental group requires roughly 30 minutes, then the procedure for the control participants should take 30 minutes. It’s also important for the investigators to treat the two groups in precisely the same way. If we tell members of the experimental group they’re participating in an activity that might help them smoke less, then we must tell members of the control group the same thing. That way, the two groups will have similar expectations about the procedure.
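The comparison at the heart of this design can be sketched in a few lines of Python. This is only an illustration: the cigarette counts below are invented for the example (they are not data from any study), and the function name mean_difference is a hypothetical helper, not part of any standard procedure.

```python
from statistics import mean

# Invented numbers for illustration only: cigarettes smoked in the
# 24 hours after the session, one value per participant.
control_group = [20, 18, 22, 19, 21]        # identical music, no message
experimental_group = [17, 19, 16, 18, 15]   # same music plus the message

def mean_difference(control, experimental):
    """Return how many fewer cigarettes, on average, the experimental group smoked."""
    return mean(control) - mean(experimental)

difference = mean_difference(control_group, experimental_group)
print(f"Control mean: {mean(control_group)}")
print(f"Experimental mean: {mean(experimental_group)}")
print(f"Difference (control minus experimental): {difference}")
```

A difference like this is interpretable only because the two groups went through matching procedures. Whether a difference of a given size is larger than chance variation is a separate statistical question that this sketch does not address.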

Random Assignment

As we have just described, it’s crucial for the experimental and control group procedures to be as similar as possible—differing only in the experimental manipulation itself. It’s also essential for the two groups of participants to start out the procedure being well matched to each other. In other words, there should be no systematic differences between the experimental and control groups when the experiment begins. Then, if the two groups differ at the end of the experiment, we can be confident that the difference was created during the experiment—which, of course, is what we want.

How can we achieve this goal? The answer is random assignment: the process of using some random device, like a coin toss, to decide which group each participant goes into. According to some descriptions, this is the defining element of a true experiment. Random assignment is based on the simple idea that people differ from each other. Some people are anxious and some are not; some like to race through tasks while others take their time; some pay attention well and others are easily distracted. There’s no way to get around these differences, but with random assignment, we can be confident that some of the anxious people will end up in the experimental group and some in the control group; some of the attentive people will end up in one group and some in the other. Random assignment doesn’t change the fact that participants differ from one to the next, but this procedure makes it very likely that the mix of participants in one group will be the same as the mix in the other group. As a result, the groups are matched overall at the start of our experiment, and that’s exactly what we want.
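Random assignment is easy to sketch in code. The example below is a minimal illustration, not a prescribed procedure: instead of tossing a coin for each person, it shuffles the participant list and splits it in half, a variant assumed here because it keeps the two groups the same size. The participant labels and the function name randomly_assign are hypothetical.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

# Hypothetical labels for 20 participants: P01 ... P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=42)

print("Experimental:", sorted(groups["experimental"]))
print("Control:     ", sorted(groups["control"]))
```

Because chance alone decides who lands in which group, no trait of the participants (anxiety, attentiveness, motivation) can systematically steer them toward one group or the other.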

Notice that we’ve now solved the concerns about cause and effect. Thanks to random assignment, we know that the groups started out matched to each other before we introduced the experimental manipulation. Therefore, any differences we observe in the dependent variable weren’t there before the manipulation, and so they must have arisen after the manipulation. As we mentioned earlier, this is just the information we need in order to determine which variable is the cause and which is the effect.

Random assignment also removes the third-variable problem. The issue there was that the groups being compared might differ in some regard not covered by the variables being scrutinized in our study. Thus, students who take Latin in high school might also be more motivated academically, and the motivation (not the Latin) might be why these students do especially well in college.

This problem wouldn’t arise, however, if we could use random assignment to decide who takes Latin classes and who doesn’t. Doing so wouldn’t change the fact that some students are more motivated and others are less so; but it would make it very likely that the Latin takers included a mix of motivated and less motivated students, and likewise for the group that does not take Latin. That way, the groups would be matched at the start, so if they end up being different later on, it must be because of the Latin itself.

Within-Subject Comparisons

Random assignment thus plays a central role in justifying our cause-and-effect claims. But the psychologist’s tool kit includes another technique for ensuring that the experimental and control groups match each other at the start of the experiment. This technique involves using the same people for the two groups, guaranteeing that the two “groups” are identical in their attitudes, backgrounds, motivations, and so forth. An experiment that uses this technique of comparing participants’ behavior in one setting to the same participants’ behavior in another setting is said to use within-subject comparisons. This kind of experiment differs from the other designs we’ve considered so far, which use between-subject comparisons.

Within-subject comparisons are advantageous because they eliminate any question about whether the experimental and control groups are fully matched to each other. But within-subject comparisons introduce their own complications. For example, let’s say that participants are first tested in the proper circumstances for the control condition and then tested in the circumstances for the experimental condition. In this case, if we find a difference between the conditions, is it because of the experimental manipulation? Or is it because the experimental condition came second, when participants were more comfortable in the laboratory situation or more familiar with the experiment’s requirements?

Fortunately, we can choose from several techniques for removing this sort of concern from a within-subjects design. In the example just sketched, we could run the control condition first for half of the participants and the experimental condition first for the other half. That way, any effects of sequence would have the same impact on both the experimental and control data, so any effects of sequence could not influence the comparison between the conditions. Techniques like this enable psychologists to rely on within-subject designs and can remove any question about whether the participants in the two conditions are truly comparable to each other.
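The half-and-half ordering just described can be sketched as follows. This is a minimal illustration under stated assumptions: the choice of which participants get which order is made at random here (the text does not say how the halves are chosen), and the participant labels and the function name counterbalance are hypothetical.

```python
import random

def counterbalance(participants, seed=None):
    """Give half the participants control-first and half experimental-first."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # assumption: the halves are chosen at random
    half = len(shuffled) // 2
    orders = {}
    for person in shuffled[:half]:
        orders[person] = ("control", "experimental")
    for person in shuffled[half:]:
        orders[person] = ("experimental", "control")
    return orders

# Hypothetical labels for 20 participants: P01 ... P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
orders = counterbalance(participants, seed=7)

control_first = [p for p, order in orders.items() if order[0] == "control"]
print(f"{len(control_first)} participants start with the control condition")
print(f"{len(orders) - len(control_first)} start with the experimental condition")
```

Every participant still completes both conditions, so the comparison remains within-subject; only the sequence varies, and it varies equally in both directions.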

Internal Validity

You may have detected a theme running through the last few sections: Over and over, we’ve noted that a particular procedure or a particular comparison might yield data that are open to more than one interpretation. Over and over, therefore, we’ve adjusted the procedure or added a precaution to avoid this sort of ambiguity. That way, when we get our result, we won’t be stuck in the position of saying that maybe this caused the result or maybe that caused the result. In other words, we want to set up the experiment from the start so that, if we observe an effect, there’s just one way to explain it. Only in that situation can we draw conclusions about the impact of our independent variable.

How have we achieved the goal? The various steps we’ve discussed all serve to isolate the experimental manipulation, so that it’s the only thing that differentiates the two groups, or the two conditions, we are comparing. With random assignment, we ensure that the groups were identical (or close to it) at the start of the experiment. By properly designing our control procedure, we ensure that just one factor within the experiment distinguishes the groups. Then, if the two groups differ at the end of the study, we know that just one factor could have produced this difference, and that’s what allows us to make the strong claim that the factor we manipulated did, indeed, cause the difference we observed.

These various steps (random assignment, matching of procedures, and so on) are all aimed at ensuring that an experiment has internal validity—it has the properties that will allow us to conclude that the manipulation of the independent variable was truly the cause of the observed change in the dependent variable. If an experiment lacks internal validity, it will not support the cause-and-effect claims that our science needs.

Beyond the Single Experiment

So far, we’ve considered the many elements needed for a proper experiment. But we should also realize that the scientific process doesn’t end once a single experiment or observational study is finished (Figure 1.16). As one further step, the research must be evaluated by experts in the field to make certain it was done properly. This step is usually achieved during the process of publishing the study in one of the scientific journals. Specifically, a paper is published only after being evaluated and approved by other researchers who are experts in that area of investigation. These other researchers provide peer review (i.e., the paper’s authors and these evaluators are all “peers” within the scientific community), and they must be convinced that the procedure was set up correctly, the results were analyzed appropriately, and the conclusions are justified by the data. It’s only at this point that the study will be taken seriously by other psychologists.

Even after it’s published, a scientific study continues to be scrutinized. Other researchers will likely try to replicate the study—to run the same procedure with a new group of participants and see if it yields the same results. A successful replication assures us that there was nothing peculiar about the initial study and that the study’s results are reliable. Other investigators may also run alternative experiments in an attempt to challenge the initial findings.

This combination of replications and challenges eventually produces an accumulation of results bearing on a question. Researchers then try to assemble all the evidence into a single package, to check on how robust the results are; that is, whether the results are consistent even if various details in the procedure (the specific participants, the particular stimuli) are changed. Sometimes, this pooling of information is done in a published article, called a literature review, that describes the various results and discusses how they are or are not consistent with each other. In addition, researchers often turn to a statistical technique called meta-analysis. This is a formal procedure for mathematically combining the results of numerous studies; in effect, it’s an analysis of the individual analyses contained within each study. Meta-analysis allows investigators to assess the consistency of a result in quantitative terms.
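The text does not say which formula a meta-analysis uses, and several exist. One common approach, assumed here purely for illustration, is fixed-effect inverse-variance weighting: each study’s effect size is weighted by 1 divided by its squared standard error, so more precise studies count for more. The effect sizes and standard errors below are invented, and the function name fixed_effect_pool is hypothetical.

```python
import math

def fixed_effect_pool(effects, standard_errors):
    """Combine study effects using inverse-variance (fixed-effect) weights."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_effect = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_effect, pooled_se

# Invented values for three hypothetical studies of the same question.
effects = [0.30, 0.10, 0.20]           # each study's estimated effect size
standard_errors = [0.10, 0.20, 0.10]   # smaller value = more precise study

pooled_effect, pooled_se = fixed_effect_pool(effects, standard_errors)
print(f"Pooled effect: {pooled_effect:.3f}")
print(f"Pooled standard error: {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is one way of expressing in quantitative terms why an accumulation of consistent results is more persuasive than one experiment alone.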

It’s only after all these steps (the result has been replicated, has survived scrutiny and challenge, and has been corroborated through other studies brought together in a review or meta-analysis) that we can truly consider the original results persuasive and the conclusions justified. Now we can say that the original hypothesis is confirmed, that is, well supported by evidence. Notice, however, that even after all these steps, we do not claim the hypothesis is proven. That’s because scientists, in an open-minded way, always allow for the possibility that new facts will become available to challenge the hypothesis or show that it’s correct only in certain circumstances. On this basis, no matter how often a scientific hypothesis is confirmed, it is never regarded as truly “proven.” But, of course, if a hypothesis is confirmed repeatedly and withstands a range of challenges, scientists regard it as extremely likely to be correct. They then conclude that, at last, they can confidently build from there.

We should also mention the other possible outcome: What if the study is properly done and the data aren’t consistent with the authors’ original prediction? In that case, the hypothesis is disconfirmed, and the scientist must confront the contrary findings. Often, this means closely scrutinizing these findings to make certain the study that is challenging one’s hypothesis was done correctly. If it was, the researcher is obliged to tune the original hypothesis, or to set that hypothesis aside and turn instead to some new proposal. What the scientist cannot do, though, is simply ignore the contrary findings and continue asserting a hypothesis that has been tested and found wanting.

Finally, with all of these safeguards in place, what about our earlier example? Are recordings containing subliminal suggestions an effective way to give up smoking or to increase your attractiveness? Several carefully designed studies have examined the effects of this type of recording, and the results are clear: The messages do seem to work, but this effect almost certainly involves a placebo effect, that is, an effect produced by the participants’ positive expectations for the procedure and not the procedure itself (Figure 1.17). Once the investigator controls for the participants’ expectations about the recordings, the subliminal messages themselves produce no benefit (Greenwald, Spangenberg, Pratkanis, & Eskenazi, 1991).
