Chapter 8 Experiments

8.1 Describe the essential features of experiments and quasi-experiments

To be an experiment (also called a randomized experiment or true experiment), a study needs a manipulation and random assignment. Yes, it’s still just these two features that define an experiment.

To be a quasi-experiment, a study needs a manipulation but no random assignment.

8.2 Why do experiments demonstrate causal relationships?

Experiments are our best tool for establishing the three requirements of causality. Put another way, experiments provide the best protection from threats to internal validity. They do this by emphasizing control. The goal is for participants in the conditions of the study to be exactly the same except for what is being manipulated. Experiments do this effectively because researchers first thoughtfully design studies so that participants are the same except for the manipulation. Experiments may take place in a laboratory setting so that the environment is the same for all participants. Researchers ensure that each experimental session is conducted the same way. If different researchers collect data, each researcher follows the same procedure and acts the same way.

Second, researchers use random assignment. Random assignment means that participants have an equal chance of being in each condition. There is no way to predict, ahead of time, which condition a participant will be in. Random assignment has an important property: On average, it equates groups on all variables before the study begins. Take, for example, a study of the effects of a drug for arthritis. Participants might differ in how severe their arthritis is before the study begins. Random assignment means that, if we were to run this study over and over again, participants with severe arthritis would, on average, appear equally in the drug and placebo conditions. This is also true for participants’ ages, socioeconomic status, and any other participant variable, whether or not it is measured.

Random assignment balances all individual differences across conditions on average, but there is no guarantee that all individual differences will be balanced in a single study. Take gender, as an example. If you randomly assign participants to conditions, there is no guarantee that each gender will be equally represented in each condition. However, across many repeated studies, it would be. By this logic, participants of any gender are equally likely to end up in one condition versus the other due to random assignment. Therefore, we can rely on random assignment because it works “on average.”
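To make the “on average” idea concrete, here is a minimal simulation sketch of the arthritis example. Everything in it is an assumption for illustration: the 1-10 severity scale, the 40 participants per study, the 5,000 repeated studies, and the function names. It randomly assigns participants to a drug or placebo condition, then compares how far apart the groups are on pre-study severity in a single study versus across many studies.

```python
import random
import statistics

def assign_randomly(participants, rng):
    """Shuffle participants and split them into two equal-sized conditions."""
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def severity_gap(n_participants, rng):
    """Run one hypothetical study.

    Returns mean pre-study severity in the drug condition minus
    mean pre-study severity in the placebo condition.
    """
    # Hypothetical arthritis severity scores on a 1-10 scale (an assumption).
    severities = [rng.randint(1, 10) for _ in range(n_participants)]
    drug, placebo = assign_randomly(severities, rng)
    return statistics.mean(drug) - statistics.mean(placebo)

rng = random.Random(8)  # fixed seed so the sketch is reproducible
gaps = [severity_gap(40, rng) for _ in range(5000)]

largest_single_gap = max(abs(g) for g in gaps)
average_gap = statistics.mean(gaps)

print("largest gap in any single study:", round(largest_single_gap, 2))
print("average gap across all studies: ", round(average_gap, 2))
```

Under these assumptions, a single study can end up with noticeably more severe arthritis in one condition, while the average gap across the repeated studies should sit close to zero.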

If all variables (except for the manipulation) are balanced across the conditions, then any differences in the DV between conditions are due to the manipulation. Thus, experiments rule out alternative explanations for differences in the DV.

8.3 Two ways of creating groups: Between-subjects vs within-subjects design

Experiments and quasi-experiments have a manipulation: the researcher assigns participants to different levels of the independent variable. To assess the effects of a drug, a researcher might compare a group getting the drug to a group not getting the drug (a placebo or sugar pill). The manipulation, the drug, has two levels: drug and no drug.

There are two ways to design a manipulation. Either the experimenter chooses who is in what group (between-subjects) or at what point in time each subject is in each condition (within-subjects). A between-subjects design manipulates the IV across different people (participants are only exposed to one level of the IV during the research). Within-subjects designs are also called repeated measures. Within-subjects designs have every participant exposed to every level of the IV. Some manipulation is done in between repeated measures. So, a drug might be given, and the participant measured. Then, the drug is taken away and the participant is measured again.

An example of between-subjects design: I give a randomly assigned half of the participants caffeinated soda and I give the other half caffeine-free soda to see the effects of caffeine on task performance.

An example of within-subjects design: Participants drink caffeine-free soda for one week and take a cognitive skills test at the end of the week. In week 2, participants drink caffeinated soda for one week and take the cognitive skills test again at the end of the second week.

Another example of within-subjects design: Participants complete anxiety surveys every day for a month. On half of the days, at random, all participants are given a placebo. On the other half of the days, all participants are given the experimental anxiety drug. In within-subjects design, participants serve as their own controls (the comparison is across trials).
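As a rough sketch of how the daily schedule in the last example could be randomized, the snippet below builds a list with equal numbers of placebo and drug days and shuffles it. The 30-day month, the seed, and the function name `randomize_days` are assumptions for illustration, not part of any particular study.

```python
import random

def randomize_days(n_days, rng):
    """Return a random schedule with half placebo days and half drug days."""
    n_placebo = n_days // 2
    schedule = ["placebo"] * n_placebo + ["drug"] * (n_days - n_placebo)
    rng.shuffle(schedule)  # the order of days is left to chance
    return schedule

rng = random.Random(29)  # fixed seed so the sketch is reproducible
schedule = randomize_days(30, rng)

# Every participant follows the same schedule and experiences both conditions.
for day, condition in enumerate(schedule[:7], start=1):
    print("day", day, "->", condition)
```

Because every participant gets both the placebo and the drug, the comparison is between each person’s placebo days and that same person’s drug days.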

Students sometimes struggle with identifying between- and within-subjects designs. The key feature is what happens to participants: do they get divided into separate groups that get different levels of the treatment (between-subjects design), or do all participants get all levels of the treatment and get measured repeatedly (within-subjects design)?

8.4 Which is better? Between or within?

It depends on the situation. These designs have their pros and cons:

Design | Pros | Cons
Between-subjects design | Participants only do one condition, so study time tends to be shorter; order effects and carryover effects do not apply; you can add more conditions to your study without extending the length of your study. | Differences between participants count against you: it’s harder to reject the null hypothesis when participants differ in ways that affect the DV; if you want to add more conditions to your study, you need more participants.
Within-subjects design | It’s okay for participants to be different from each other: each participant serves as their own control, and individual differences are removed from the analysis; you can add more conditions by increasing the length of the study. | Participants need to do all conditions, so study time tends to be longer; you must watch out for order effects and carryover effects; if you want to add more conditions to your study, you need a longer study.

In summary, within-subjects designs are better (they have more statistical power, meaning it’s easier to reject the null hypothesis) when you have individual differences that affect the DV. For example, if you are using a math test as a DV, you will have greater individual differences (variability) if you sample across high schools in the United States versus taking a sample from a single, advanced class in a well-funded school. If you expect that participants will arrive at your study with differences that affect the DV, within-subjects can be a good choice. However, within-subjects designs have the potential for order and carryover effects, which can be a deal-breaker in some cases.
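The power advantage can be illustrated with a small simulation sketch. All of the numbers below are assumptions chosen for illustration (a true effect of 2 points, large stable differences in ability, modest test noise, 40 participants per study), and the sketch deliberately ignores order and carryover effects. It runs many hypothetical studies of each design and looks at how much the estimated effect bounces around from study to study; a smaller spread means it is easier to detect the effect.

```python
import random
import statistics

rng = random.Random(41)  # fixed seed so the sketch is reproducible
TRUE_EFFECT = 2.0   # hypothetical boost from the treatment, in test points
ABILITY_SD = 15.0   # hypothetical spread of stable individual differences
NOISE_SD = 3.0      # hypothetical measurement noise on each test

def score(ability, treated):
    """One test score: stable ability, plus the effect if treated, plus noise."""
    return ability + (TRUE_EFFECT if treated else 0.0) + rng.gauss(0, NOISE_SD)

def between_estimate(n):
    """n participants in total, half per condition, each measured once."""
    abilities = [rng.gauss(70, ABILITY_SD) for _ in range(n)]
    rng.shuffle(abilities)  # random assignment to the two conditions
    treated = [score(a, True) for a in abilities[: n // 2]]
    control = [score(a, False) for a in abilities[n // 2:]]
    return statistics.mean(treated) - statistics.mean(control)

def within_estimate(n):
    """n participants, each measured in both conditions."""
    abilities = [rng.gauss(70, ABILITY_SD) for _ in range(n)]
    # Each participant's ability appears in both scores, so it cancels out.
    diffs = [score(a, True) - score(a, False) for a in abilities]
    return statistics.mean(diffs)

between = [between_estimate(40) for _ in range(2000)]
within = [within_estimate(40) for _ in range(2000)]

between_spread = statistics.stdev(between)
within_spread = statistics.stdev(within)

print("between-subjects: spread of estimates =", round(between_spread, 2))
print("within-subjects:  spread of estimates =", round(within_spread, 2))
```

Both designs recover the true effect on average in this sketch, but the within-subjects estimates vary much less because each participant’s stable ability is subtracted out. If you shrink `ABILITY_SD` toward zero, the advantage shrinks too, which matches the point about sampling from a single, advanced class.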

8.5 Carryover Effects and Order Effects

Imagine you give caffeinated soda to participants for a week, then you measure their grades. Imagine that you have all participants drink non-caffeinated soda the following week. Then, you measure their grades again. Imagine you find a difference. Is it because caffeine raises grades? Maybe. Or maybe the effects of caffeine withdrawal lowered people’s abilities in the second week. If that’s plausible, it’s a threat to internal validity.

This example is called a carryover effect. A carryover effect occurs when one condition bleeds into another. Within-subjects designs assume a clean break between the conditions, which doesn’t happen when there is a carryover effect. Another example is measuring participants’ mood after they watch a gory horror movie followed by a romantic comedy. Could memories of the horror movie alter the effect of the romantic comedy? If so, it’s a carryover effect. There are only two solutions to this problem. The first is to increase the distance between the conditions to the point where they no longer affect each other. How could that be done in the caffeine study? Perhaps participants could wait a month between the two conditions. The second is to not use a within-subjects design and instead use a between-subjects design.

There is a second way that conditions affect each other, and that is an order effect. Order effects and carryover effects are different. An order effect depends on a condition’s position in the sequence (first or second), not on which condition it is. Imagine that you give a math test on day 1, then on day 2 participants complete a workshop. On day 3, participants complete another math test. The effects of the workshop might be confounded by practice; participants are taking a similar math test for the second time, so they might perform better even if the workshop didn’t do anything. How do we know this is an order effect? If we swapped the two math tests, we would still have the same problem; whichever test comes first tends to have a lower score than whichever one comes second.

Order effects can be solved with a technique called counterbalancing. Counterbalancing means that a random half of the participants get the conditions in one order and the other half get them in the opposite order. In this example, condition 1 is “no workshop” and condition 2 is “after the workshop.” A random half of your participants (to keep it an experiment) would get condition 1 first (that is, nothing, math test, workshop, math test), and the other half of participants would get condition 2 first (that is, workshop, math test, nothing, math test). Counterbalancing only solves order effects. It does not solve carryover effects. If you look at this example and think that the benefits of the workshop might continue over time, then the design could have a carryover effect as well.
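Here is a minimal sketch of counterbalancing in the workshop example. It assumes, purely for illustration, that the workshop does nothing at all, that practice alone adds 5 points to whichever test comes second, and that there is no carryover effect. The function names and all of the numbers are hypothetical.

```python
import random
import statistics

rng = random.Random(49)  # fixed seed so the sketch is reproducible
WORKSHOP_EFFECT = 0.0  # hypothetical: the workshop does nothing
PRACTICE_EFFECT = 5.0  # hypothetical: the second test is higher through practice alone

def counterbalance(participant_ids):
    """Randomly assign half of the participants to each order of conditions."""
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    orders = {pid: ("no workshop", "workshop") for pid in ids[:half]}
    orders.update({pid: ("workshop", "no workshop") for pid in ids[half:]})
    return orders

def run_participant(order):
    """Return one participant's math score in each condition."""
    ability = rng.gauss(70, 10)
    scores = {}
    for position, condition in enumerate(order):
        score = ability + rng.gauss(0, 2)
        if position == 1:
            score += PRACTICE_EFFECT  # order effect: tied to position, not condition
        if condition == "workshop":
            score += WORKSHOP_EFFECT
        scores[condition] = score
    return scores

def estimated_effect(orders):
    """Mean of (workshop score minus no-workshop score) across participants."""
    diffs = []
    for order in orders.values():
        scores = run_participant(order)
        diffs.append(scores["workshop"] - scores["no workshop"])
    return statistics.mean(diffs)

ids = range(200)
fixed_order = {pid: ("no workshop", "workshop") for pid in ids}

fixed_estimate = estimated_effect(fixed_order)
balanced_estimate = estimated_effect(counterbalance(ids))

print("everyone in the same order:", round(fixed_estimate, 2))
print("counterbalanced:           ", round(balanced_estimate, 2))
```

When everyone takes the conditions in the same order, the practice effect masquerades as a workshop effect. With counterbalancing, the practice boost helps the workshop condition for half of the participants and the no-workshop condition for the other half, so it cancels out of the average. If the workshop’s benefits lingered into a later “no workshop” test, that would be a carryover effect, and this sketch would not fix it.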

To summarize, there are two threats to internal validity you have to look for when doing within-subjects designs: carryover effects and order effects. Carryover effects can only be solved by creating more space between your conditions. Order effects can be solved by using counterbalancing. As these only apply to within-subjects designs, sometimes the best solution is to use a between-subjects design instead.