Chapter 9 Null Hypothesis Significance Testing (NHST)

In NHST, you are trying to show that the null hypothesis is false. You start by setting up two hypotheses (hypotheses is the plural of hypothesis):

  • The null hypothesis (symbol is \(H_0\)). The null hypothesis is the hypothesis of no effect. It is the opposite of the hypothesis you want to demonstrate.
  • The alternative hypothesis (symbol is either \(H_1\) or \(H_a\)) is the hypothesis that there is an effect. The alternative hypothesis is what you ultimately want to demonstrate. You demonstrate the alternative hypothesis by demonstrating that the data you obtained would be rare if the null hypothesis were true. When you understand the previous sentence, you are on your way to mastering NHST.

These are the steps in the process, and we will follow them for every procedure we learn:

  1. Check assumptions. Statistics carry underlying assumptions about levels of measurement and the nature of the data. With a z-test, the assumptions are the same as those of the central limit theorem (random sampling and a continuous variable with a sufficiently large sample size). Z-tests also require a known population mean and a known population standard deviation. These requirements make the z-test less useful in real life, but the z-test is the simplest example of NHST.
  2. Write hypotheses (singular: hypothesis). Write the formal hypotheses, an \(H_0\) and an \(H_a\). The alternative hypothesis is what we aim to demonstrate. We proceed to the next step assuming the null hypothesis is true.
  3. Analyze. Compute a sample statistic. Here, our statistic is z, which is based on the mean. As you already know, z is a value that increases as a sample mean moves further away from the population mean. Also compute p, the probability that you could have obtained a statistic at least this extreme by chance if the null hypothesis were true. p is based on two things: the statistic (in this case, z) and the sample size (n), which enters through the calculation of z. Another way to understand p: If the null were true (there was no effect to find) and we ran our study 100 times with 100 different samples, how many of those samples would have a sample statistic at least as large as the one we obtained? If your sample statistic was z = 0, then the answer would be about 50 (see the sketch after this list).
  4. Decide. Decide if the statistic is in the critical region (the range of extreme values that would be unlikely if the null hypothesis were true). In other words, decide if your observed statistic is likely (i.e., not in the critical region) or unlikely (i.e., in the critical region) to have occurred if the null hypothesis were true. If you were unlikely to get a statistic this extreme, you reject the null hypothesis. If your statistic may have occurred by chance, you retain the null hypothesis.
  5. Conclude. If you reject the null, your conclusion is the alternative hypothesis. Important: If you end up retaining the null hypothesis (i.e., you did not find an effect), then you can make no conclusion statements; your research has demonstrated nothing.
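To make the Analyze and Decide steps concrete, here is a minimal sketch of the z-test calculation in Python. All of the numbers in it (the population mean and standard deviation, the sample mean, n, and α) are invented for illustration only.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values for illustration: a known population mean and SD,
# an observed sample mean, a sample size, and a significance level.
mu = 100.0      # known population mean
sigma = 15.0    # known population standard deviation
x_bar = 104.5   # observed sample mean
n = 50          # sample size
alpha = 0.05    # significance level (two-tailed)

# Step 3 (Analyze): z tells us how far the sample mean is from the
# population mean, in standard-error units.
z = (x_bar - mu) / (sigma / sqrt(n))

# p is the probability, assuming the null is true, of getting a z at
# least this extreme in either direction (two-tailed).
p = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4 (Decide): reject the null if p is less than alpha.
decision = "reject the null" if p < alpha else "retain the null"
print(f"z = {z:.2f}, p = {p:.4f}; decision: {decision}")
```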

Example:

I gave the SAT to a random sample of 50 people who went through an SAT prep course. I know that because of the CLT, the mean of these 50 scores should be close to the population mean, which is known because the company that makes the test knows all the scores. The standard deviation of the population is also known. The goal of this hypothesis test is to determine if my sample is unlikely to have been selected by chance if the null hypothesis were true. In other words, is my sample mean different enough from the population mean that I can conclude that any differences were not due to chance?

Hypotheses: \(H_0: \bar{X} = \mu\) \(H_a: \bar{X} \neq \mu\)

In words (for illustration only—you should use the symbols): The null hypothesis is that the sample mean will equal the population mean. The alternative hypothesis is that the sample mean will be different from the population mean. This example is a two-tailed test. In a one-tailed test, \(<\) or \(>\) are used instead of \(\neq\).

Analyze: I compute a statistic based on my sample (here z) and determine the associated value of p. A low value of p shows that my sample mean is rare under the null hypothesis (it is far from the population mean). If the null hypothesis were true, it would be very unlikely for me to get this value.

Decide: Because the value of my sample statistic is unlikely under the null hypothesis, I reject the null hypothesis. Rejecting the null hypothesis is also called finding statistical significance. This is one of two possible decisions. The other is to ‘retain’ the null hypothesis. Retaining is also called a ‘failure to reject’ the null hypothesis.

Conclusion: I conclude that the SAT prep course had an effect, because the mean score was unlikely to have occurred under the null hypothesis (it was rare).
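The “rare under the null” logic of this example can also be seen by simulation. The sketch below uses hypothetical numbers (an SAT-like population with μ = 500 and σ = 100, n = 50, and an observed sample mean of 528); it draws many samples from a world in which the null hypothesis is true and counts how often a sample mean lands at least as far from μ as the observed one. That proportion approximates the two-tailed p.

```python
import random
from statistics import mean

# All numbers are hypothetical: an SAT-like population with mean 500 and
# SD 100, a sample of n = 50, and an observed sample mean of 528.
random.seed(1)
mu, sigma, n = 500, 100, 50
observed_mean = 528

# Draw 10,000 samples from a population in which the null is true
# (no prep-course effect), and record each sample mean.
null_means = [mean(random.gauss(mu, sigma) for _ in range(n))
              for _ in range(10_000)]

# How often is a null-world sample mean at least as far from mu as ours?
extreme = sum(abs(m - mu) >= abs(observed_mean - mu) for m in null_means)
print(f"Proportion at least this extreme: {extreme / len(null_means):.3f}")
# This proportion approximates the two-tailed p for the z-test.
```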

9.1 Power and Errors

Because either \(H_0\) or \(H_a\) could be true, and I have two possible decisions (conclude nothing or conclude that \(H_0\) is false), there are four outcomes. Two of these outcomes are the “wrong answer,” and they have special names: Type I errors and Type II errors (see the table).

| Decisions \(\downarrow\) ; Truth \(\rightarrow\) | \(H_0\) is True (there is no effect) | \(H_0\) is False (\(H_a\) is true; there is an effect) |
| --- | --- | --- |
| Researcher decides to retain \(H_0\), “Nothing to conclude” | Correct conclusion | Type II error (probability is \(\beta\)) |
| Researcher decides to reject \(H_0\), “There is an effect” | Type I error (probability is \(\alpha\)) | Correct conclusion (probability is \(1-\beta\)) |
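The bottom-right cell of the table, \(1-\beta\), is the test’s power: the probability of rejecting the null when there really is an effect. A simulation sketch can put numbers on the table; the population values, sample size, and effect size below are arbitrary choices for illustration. The rejection rate when the null is true estimates \(\alpha\), and the rejection rate when the null is false estimates power.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

# Arbitrary illustration values: population mean/SD, sample size, alpha.
random.seed(1)
mu0, sigma, n, alpha = 100, 15, 25, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff, about 1.96

def reject(true_mean):
    """Draw one sample from a population with the given mean and return
    True if the z-test rejects the null (H0: mean = mu0)."""
    x_bar = mean(random.gauss(true_mean, sigma) for _ in range(n))
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return abs(z) >= z_crit

trials = 10_000
# When H0 is true, the rejection rate estimates alpha (Type I error rate).
type_1 = sum(reject(mu0) for _ in range(trials)) / trials
# When H0 is false (here the true mean is mu0 + 10), the rejection rate
# estimates power, 1 - beta.
power = sum(reject(mu0 + 10) for _ in range(trials)) / trials
print(f"Estimated alpha: {type_1:.3f}, estimated power: {power:.3f}")
```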

9.2 NHST is Confusing

NHST confuses many students. It’s counterintuitive! Please come to class with your questions.

Students are not the only ones confused. NHST confuses scientists, as well. Some common misunderstandings about NHST:

  • \(p\) is not the probability of the null being false. Since we really want to know whether the null is true or false, it’s natural to think that p provides this information, but it does not. p is the probability of obtaining a sample statistic at least this extreme, assuming the null is true.
  • \(p\) is based on conditional probability, which confuses many people. p is a probability that already assumes the null hypothesis is true. Any statement about p should begin with “Assuming the null is true….” People fall into the trap of reversing the conditional probability when they think p is the probability of a hypothesis. Assuming I start with a brand new deck of cards, what is the probability of drawing red? It’s 50%. Now reverse the conditional probability: Assuming I drew a red card from a second deck, what is the probability of the second deck being new? It’s not 50%; knowing the deck is new tells you the probability of red, but knowing the card is red does not, by itself, tell you the probability that the deck is new. We make the same mistake with NHST. p is the probability of obtaining these data if the null were true. It is not the probability of the null being true if we obtained these data (see the toy simulation after this list).
  • Your decision to retain or reject is all-or-nothing. You either reject or retain. There is no grey area. There is no such thing as “highly significant” or “approaching significance.” These lead to misunderstandings of NHST.
  • Affirming the null is a tempting logical fallacy. Affirming the null is treating a non-significant result as evidence that there is no effect. If the results of your drug trial are non-significant, you have not shown that your experimental drug has no effect. Rather, you have shown nothing; your results are inconclusive. Only rejecting the null allows conclusions to be made. For this reason, avoid the term “insignificant.” Instead, use “not significant” or “did not reach significance.”
  • Just because a result is significant does not mean it is important. For example, would you invest in an insomnia drug that has been shown to help people sleep for one additional minute per night, on average? NHST helps you decide whether there is an effect at all, but you need to go back to the data to interpret the real-world meaning of your results.
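To see numerically that p and the probability of the null are different quantities, here is a toy simulation of many studies. Every setting in it (the 80% share of studies in which the null is really true, the effect size, σ, n, and α) is an arbitrary assumption for illustration; the point is only that the proportion of significant results among true-null studies (which α controls) and the proportion of true-null studies among significant results can be very different.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

# Arbitrary assumptions for illustration: 80% of simulated studies have a
# true null; the rest have a real effect of +5 points.
random.seed(1)
mu0, sigma, n, alpha, effect = 100, 15, 25, 0.05, 5
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

significant_and_null_true = 0
significant_total = 0
null_true_total = 0
trials = 20_000

for _ in range(trials):
    null_is_true = random.random() < 0.8          # assumed 80% base rate
    true_mean = mu0 if null_is_true else mu0 + effect
    x_bar = mean(random.gauss(true_mean, sigma) for _ in range(n))
    z = (x_bar - mu0) / (sigma / sqrt(n))
    significant = abs(z) >= z_crit

    null_true_total += null_is_true
    significant_total += significant
    significant_and_null_true += significant and null_is_true

# P(significant | null true): this is what alpha and p-value logic control.
p_sig_given_null = significant_and_null_true / null_true_total
# P(null true | significant): the reversed conditional, which depends on
# the base rate and effect size and is NOT given by p.
p_null_given_sig = significant_and_null_true / significant_total
print(f"P(significant | null true) = {p_sig_given_null:.3f}")
print(f"P(null true | significant) = {p_null_given_sig:.3f}")
```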