It’s vitally important to figure out which question you’re actually trying to answer. Once you have that, you can develop your null and alternative hypotheses. Generally speaking, you’re looking to create a simple question that asks whether a factor x affects scenario y, with an answer of ‘yes’ or ‘no’.
My friend has very pale blond hair. He tells me that his hair is fine because it’s blond. I disagree—I think that fine hair can be found in any hair color.
To figure this out once and for all, I could test his basic claim that blond hair is more likely to be fine than other hair colors.
A bad example of a question would be:
Is blond hair or hair of another color more likely to be fine?
Why? Because it doesn’t lend itself to a yes/no answer. It’s actually proposing three possible answers (‘blond’, ‘other’, ‘chance’), rather than offering a choice between ‘chance’ and ‘not chance’. Plus, it doesn’t actually define ‘fine’ hair in measurable terms.
A good example of a question would be:
Is hair classified as naturally ‘blond’ more likely than hair of other natural colors to measure less than 60 microns in diameter?
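Defining ‘fine’ as a diameter cutoff is what makes the question testable, because each hair can now be classified mechanically. A minimal sketch, assuming the 60-micron cutoff from the question and using made-up measurements (the names `is_fine` and `proportion_fine` are hypothetical):

```python
# Hypothetical cutoff taken from the example question: "fine" means < 60 microns.
FINE_THRESHOLD_MICRONS = 60.0


def is_fine(diameter_microns: float, threshold: float = FINE_THRESHOLD_MICRONS) -> bool:
    """Classify a single hair as 'fine' using a measurable diameter cutoff."""
    if diameter_microns <= 0:
        raise ValueError("diameter must be a positive number of microns")
    return diameter_microns < threshold


def proportion_fine(diameters_microns: list[float]) -> float:
    """Return the fraction of hairs in a sample that count as 'fine'."""
    if not diameters_microns:
        raise ValueError("need at least one measurement")
    fine_count = sum(is_fine(d) for d in diameters_microns)
    return fine_count / len(diameters_microns)


# Made-up measurements in microns, for illustration only (not real data).
blond_sample = [45.0, 58.5, 62.0, 50.2]
print(proportion_fine(blond_sample))  # 0.75
```

A hair measuring exactly 60 microns is not counted as fine here, because the question says "less than 60 microns".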
Once I’ve framed the question in a simple yes/no format without ambiguities, developing the null and alternative hypotheses is straightforward. The null hypothesis should state that chance alone explains any apparent correlation between hair color and hair diameter:
There is no correlation between hair color and the diameter of a single hair.
The alternative hypothesis should state that, as my friend claims, blond hair is more likely to be of small diameter:
Hair classified as naturally ‘blond’ is more likely than hair of other natural colors to measure less than 60 microns in diameter.
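One way to write these hypotheses formally is as a comparison of two proportions. This assumes hair is split into just two groups (‘naturally blond’ and ‘other natural colors’) and that d is the diameter of a single hair. The “>” in the alternative makes this a one-sided (directional) test, matching the claim that blond hair is *more* likely to be fine:

```latex
p_B = \Pr(d < 60\,\mu\text{m} \mid \text{naturally blond}), \qquad
p_O = \Pr(d < 60\,\mu\text{m} \mid \text{other natural color})

\begin{aligned}
H_0 &: p_B = p_O \\
H_1 &: p_B > p_O
\end{aligned}
```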
The next step is to decide how significance will be judged in your test. This involves two basic elements: sample size and confidence level.
When it comes to sample size, the ideal is to gather data for the whole population you’re focusing on. However, gathering data on a whole population (for example, the entire population of the United States) is usually impractical and cost-prohibitive. So you need a sample of the population – one that is large enough to provide an acceptable cross-section of the population for the hypotheses being tested.
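How large is “large enough” depends on how big a difference you expect and how reliably you want to detect it. The sketch below uses one standard normal-approximation formula for comparing two proportions with a one-sided test. The expected proportions (60% vs 40%) and the 80% power (the chance of detecting a real difference of that size) are assumptions for illustration, not values from the example:

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a one-sided two-proportion comparison.

    Uses the normal-approximation formula
        n = (z_{1-alpha} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    and rounds up to a whole number of observations.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("proportions must be strictly between 0 and 1")
    if p1 == p2:
        raise ValueError("p1 and p2 must differ; otherwise no finite n suffices")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha)  # one-sided critical value
    z_power = z.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)


# Assumed (not measured) proportions of fine hair: 60% of blond vs 40% of other colors.
print(n_per_group(0.6, 0.4))  # 75 hairs in each group
```

Asking for more power, or expecting a smaller difference, drives the required sample size up quickly.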
You also need to decide on a confidence level. This reflects how much risk you’re willing to accept of rejecting a null hypothesis that is actually true (a false positive): the higher the confidence level, the stronger the evidence you’ll require before calling a result statistically significant. Once you’ve decided that, you can calculate the alpha level, which is simply (1 – confidence level) and is the false-positive rate you’re prepared to accept. The most commonly used confidence level is 95%, or 0.95; hence, the most common alpha level is 5%, or 0.05.
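The conversion is one subtraction, but in code it is worth rounding the result, because binary floating point cannot represent 0.95 exactly. A small sketch (the function name is hypothetical):

```python
def alpha_from_confidence(confidence_level: float) -> float:
    """Alpha level = 1 - confidence level (both expressed as proportions)."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence level must be strictly between 0 and 1")
    # Round to hide binary floating-point noise, e.g. 1 - 0.95 == 0.050000000000000044.
    return round(1 - confidence_level, 10)


for level in (0.90, 0.95, 0.99):
    print(f"confidence {level:.0%} -> alpha {alpha_from_confidence(level)}")
```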
It’s important to choose the right type of test for your sample. This depends on the hypotheses you’re testing and the kind of data you have. Some basic questions that can help you decide which test to use are:
- What level of measurement was used (nominal, ordinal, interval, or ratio)?
- How many different samples were used?
- What type of analysis do you need to do?
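To show how those questions narrow the choice, here is a deliberately simplified lookup. It covers only a handful of common comparison designs, omits ordinal data entirely, and is a hypothetical sketch rather than a substitute for a full decision tree:

```python
# Hypothetical, heavily simplified lookup: (measurement level, groups, paired?) -> test.
SUGGESTIONS = {
    ("nominal", "2", False): "chi-square test of independence (Fisher's exact test for small counts)",
    ("nominal", "2", True): "McNemar's test",
    ("nominal", "3+", False): "chi-square test of independence",
    ("continuous", "2", False): "independent-samples t-test (Mann-Whitney U if assumptions fail)",
    ("continuous", "2", True): "paired t-test (Wilcoxon signed-rank if assumptions fail)",
    ("continuous", "3+", False): "one-way ANOVA (Kruskal-Wallis if assumptions fail)",
}


def suggest_test(measurement_level: str, n_groups: int, paired: bool = False) -> str:
    """Return a commonly used test for a few simple comparison designs."""
    level = measurement_level.lower()
    if level in ("interval", "ratio"):
        level = "continuous"  # interval and ratio data are handled the same way here
    if n_groups < 2:
        raise ValueError("this sketch only covers comparisons between 2 or more groups")
    groups = "2" if n_groups == 2 else "3+"
    key = (level, groups, paired)
    if key not in SUGGESTIONS:
        raise ValueError(f"no suggestion for {key}; consult a full decision tree")
    return SUGGESTIONS[key]


# Hair example: 'fine' vs 'not fine' is nominal; diameter in microns is ratio.
print(suggest_test("nominal", 2))
print(suggest_test("ratio", 2))
```

The same hair question leads to different tests depending on whether you record each hair as fine/not fine or keep its measured diameter.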
The questions that you need to ask could be far more complicated – see the National Center for Biotechnology Information’s in-depth article, How to choose the right statistical test?, for more information.
There are some great decision trees available on Bren School of Environmental Science and Management’s Stats the Way I Like It site.
Once you’ve run your data through the selected test, you’ll have your results. But that’s not the end of your work! The next step is to interpret those results. One of the key values reported by most statistical tests is the p-value: the probability, assuming the null hypothesis is true, of obtaining results at least as extreme as the ones you observed. A small p-value means your data would be unusual if chance alone were at work.
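As a concrete illustration of where a p-value comes from, here is a one-sided two-proportion z-test, a common choice when the outcome is fine/not fine and the alternative is directional (for a two-sided question it is equivalent to the 2×2 chi-square test). It relies on a normal approximation, so it assumes reasonably large counts. The counts are made up, and the function name is hypothetical:

```python
import math
from statistics import NormalDist


def one_sided_two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Test H1: p1 > p2 against H0: p1 = p2 using the pooled normal approximation.

    Returns (z statistic, one-sided p-value).
    """
    if min(n1, n2) <= 0 or not (0 <= x1 <= n1 and 0 <= x2 <= n2):
        raise ValueError("counts must satisfy 0 <= x <= n and n > 0")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # estimate of the common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("all observations are identical; the test is undefined")
    z = (p1 - p2) / se
    # P(Z >= z) under the standard normal: chance of a result at least this extreme
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Made-up counts: 30 of 50 blond hairs and 20 of 50 other hairs measured < 60 microns.
z, p = one_sided_two_proportion_z_test(30, 50, 20, 50)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0228
```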
A p-value less than the alpha level decided upon in the Decide Significance step means that your results are statistically significant at that level. The null hypothesis can be rejected, and the alternative hypothesis is supported.
A p-value greater than the alpha level means that your results are not statistically significant, so you cannot reject the null hypothesis. This does not prove the null hypothesis is true; it only means the data don’t provide enough evidence against it.
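The comparison itself can be written as a tiny decision rule. The text doesn’t say what to do when the p-value exactly equals alpha; this sketch assumes that case means “fail to reject”, although some sources use p ≤ alpha instead:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 only when p < alpha (p == alpha is treated as fail to reject)."""
    if not 0 <= p_value <= 1:
        raise ValueError("p-value must be between 0 and 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.0228))  # reject the null hypothesis
print(decide(0.20))    # fail to reject the null hypothesis
```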
This video walk-through of an independent-samples t-test provides a simple example of interpreting test results:
The final step is to make a decision from your results, and draw up your conclusion. There are two basic decisions that you can make when you’ve interpreted the results:
- Reject the null hypothesis and support the alternative hypothesis.
- Fail to reject the null hypothesis.
Once you’ve made the decision, you need to construct a conclusion. This should clearly communicate your original hypothesis, the sample on which it was tested, the test you used and its p-value, the decision you made, and any additional information that you think is important to convey.
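One way to make sure nothing is left out is to assemble the conclusion from its parts. The class below is a hypothetical template, and the sample description and p-value in the usage example are invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class ConclusionReport:
    """Hypothetical container for the pieces a written conclusion should mention."""
    hypothesis: str
    sample: str
    test_name: str
    p_value: float
    alpha: float = 0.05

    def conclusion(self) -> str:
        rejected = self.p_value < self.alpha
        decision = "reject" if rejected else "fail to reject"
        outcome = (
            "the data support the alternative hypothesis"
            if rejected
            else "the data do not provide enough evidence for the alternative hypothesis"
        )
        return (
            f"We tested the hypothesis that {self.hypothesis} using a {self.test_name} "
            f"on {self.sample}. The p-value was {self.p_value:.4f} (alpha = {self.alpha}), "
            f"so we {decision} the null hypothesis; {outcome}."
        )


report = ConclusionReport(
    hypothesis="naturally blond hair is more likely to measure less than 60 microns in diameter",
    sample="a hypothetical sample of 50 blond and 50 non-blond hairs",
    test_name="one-sided two-proportion z-test",
    p_value=0.0228,
)
print(report.conclusion())
```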
This video provides some good step-by-step instructions on how to construct a conclusion based on your decision: