## Overview of Hypothesis Testing

Hypothesis testing lets us test our assumptions and beliefs using data analysis. It tells us how likely an observed result would be if a given assumption were true, within an agreed standard of accuracy. You may know all the stats in the world, but if you test the wrong hypothesis and reach the wrong conclusion, you could make an error that costs millions. So setting up the right hypothesis is important.

### Steps to Hypothesis Testing

- Identify the right question to ask.
  - Example: Are cycle times, error rates, or conversion rates statistically different across features such as groups of people, processes followed, geography, level of training, or age?

- Determine the level of significance needed.
  - How certain do we need to be about conclusions drawn from the sample? Remember, you only use hypothesis testing when analyzing a sample rather than the entire population.

- Choose the right test for the data.
  - This depends on the kind of data you have and the kind of question you are asking.

- Interpret the results.
  - The p-value is integral to using a hypothesis test to make a decision. It is the probability of getting results at least as extreme as those observed if the null hypothesis really is true.
  - If the p-value is less than or equal to the agreed-upon significance level (alpha), you reject the null hypothesis and can support the alternate hypothesis.
  - If the p-value is greater than alpha, you cannot reject the null hypothesis (in stats terms, you “fail to reject” the null), and thus you cannot support the alternate.

- Make a decision based on those results.
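The decision rule in the "interpret the results" step can be written as a small function. This is only a sketch: the function name `decide` and the default alpha of 0.05 are assumptions made for illustration, not part of any standard library.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a given p-value and significance level."""
    if not (0.0 <= p_value <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    # Reject when the p-value is at or below alpha; otherwise fail to reject.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.03))        # reject H0
print(decide(0.20))        # fail to reject H0
print(decide(0.03, 0.01))  # fail to reject H0
```

Note that the same p-value of 0.03 leads to different decisions at alpha = 0.05 and alpha = 0.01, which is why the significance level has to be agreed on before looking at the data.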

Null Hypothesis = Assumption that the experimental results are due to chance alone; nothing (none of the 6M factors) influenced our results.

Alternate Hypothesis = We expect to find a certain outcome.

Significant results = When the experimental results are not likely to have occurred by chance.

Example #1: Has the cycle time of my transaction changed from year 1 to year 2?

H0: Average of Year 1 = Average of Year 2. No change occurred; any difference is due to chance alone.

Ha: Average of Year 1 ≠ Average of Year 2.

Example #2: Determine if a new machining process has reduced the diameter of a product.

H0: It did not reduce the diameter.

Ha: It did reduce the diameter.
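To make "any difference is due to chance alone" concrete for Example #1, here is a minimal sketch that uses a permutation test. That technique is swapped in here because it needs only the Python standard library; it is not necessarily the test you would pick in practice. The cycle-time numbers are made up purely for illustration, and the function name `permutation_p_value` is hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(resamples):
        # Under H0 the year labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (resamples + 1)


# Made-up cycle times (illustrative only, not real data).
year_1 = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 12.7, 12.2]
year_2 = [11.2, 11.5, 10.9, 11.8, 11.1, 11.6, 11.0, 11.4]

p = permutation_p_value(year_1, year_2)
print(f"estimated p-value: {p:.4f}")
```

The estimated p-value is the share of random relabelings that produce a difference at least as large as the one observed. A small value means chance alone rarely produces such a gap.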

### Test statistic

The test statistic is calculated from sample data. To test the null hypothesis, a test calculation is made from the sample. That calculated (test) value is then compared to a critical value, and the decision depends on where the test statistic falls relative to the critical value.

The null hypothesis is never accepted; we either reject it or fail to reject it. We are always testing the null.

If the test statistic falls in the rejection region (beyond the critical value), then we reject the null.

Confidence level (95%) + Significance level (5%) = 100%

When the null hypothesis is true, the test statistic will fall in the “fail to reject” region 95% of the time and, due to chance alone, in the “critical region” (or rejection region) 5% of the time.
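The 95% / 5% split can be checked with a quick simulation. The sketch below assumes a two-tailed z-test with a known standard deviation of 1 and draws every sample from a population for which the null hypothesis really is true. The function name, sample size, and number of trials are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist


def false_rejection_rate(trials=20_000, n=30, alpha=0.05, seed=0):
    """Simulate two-tailed z-tests on data for which H0 (mu = 0) is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    standard_error = 1.0 / n ** 0.5               # known sigma = 1
    rejections = 0
    for _ in range(trials):
        # Every sample comes from N(0, 1), so any rejection is a false one.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = false_rejection_rate()
print(f"share of samples landing in the rejection region: {rate:.3f}")
```

The printed share should sit close to the chosen alpha, since alpha is by definition the long-run rate of falling in the rejection region when the null is true.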

P value: The probability, assuming the null hypothesis is true, of drawing a sample at least as extreme as the one being studied due to chance alone.

If P is low, Null must Go

## Null Hypothesis ( H_{0} )

The assumption that experimental results are due to chance alone is called the Null Hypothesis.

A Null Hypothesis is what you would expect by chance alone.

A Null Hypothesis assumes things to be equal.

A Null Hypothesis is NOT your theory

When the null hypothesis contains only an equal sign:

- The hypothesis test has two tails (or rejection regions).
- The alternative hypothesis contains a “not equal to” sign.
- It can be rejected by the test statistic being significantly large or small.

Statement of zero or no change. If the original claim includes equality (<=, =, or >=), it is the null hypothesis. If the original claim does not include equality (<, not equal, >) then the null hypothesis is the complement of the original claim. The null hypothesis *always* includes the equal sign. The decision is based on the null hypothesis.

## Alternative Hypothesis ( H_{1} or H_{a} )

This is your theory.

Statement which is true if the null hypothesis is false. **The type of test (left, right, or two-tail) is based on the alternative hypothesis**.

When the null hypothesis contains only an equal sign, the alternative hypothesis contains a “not equal to” sign.

### Alternative Hypothesis for a Two Tailed Test

H0: µ_{new} = µ_{current}

Ha: µ_{new} ≠ µ_{current}

## Errors

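Type 1 Error (Alpha): Rejecting a true null hypothesis. It becomes more likely as the significance level is set larger.

Type 2 Error (Beta): Failing to reject a false null hypothesis. All else being equal, it becomes more likely as the significance level is set smaller.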

### Type I error (alpha risk)

Rejecting the null hypothesis when it is true (saying false when true). Usually the more serious error.

Type 1 error involves the significance level. For example, if alpha = 5%, then in 5% of tests where the null hypothesis is true we will still reject it and say there is a real difference, when in fact there is no difference.

### Type II error (beta risk)

Failing to reject the null hypothesis when it is false (saying true when false).

### alpha

Probability of committing a Type I error.

### beta

Probability of committing a Type II error.
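The trade-off between alpha and beta can be shown with a short calculation. This sketch assumes a right-tailed z-test with a known population standard deviation. The function name `beta_right_tailed` and all the numbers passed to it are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def beta_right_tailed(mu0, mu_true, sigma, n, alpha):
    """Type II error probability for a right-tailed z-test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the test statistic's mean shifts when the true mean is mu_true.
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    # Beta is the chance the statistic still lands below the critical value.
    return std_normal.cdf(z_crit - shift)


for alpha in (0.10, 0.05, 0.01):
    beta = beta_right_tailed(mu0=100, mu_true=102, sigma=5, n=25, alpha=alpha)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With the sample size held fixed, lowering alpha raises beta: being stricter about Type I errors makes Type II errors more likely. Power (1 - beta) is the probability of correctly rejecting a false null.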

## Test statistic

Sample statistic used to decide whether to reject or fail to reject the null hypothesis.

### Two Tailed Test

H0: µ_{new} = µ_{current}

Ha: µ_{new} ≠ µ_{current}

For example, if the null hypothesis has only an equal sign, then this is a two-tailed test, and you can reject the null hypothesis if the test statistic is too large or too small.

## Critical region

Set of all values which would cause us to reject H_{0}

## Critical value(s)

The value(s) which separate the critical region from the non-critical region. The critical values are determined independently of the sample statistics.

A **critical value** separates the rejection region from the non-rejection region.
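Because critical values depend only on the significance level and the type of test, they can be computed before any data is collected. The sketch below does this for a z-test using Python's standard library; the function name `z_critical_values` and the tail labels are assumptions made for this example.

```python
from statistics import NormalDist


def z_critical_values(alpha: float, tail: str) -> tuple:
    """Critical z value(s) for a 'left', 'right', or 'two' tailed test."""
    std_normal = NormalDist()
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        # Alpha is split evenly between the two rejection regions.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    values = ", ".join(f"{v:.3f}" for v in z_critical_values(0.05, tail))
    print(f"{tail:>5}-tailed, alpha=0.05: {values}")
```

At alpha = 0.05 this gives roughly -1.645 (left), 1.645 (right), and ±1.960 (two-tailed). No sample statistic appears anywhere in the calculation.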

## Significance level ( alpha )

The probability of rejecting the null hypothesis when it is true. alpha = 0.05 and alpha = 0.01 are common. If no level of significance is given, use alpha = 0.05. The level of significance is the complement of the level of confidence in estimation.

The **significance level** (denoted by alpha) is the probability that the test statistic will fall in the critical region when the null hypothesis is actually true.

## Decision

A statement based upon the null hypothesis. It is either “reject the null hypothesis” or “fail to reject the null hypothesis”. We will never accept the null hypothesis.

A p-value is the probability of getting a test statistic that is at least as extreme as the one found from the sample data, assuming the null hypothesis is true.

## Hypothesis Test

I) **Left-tailed test:** If the alternative hypothesis H1 contains the less-than inequality symbol (<), the hypothesis test is a left-tailed test.

II) **Right-tailed test:** If the alternative hypothesis H1 contains the greater-than inequality symbol (>), the hypothesis test is a right-tailed test.

- In hypothesis testing, when performing a right-tailed test we reject the null hypothesis if the test statistic is larger than the critical value.
- Only when the test statistic is larger than the critical value will we be able to reject the null. Usually (emphasis on “usually”) we are hoping that we can reject the null because that means that our efforts were not in vain. If you are testing the null hypothesis and hoping that you have not adversely affected the process, you would then be hoping NOT to reject the null.

III) **Two-tailed test:** If the alternative hypothesis H1 contains the not-equal-to symbol (≠), the hypothesis test is a two-tailed test. In a two-tailed test, each tail has an area of alpha/2.

- Has two tails (or rejection regions) when the null hypothesis contains an equal sign.
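The three kinds of test differ in which tail area counts as “at least as extreme”. The following sketch computes the p-value of a z statistic for each case. The function name `z_p_value` is an assumption for this example, and a z-test is assumed throughout (a t-test would need a different distribution).

```python
import math


def z_p_value(z: float, tail: str) -> float:
    """P-value of a z statistic for a 'left', 'right', or 'two' tailed test."""

    def upper_tail(x: float) -> float:
        # P(Z > x) for a standard normal, via the complementary error function.
        return 0.5 * math.erfc(x / math.sqrt(2))

    if tail == "right":
        return upper_tail(z)
    if tail == "left":
        return upper_tail(-z)
    if tail == "two":
        # Both tails count, so the one-tail area is doubled.
        return 2 * upper_tail(abs(z))
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    print(f"z = 1.96, {tail}-tailed: p = {z_p_value(1.96, tail):.4f}")
```

For z = 1.96 the two-tailed p-value is very close to 0.05, while the right-tailed p-value is about half of that and the left-tailed p-value is close to 1, so the choice of alternative hypothesis changes the conclusion.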

## Conclusion

A statement which indicates the level of evidence (sufficient or insufficient), at what level of significance, and whether the original claim is rejected (null) or supported (alternative).

## Hypothesis Testing Examples

Great examples here: http://www.unc.edu/~blopes/files/stat11spring03/Files/HypothesisTesting.pdf

## Great Hypothesis testing notes – Not sure where I found these – need attribution:

Consider the hypothesis test as a trial against the null hypothesis. The data is evidence against the null. You assume the null is true and try to show that it is not. **After finding the test statistic and p-value, if the p-value is less than or equal to the significance level of the test, we reject the null and conclude the alternate hypothesis is true**. If the p-value is greater than the significance level, then we fail to reject the null hypothesis and conclude it is plausible. Note that we cannot conclude the null hypothesis is true, just that it is plausible.

If the question statement asks you to determine if there is a difference between the statistic and a value, then you have a two-tailed test. The null hypothesis, for example, would be μ = d vs. the alternate hypothesis μ ≠ d.

If the question asks you to test for an inequality, make sure that your results will be worthwhile. For example, say you have a steel bar that will be used in a construction project. If the bar can support a load of 100,000 psi, then you’ll use the bar; if it cannot, then you will not use the bar.

If the null was μ ≥ 100,000 vs. the alternate μ < 100,000, then you will have a meaningless test. In this case, if you reject the null hypothesis, you will conclude that the alternate hypothesis is true and the mean load the bar can support is less than 100,000 psi, so you will not be able to use the bar. However, if you fail to reject the null, then you will conclude it is plausible the mean is greater than or equal to 100,000. You cannot ever conclude that the null is true. As a result, you should not use the bar because you do not have proof that the mean strength is high enough. Either way, you do not use the bar.

If the null was μ ≤ 100,000 vs. the alternate μ > 100,000 and you reject the null, then you conclude the alternate is true and the bar is strong enough; if you fail to reject, it is plausible the bar is not strong enough, so you don’t use it. In this case you have a meaningful result.

Any time you are defining the hypothesis test, you need to consider whether or not the results will be meaningful.

Faced with rising fax costs, a firm issued a guideline that transmissions of 10 pages or more should be sent by 2-day mail instead. Exceptions are allowed, but they want the average to be 10 or below. The firm examined 35 randomly chosen fax transmissions during the next year, yielding a sample mean of 14.44 with a standard deviation of 4.45 pages. At the .01 level of significance, is the true mean greater than 10? The task is to find the critical value, test statistic, and p-value.

Let μ = mean number of pages per fax transmission

Null hypothesis: μ = 10

Alternative hypothesis: μ > 10

Level of significance = 0.01

Reject null hypothesis if p-value <= 0.01

We carry out a one-tailed Z-test.

Since the population variance is unknown, we need to estimate it from the sample standard deviation:

sigma^2 = {n / (n - 1)} × (sample standard deviation)^2 = (35/34) × (4.45^2) ≈ 20.38, so sigma ≈ 4.515

With a graphing calculator, I enter the following:

1. Do a Z-test.

2. Enter the null hypothesis’s value.

3. Enter sigma’s value (computed above)

4. Enter sample mean (i.e. 14.44)

5. Enter sample size (i.e. 35).

6. Do a one-tailed test (right tail).

The calculated p-value is 0.00000000298, so we reject the null hypothesis and claim that there is significant evidence, at the 1% significance level, that the true mean is greater than 10.
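The calculator steps above can be reproduced in a few lines of Python. This is a sketch of the same right-tailed Z-test that follows the worked solution's assumption that the reported standard deviation of 4.45 needs the n/(n - 1) correction. The variable names are mine, and only the standard library is used.

```python
import math
from statistics import NormalDist

n = 35               # sample size
sample_mean = 14.44  # observed mean pages per fax
sample_sd = 4.45     # reported standard deviation
mu0 = 10.0           # value of the mean under the null hypothesis
alpha = 0.01         # significance level

# Estimate sigma with the n / (n - 1) correction used in the solution above.
sigma_hat = math.sqrt(n / (n - 1) * sample_sd ** 2)
standard_error = sigma_hat / math.sqrt(n)

# Test statistic: how many standard errors the sample mean sits above mu0.
z = (sample_mean - mu0) / standard_error

# Right-tailed critical value and p-value from the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"test statistic z = {z:.3f}")
print(f"critical value   = {z_crit:.3f}")
print(f"p-value          = {p_value:.3g}")
print("decision:", "reject H0" if p_value <= alpha else "fail to reject H0")
```

The test statistic comes out near 5.82, well beyond the critical value of about 2.326, and the p-value should land close to the one quoted above. Both routes (comparing z to the critical value, or comparing the p-value to alpha) lead to rejecting the null hypothesis.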

## Six Sigma Black Belt Certification Hypothesis Testing Questions:

**Question:** Which of the following terms is used to describe the risk of a type I error in a hypothesis test?

(Taken from ASQ sample Black Belt exam.)

(A) Power

(B) Confidence level

(C) Level of significance

(D) Beta risk

**Answer:** (C) Level of significance. A Type I error involves the significance level. For example, if alpha = 5%, then in 5% of tests where the null hypothesis is true we will still reject it and say there is a real difference, when in fact there is no difference. The lower the alpha, the lower our chance of making a Type I error.