The Chi-Square (χ²) distribution is commonly used to test a population variance against a known or assumed value. It is a continuous distribution whose shape is determined by a single parameter, the degrees of freedom.

A Chi-Square distribution describes the distribution of a sum of squared independent standard normal random variables. It is also used to test the goodness of fit of a data distribution, to test whether data series are independent of each other, and to build confidence intervals for the variance and standard deviation of a random variable from a normal distribution.

History of Chi-Square

Karl Pearson (1857 – 1936), often called the father of modern statistics (he founded the first statistics department in the world at University College London), introduced the Chi-Square distribution. Pearson’s work in statistics began when he developed a mathematical method for studying the processes of heredity and evolution. Later, the Chi-Square distribution came about as Pearson tried to find a measure of the goodness of fit of other distributions to random variables in his heredity model.

Chi-Square Statistic

A Chi-Square distribution is skewed to the right, with a long tail toward the larger values of the distribution. The overall shape of the distribution depends on the number of degrees of freedom in a given problem. For a test of a single variance, the degrees of freedom equal one less than the sample size.

Chi-Square Properties

  • The mean of the distribution is equal to the number of degrees of freedom: μ = ϑ.
  • The variance equals two times the number of degrees of freedom: σ² = 2ϑ.
  • When the degrees of freedom are greater than or equal to 2, the maximum value of the density occurs at χ² = ϑ − 2.
  • As the degrees of freedom increase, the Chi-Square curve nears a normal distribution.
  • As the degrees of freedom increase, the symmetry of the graph also increases.
  • Finally, the distribution is skewed to the right, and since the random variable it is based on is squared, it has no negative values. As the degrees of freedom increase, the probability density function begins to appear symmetrical.
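The first two properties can be checked informally by simulation. The sketch below is only an illustration under stated assumptions: it builds Chi-Square values as sums of squared standard normal draws and compares the sample mean and variance with ϑ and 2ϑ. The function names, the seed, and the number of draws are arbitrary choices, not part of the original article.

```python
import random
import statistics

def chi_square_sample(df, rng):
    """Draw one chi-square value as a sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def simulate(df, n_draws=20000, seed=0):
    """Return the sample mean and sample variance of n_draws simulated values."""
    rng = random.Random(seed)
    draws = [chi_square_sample(df, rng) for _ in range(n_draws)]
    return statistics.fmean(draws), statistics.variance(draws)

if __name__ == "__main__":
    for df in (2, 4, 10):
        mean, var = simulate(df)
        print(f"df={df}: mean ~ {mean:.2f} (theory {df}), "
              f"variance ~ {var:.2f} (theory {2 * df})")
```

Because the values are simulated, the printed numbers only approximate the theoretical ones and get closer as the number of draws grows.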

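The formula for the probability density function of the Chi-Square distribution is

$$f(x) = \frac{x^{\vartheta/2 - 1}\, e^{-x/2}}{2^{\vartheta/2}\, \Gamma(\vartheta/2)}, \quad x > 0$$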

Where ϑ is the shape parameter, and Γ is the gamma function.

The formula for the gamma function is

$$\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt$$
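As a rough numerical check of the density, the following sketch evaluates the standard Chi-Square probability density function with Python's `math.gamma` and then estimates where the curve peaks and how much area lies under it. The helper names, grid step, and upper limits are assumptions made for illustration.

```python
import math

def chi_square_pdf(x, df):
    """Chi-square density for x > 0; returns 0.0 for x <= 0 in this sketch."""
    if x <= 0:
        return 0.0
    half = df / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))

def approximate_mode(df, step=0.001, upper=50.0):
    """Grid-search the x in (0, upper] where the density is largest."""
    grid = (i * step for i in range(1, int(upper / step) + 1))
    return max(grid, key=lambda x: chi_square_pdf(x, df))

def approximate_area(df, step=0.001, upper=100.0):
    """Midpoint-rule estimate of the area under the density on (0, upper)."""
    n = int(upper / step)
    return sum(chi_square_pdf((i + 0.5) * step, df) for i in range(n)) * step

if __name__ == "__main__":
    df = 5
    print("mode is near", round(approximate_mode(df), 2), "versus df - 2 =", df - 2)
    print("area is near", round(approximate_area(df), 4))
```

For 5 degrees of freedom the grid search lands close to 3, in line with the property that the maximum occurs at ϑ − 2.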

Chi-Square (χ²) Hypothesis Test

Usually, the goal of the Six Sigma team is to find the level of variation of the output, not just the mean of the population. Above all, the team would like to know how much variation the production process shows about the target to see what changes are needed to reach a process free of defects.

For comparing variances or frequency proportions, the standard test statistic is the Chi-Square (χ²) statistic. The distribution of this statistic is called the Chi-Square distribution.

Types of Chi-Square Hypothesis Tests

There are two types of Chi-Square tests:

  • Chi-Square Test of Independence: Determines whether there is any association between two categorical variables by comparing the observed and expected frequencies of test outcomes; no population variance needs to be defined.
  • Chi-Square Test of Variance: Compares a sample variance with the population variance when the population variance is known or specified.
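For the test of variance, the statistic normally used is χ² = (n − 1)s² / σ₀², with n − 1 degrees of freedom. The sketch below applies it to the smartwatch figures quoted in the comments further down (11 watches, sample standard deviation of 9 hours, claimed variance of 49) as a right-tailed test. The function name is hypothetical, and the critical value 18.307 is the tabled right-tail value for 10 degrees of freedom at α = 0.05.

```python
def variance_test_statistic(n, sample_sd, hypothesized_variance):
    """Chi-square statistic (n - 1) * s^2 / sigma0^2 with n - 1 degrees of freedom."""
    df = n - 1
    statistic = df * sample_sd ** 2 / hypothesized_variance
    return statistic, df

# Smartwatch figures quoted in the comments: 11 watches, s = 9 hours, sigma0^2 = 49.
statistic, df = variance_test_statistic(n=11, sample_sd=9.0, hypothesized_variance=49.0)

# Right-tail critical value for df = 10 at alpha = 0.05, read from a chi-square table.
critical_value = 18.307

# Right-tailed test: reject H0 only if the statistic exceeds the critical value.
reject_null = statistic > critical_value
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_null}")
```

With these numbers the statistic is about 16.53, which is below 18.307, so the null hypothesis is not rejected. That agrees with the conclusion quoted in the comment thread.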

Chi-Square Test of Independence

The Chi-Square Test of Independence determines whether there is an association between two categorical variables (like gender and course choice). For example, it can examine the association between one categorical variable, like gender (male and female), and another, like the level of absenteeism in a school. The Chi-Square Test of Independence is a non-parametric test. In other words, you do not need to assume a normal distribution to perform the test.

A Chi-Square test uses a contingency table to analyze the data. Each row shows the categories of one variable. Similarly, each column shows the categories of another variable. Each variable must have two or more categories. Each cell reflects the total number of cases for a specific pair of categories.

Assumptions of Chi-Square Test of Independence

  • Variables must be nominal or categorical
  • Categories of variables are mutually exclusive
  • The sampling method is simple random sampling
  • The data in the contingency table are frequencies or counts

Steps to Perform Chi-Square Test of Independence

Step 1: Define the Null Hypothesis and Alternative Hypothesis

  • Null Hypothesis (H0): There is no association between two categorical variables
  • Alternative Hypothesis (H1): There is a significant association between two categorical variables

Step 2: Specify the level of significance

Step 3: Compute the χ² statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
  • O is the observed frequency
  • E is the expected frequency

The expected frequency for each cell is E = (row total * column total) / n, where n is the total number of observations.

Step 4: Calculate the degrees of freedom = (number of rows - 1) * (number of columns - 1) = (r - 1) * (c - 1)

Step 5: Find the critical value based on the degrees of freedom and the level of significance

Step 6: Finally, draw the statistical conclusion: If the test statistic value is greater than the critical value, reject the null hypothesis, and hence, we can conclude that there is a significant association between two categorical variables.
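Steps 3 through 6 can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a statistics package: the function name is hypothetical, the 2 x 2 counts are made up for the example, and the critical value 3.841 is the tabled value for 1 degree of freedom at α = 0.05.

```python
def chi_square_independence(observed, critical_value):
    """Run Steps 3 to 6 on a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Step 3: expected count per cell, then the sum of (O - E)^2 / E.
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            statistic += (o - e) ** 2 / e

    # Step 4: degrees of freedom = (r - 1) * (c - 1).
    df = (len(observed) - 1) * (len(observed[0]) - 1)

    # Step 6: reject H0 when the statistic exceeds the critical value.
    return statistic, df, statistic > critical_value

# Hypothetical 2 x 2 counts, made up for illustration only.
observed = [[30, 10],
            [20, 40]]

# Step 5: critical value for df = 1 at alpha = 0.05, read from a chi-square table.
statistic, df, reject = chi_square_independence(observed, critical_value=3.841)
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject}")
```

Passing the critical value in as an argument keeps the sketch close to the manual procedure, where that value is looked up in a table.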

Chi-Square Test of Independence Example

For example, 1000 middle school students are asked which superhero is their favorite: Superman, Ironman, or Spiderman. At a 95% confidence level, would you conclude that there is a relationship between gender and favorite superhero?

  • Null Hypothesis (H0): There is no association between gender and favorite superhero characters.
  • Alternative Hypothesis (H1): There is a significant association between gender and favorite superhero characters.

Level of significance: α = 0.05

First, calculate the expected frequency:

[Table: observed counts of favorite superhero by gender]

Expected frequency for the cell (Boys, Superman) = (200 * 600) / 1000 = 120

Similarly, determine the expected frequency of all cells:

[Table: expected frequencies for each cell]

Degrees of freedom = (r – 1) * (c – 1) = (2 – 1) * (3 – 1) = 2

Chi-Square critical value for 2 degrees of freedom at α = 0.05 = 5.991

The test statistic value is greater than the critical value; hence, we can reject the null hypothesis.

So, we can deduce a significant association between gender and favorite superhero characters.
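The example can be retraced in code, with one important caveat: the observed counts below are hypothetical. They are chosen only so that the Boys row totals 600, the Superman column totals 200, and n = 1000, which matches the expected count of 120 worked out above. The sketch also relies on a convenience that holds only for 2 degrees of freedom, where the right-tail probability is exp(−x/2), so the critical value is −2 ln(α).

```python
import math

# Hypothetical observed counts, NOT the article's data (see the note above).
observed = {
    "Boys":  {"Superman": 100, "Ironman": 300, "Spiderman": 200},
    "Girls": {"Superman": 100, "Ironman": 150, "Spiderman": 150},
}
heroes = ["Superman", "Ironman", "Spiderman"]

row_totals = {g: sum(counts.values()) for g, counts in observed.items()}
col_totals = {h: sum(observed[g][h] for g in observed) for h in heroes}
n = sum(row_totals.values())

# Expected count per cell: row total * column total / n.
expected = {g: {h: row_totals[g] * col_totals[h] / n for h in heroes}
            for g in observed}

statistic = sum((observed[g][h] - expected[g][h]) ** 2 / expected[g][h]
                for g in observed for h in heroes)
df = (len(observed) - 1) * (len(heroes) - 1)

# Closed forms valid for df = 2 only: P(X > x) = exp(-x / 2).
alpha = 0.05
critical_value = -2.0 * math.log(alpha)
p_value = math.exp(-statistic / 2.0)

print("expected (Boys, Superman):", expected["Boys"]["Superman"])
print(f"chi-square = {statistic:.3f}, df = {df}")
print(f"critical value = {critical_value:.3f}, reject H0: {statistic > critical_value}")
```

The computed critical value rounds to 5.991, the same figure used in the example.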


Answer: 57.5 and 4 degrees of freedom.

First, we will calculate the degrees of freedom. This is an easy way to eliminate half the answers on the page.

There are five rows and two columns in the chart.

So, degrees of freedom = (rows – 1) * (columns – 1) = (5 – 1) * (2 – 1) = 4 * 1 = 4.

Now we’ll run the equation χ² = Σ ((O – E)² / E) = 100 / 20 + 25 / 20 + 900 / 20 + 25 / 20 + 100 / 20 = 1150 / 20 = 57.5
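The arithmetic in this walkthrough is easy to double-check. The short sketch below simply sums the five terms listed above, each a squared deviation divided by an expected count of 20, and recomputes the degrees of freedom. The variable names are illustrative.

```python
# Squared deviations (O - E)^2 as listed in the walkthrough, each with E = 20.
squared_deviations = [100, 25, 900, 25, 100]
expected_count = 20

chi_square = sum(d / expected_count for d in squared_deviations)

# Five rows and two columns in the chart.
df = (5 - 1) * (2 - 1)

print(chi_square, df)  # prints: 57.5 4
```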


Comments (27)

A distribution is an arrangement of the values of a variable showing their observed or theoretical frequency of occurrence. The Chi-Square distribution looks like a skewed bell curve.

The Chi-Square test is a mathematical procedure used to test whether two factors are independent or dependent. In other words, you use this test (which makes use of the Chi-Square distribution) to see if there is a statistically valid dependence of one thing on another. Check out the examples above and you’ll see.

Hi,
In example 1 above (The Barnes Company), the χ² statistic is < χ² (table) and the decision was "we reject the H0".
However, for the F-test (see example: https://sixsigmastudyguide.com/f-distribution/), the F statistic is < F (table) and the decision was "we fail to reject the H0". The same decision was taken for the T-test (see example: https://sixsigmastudyguide.com/paired-t-distribution-paired-t-test/).
Could you please clarify?
Thanks
Best regards.
Hakim

Hi Hakim,

In the future we will accept support questions through our Member Support forum. Here’s the answer for this case (courtesy of Trey):

The Chi-square example above asks the question (which is our alternate hypothesis), “Is the sample variance statistically significantly less than the currently claimed variance?” Our null hypothesis is to say that the variances are not different. This means that the rejection region would be to the left of the table statistic and the distribution is “left-tailed”. Since the test statistic falls in that region, we would reject H0.

For the example cited in https://sixsigmastudyguide.com/f-distribution/, Student A states the null hypothesis, that the variances are the same, and Student B says that they are different (Ha: σ₁² ≠ σ₂², which is the alternate hypothesis). This would be indicative of a two-tailed distribution and we would reject the null if F ≤ F_(1−α/2) or F ≥ F_(α/2) (see table below). In this case we used the F table for α/2 = .05, since the total risk we are willing to take is 0.10. Therefore, our calculated statistic (F) of 2.65 is greater than 2.59 (F ≥ F_(α/2)), so we reject the null and say that there is a difference in variance.

See the F test Hypothesis testing graphic above.

In the T-test example, the calculated statistic fell below the Table T value (3.25). The distribution is identified as a two tail distribution, so we would fail to reject the null hypothesis if we were below the table statistic, calculated at 9 degrees of freedom and 0.005 alpha. We don’t have to look at the lower tail as we assume that the data is normally distributed where the t-test is used.

Hi Ted,

For the Chi Square example – can you please explain what 100 hours 2 means for specified variance shouldn’t it equal 100 since 10^2=100? I’m not sure I understand what 100 hours 2 means?

Thank you

Hi Cheryl,

I agree this isn’t as clear as I’d like. What’s missing is that the 2 should be listed as a superscript or have a ^ preceding it to denote the square. I’ll list this as an item to clean up.

Best, Ted.

Regarding the Chi-squared right tail test example:

Q: Could the claim about increased variation in the new model be validated with 5% significance level?

A: The test statistic is less than the critical value and it is not in the rejection region. Hence we failed to reject the null hypothesis. There is not sufficient evidence to claim the battery life of the new model shows more variability.

We failed to reject the null hypothesis. Doesn’t that mean then that there is sufficient evidence to claim that the new model has increased variation?

Hi April-Lynn,

I think you might be asking about how we word hypothesis tests. We basically have 2 choices in hypothesis tests: fail to reject the null hypothesis, or reject it in favor of the alternative.

Failure to reject H0 (the null hypothesis) means that we CANNOT accept the alternative hypothesis (H1).

Our null hypothesis is that there is no evidence of more variability. The alternative is that there is.

Does that help?

Best, Ted.

Regarding the Two tail test example.

S (70) > σ (49). How do I know to use the two tail test vs. right tail?

Hi April-Lynn,

We are using two tails because we want to see if there’s a difference in either the positive or negative direction. For this problem we don’t necessarily care which of the salaries is higher, we just want to prove a difference.

Since a two-tailed test uses both the positive and negative tails of the distribution, it tests for the possibility of positive or negative differences.

We’d use a one-tailed test if we wanted to determine if there was a difference between salary groups in a specific direction (e.g., A is higher than B).

Best, Ted

What in the question would lead me to know that I want to see if there’s a difference in either the positive or negative direction?

Nothing. It’s that absence – or the ask of absolute difference that leads to 2 tails.

One tail would be used if the hypothesis was one is greater than the other.

There are so many mistakes. Whoever wrote these problems has been careless. For a right-tailed test, we look up alpha in the table, and for a left-tailed test, we look up (1 − alpha), right? You have not followed this in a few places and have done it the other way.

For example, in the below question, the H0 and H1 are not correct.

A smartwatch manufacturer received customer complaints about the XYZ model, whose battery lasts a shorter time than the previous model. The variance of the battery life of the previous model is 49 hours. 11 watches were tested, and the battery life standard deviation was 9 hours. Assuming that the data are normally distributed, could the claim about increased variation in the new model be validated at a 5% significance level?

Population variance σ₁² = 49, so σ₁ = 7 hours

Sample standard deviation = 9 hours

The null hypothesis is H0: σ₁² ≤ 7²

The alternative hypothesis is H1: σ₁² > 7²

This is not correct. The correct H0 and H1 are

The null hypothesis is H0: σ₁² ≥ 7²

The alternative hypothesis is H1: σ₁² < 7², and it should be a left-tailed test.

If a 5% significance level or a 95% confidence level is given, which value should I choose in the Chi-square table, 0.95 or 0.05? It’s confusing. I believe there should be a criterion for choosing the right one.

Suppose the weekly number of accidents over a 30-week period is as follows:
8 0 0 1 3 4 0 2 12 5
1 8 0 2 0 1 9 3 4 5
3 3 4 7 4 0 1 2 1 2
Test the hypothesis that the number of accidents in a week has a Poisson
distribution. [Hint: Use the following class (number of accident): 0, 1, 2~3, 4~5,
more than 5]

Hi Bharat,

I appreciate you sharing a question, but we do not solve homework problems for people here.

I am happy to coach you through it, though. What do you see as the first step?

I am struggling with which column to use from the chi-square table. I think the rules are as follows:

1. For a left tailed test, use the column associated with the confidence level. (e.g. the 95% column)
2. For a right tailed test, use the column associated with the significance level (stated differently, the 1 – confidence level).

Is this correct?

Yes, Jeffery Carlson,

The significance levels (α) are listed at the top of the table. Find the column corresponding to your chosen significance level.

To calculate a confidence interval, choose the significance level based on your desired confidence level:

α = 1 − confidence level

The most common confidence level is 95% (.95), which corresponds to α = .05.

The table provided above gives the right-tail probabilities. You should use this table for most chi-square tests, including the chi-square goodness of fit test and the chi-square test of independence, and McNemar’s test.

Example: where the row for df = 10 and the column for α = .05 meet, the critical value is 18.307

If you want to perform a left-tailed test, you’ll need to make a small additional calculation.

Left-tailed tests:

The most common left-tailed test is the test of a single variance when determining whether a population’s variance or standard deviation is less than a certain value.

To find the critical value for a left-tailed probability in the table, simply use the table column for 1 − α, in other words, the confidence level column. You look up the left-tailed probability in the right-tailed table by subtracting your significance level from one: 1 − α = 1 − .05 = 0.95.

Ex: The critical value for df = 25 − 1 = 24 in the .95 column is 13.848.
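If no table is at hand, the same lookups can be approximated numerically. The sketch below is an illustration using only the standard library: it computes the Chi-Square cumulative probability from the series for the regularized lower incomplete gamma function, then finds the critical value by bisection. The function names and the `tail` argument are assumptions made for this example.

```python
import math

def chi_square_cdf(x, df):
    """P(X <= x), using the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def critical_value(df, alpha, tail="right"):
    """Bisection for the x that leaves probability alpha in the chosen tail."""
    target = 1.0 - alpha if tail == "right" else alpha
    lo, hi = 0.0, 1.0
    while chi_square_cdf(hi, df) < target:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi_square_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(critical_value(10, 0.05, "right"), 3))  # table value: 18.307
print(round(critical_value(24, 0.05, "left"), 3))   # table value: 13.848
```

A left-tailed lookup at α = .05 gives the same number as the .95 column of a right-tail table, which is the rule described above.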

Thanks
