The chi-square goodness-of-fit test tells how well the distribution of a categorical (nominal or ordinal) sample fits a hypothesized distribution. In other words, it checks whether the sample data are consistent with a hypothesized distribution of the population. Because the test determines whether the observed sample frequencies match, or fit, the expected values, we use the term goodness of fit.

The goodness-of-fit test is non-parametric because it does not rely on estimates of a population parameter, such as the mean or variance, to make an inference about the characteristics of the population.

## When to use Chi-Square Goodness of Fit?

A chi-square goodness-of-fit test can be conducted when there is one categorical variable with more than two levels. If there are exactly two categories, a one-proportion z-test may be conducted instead.


The chi-square goodness-of-fit test requires two assumptions [2, 3]:

1. Independent observations.

2. Sufficiently large expected frequencies. For 2 categories, each expected frequency $E_i$ must be at least 5. For 3+ categories, each $E_i$ must be at least 1 and no more than 20% of all $E_i$ may be smaller than 5.
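The expected-frequency condition above is easy to check mechanically before running the test. The following is a minimal sketch, not a definitive implementation; the function name `expected_counts_ok` is an assumption introduced here for illustration.

```python
def expected_counts_ok(expected):
    """Return True if the expected frequencies meet the rule of thumb above.

    `expected` is a list of expected frequencies, one per category.
    (Hypothetical helper, written only to illustrate the rule.)
    """
    k = len(expected)
    if k < 2:
        raise ValueError("need at least two categories")
    if k == 2:
        # Two categories: every expected frequency must be at least 5.
        return all(e >= 5 for e in expected)
    # Three or more categories: every expected frequency must be at least 1,
    # and no more than 20% of them may be smaller than 5.
    n_small = sum(1 for e in expected if e < 5)
    # Integer form of n_small / k <= 0.20, which avoids float rounding.
    return all(e >= 1 for e in expected) and 5 * n_small <= k


print(expected_counts_ok([10, 8, 6, 4, 12]))  # one of five cells is below 5
print(expected_counts_ok([10, 3, 4, 12]))     # two of four cells are below 5
```

Independence of the observations cannot be checked from the counts alone; it has to follow from how the sample was collected.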


## Structure of a Chi-Square Goodness of Fit Test

The goodness-of-fit test is structured in cells. Each cell holds an observed frequency, and the distribution you are trying to match supplies a theoretical (expected) frequency for that cell. The chi-square contributions are then summed across all cells.

The chi-square test statistic is calculated from the data values in these cells. The degrees of freedom vary according to the distribution being tested: one degree of freedom is lost because the total count is fixed, plus one for each parameter of the distribution that is estimated from the sample.

| GOF Distribution | Degrees of Freedom |
| --- | --- |
| Normal | # cells – 3 |
| Poisson | # cells – 2 |
| Binomial | # cells – 2 |
| Uniform | # cells – 1 |
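The table can be read as "number of cells, minus 1, minus the number of parameters estimated from the sample". The sketch below encodes that reading; the parameter counts in `ESTIMATED_PARAMS` are inferred from the table, and the names `ESTIMATED_PARAMS` and `degrees_of_freedom` are assumptions made for illustration.

```python
# Number of parameters estimated from the sample for each distribution,
# inferred from the table above (# cells - 1 - estimated parameters).
ESTIMATED_PARAMS = {
    "normal": 2,    # mean and standard deviation
    "poisson": 1,   # mean
    "binomial": 1,  # probability of success
    "uniform": 0,   # nothing estimated
}


def degrees_of_freedom(n_cells, distribution, estimated=True):
    """Degrees of freedom for a goodness-of-fit test with `n_cells` cells.

    Set estimated=False when the distribution's parameters are given
    rather than estimated from the sample; then only 1 is subtracted.
    """
    n_params = ESTIMATED_PARAMS[distribution.lower()] if estimated else 0
    df = n_cells - 1 - n_params
    if df < 1:
        raise ValueError("not enough cells for this distribution")
    return df


print(degrees_of_freedom(5, "normal"))                   # 5 - 3 = 2
print(degrees_of_freedom(5, "normal", estimated=False))  # 5 - 1 = 4
```

With five cells and a normal distribution, this gives 2 degrees of freedom when the mean and standard deviation are estimated from the data, and 4 when both are given.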

## Steps to perform Chi-Square goodness of fit

Step 1:

First, define the null hypothesis and the alternative hypothesis.

• Null hypothesis (H0): There is no difference between the observed values and the expected values
• Alternative hypothesis (H1): There is a significant difference between the observed values and the expected values

Step 2:

Secondly, specify the level of significance.

Step 3:

Thirdly, compute the χ² statistic, summing over all cells:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

• O is the observed value
• E is the expected value
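The statistic adds up, cell by cell, the squared difference between the observed and expected values divided by the expected value. A minimal sketch follows; the function names and the three-cell counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def chi_square_contributions(observed, expected):
    """Per-cell contributions (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected values must be positive")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


def chi_square_statistic(observed, expected):
    """Chi-square statistic: the sum of the per-cell contributions."""
    return sum(chi_square_contributions(observed, expected))


# Hypothetical counts for three cells, for illustration only.
observed = [18, 22, 20]
expected = [20, 20, 20]

print(chi_square_contributions(observed, expected))  # 4/20, 4/20, 0/20
print(chi_square_statistic(observed, expected))      # about 0.4
```

Keeping the per-cell contributions separate shows which cells drive a large statistic.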

Step 4:

Fourthly, calculate the degrees of freedom.

The degrees of freedom in a chi-square test depend on the distribution being tested.

Step 5:

Then, find the critical value based on the degrees of freedom and the level of significance.

Step 6:

Finally, draw the statistical conclusion:

If the test statistic value is greater than the critical value, reject the null hypothesis and conclude that there is a significant difference between the observed values and the expected values. Otherwise, fail to reject the null hypothesis.
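The six steps can be sketched end to end as follows. This is an illustration under stated assumptions, not data from this article: the counts are hypothetical rolls of a six-sided die tested against a uniform distribution, the function name `goodness_of_fit` is made up, and the critical value 11.070 is the standard table value for 5 degrees of freedom at α = 0.05.

```python
def goodness_of_fit(observed, expected_proportions, critical_value):
    """Run the test and return (statistic, reject_null).

    observed             : observed counts per cell
    expected_proportions : hypothesized proportion for each cell (sums to 1)
    critical_value       : chi-square table value for the chosen alpha and df
    """
    if len(observed) != len(expected_proportions):
        raise ValueError("one proportion is needed per cell")
    if abs(sum(expected_proportions) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    n = sum(observed)
    # Expected count in each cell under the null hypothesis.
    expected = [n * p for p in expected_proportions]
    # Step 3: chi-square statistic, summed over all cells.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 6: reject H0 only if the statistic exceeds the critical value.
    return statistic, statistic > critical_value


# Hypothetical data: 60 rolls of a die, H0 = all faces equally likely.
observed = [8, 12, 9, 11, 10, 10]
proportions = [1 / 6] * 6
df = len(observed) - 1     # uniform distribution: # cells - 1 = 5
critical_value = 11.070    # table value for df = 5, alpha = 0.05

statistic, reject = goodness_of_fit(observed, proportions, critical_value)
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject}")
```

Here the statistic is about 1.0, well below 11.070, so the null hypothesis is not rejected for this hypothetical sample.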

## Chi-Square goodness of fit test Example 1: Did the Distribution Change?

Ten years ago, US airlines categorized the priority customers (those who completed 10,000 miles traveled in a year) based specifically on age:

Similarly, in 2020, 500 priority passengers were sampled, and below are the results:

At a 95% confidence level, would you conclude that the population distribution of priority customers changed in the last 10 years?

• Null hypothesis (H0): The sample data fit the expected distribution.
• Alternative hypothesis (H1): The sample data do not fit the expected distribution.

Level of significance: α = 0.05

Number of categories (n) = 4

Degrees of freedom = n − 1 = 3

Chi-square critical value for 3 degrees of freedom at α = 0.05 = 7.815

The test statistic value is greater than the critical value; hence, we can reject the null hypothesis.

So, we can conclude that the age distribution of priority customers in 2020 differs from what would be expected based on the 2010 population.
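The critical value used in this example (7.815 for 3 degrees of freedom at α = 0.05) can be sanity-checked without a statistics library. For integer degrees of freedom, the upper-tail probability of the chi-square distribution has a closed form. The sketch below uses that closed form; the function name `chi2_sf` is an assumption introduced here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form that exists for integer degrees of freedom (df >= 1).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{m} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += half ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Tail area beyond the critical value from the example: approximately 0.05.
print(round(chi2_sf(7.815, 3), 4))
```

The same function gives a p-value for any computed test statistic: reject the null hypothesis when `chi2_sf(statistic, df)` is smaller than α, which is equivalent to the statistic exceeding the critical value.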


## Comments (4)

Greg Tilson says:

I think there is an error in the DOF calculation in the Chi Square Normality Test “Degrees of freedom = No of categories -3 =5-3 =2”.

DOF = (rows – 1) x (cols – 1)
DOF = (5-1) x (2 – 1)
DOF = 4 x 1
DOF = 4

Ramana PV says:

Hello Greg Tilson,
If the mean and standard deviation are not given, DF for the chi-square normality test = No of categories -3

Thanks

Ashwin Ramkumar says:

If I may, what is the formula if the mean and Std Dev are given?

Ramana PV says:

Hello Ashwin,

df = number of intervals – 1, since the mean and standard deviation are given

Thanks
