Goodness of fit tests

The chi-square goodness of fit test tells how well the sample distribution of a categorical (nominal or ordinal) variable fits a hypothesized distribution. In other words, it checks whether the sample data are consistent with a hypothesized distribution of the population. Because the test determines whether the observed sample frequencies match, or fit, the expected values, we use the term goodness of fit.

The goodness of fit test is a non-parametric test because it does not rely on estimates of a population parameter, such as the mean or variance, to make inferences about the characteristics of the population.

When to use Chi-Square Goodness of Fit?

A chi-square goodness of fit test can be conducted when there is one categorical variable with more than two levels. If there are exactly two categories, a one-proportion z-test may be conducted instead.


The chi-square goodness-of-fit test requires two assumptions:

1. Independent observations;

2. Sufficiently large expected frequencies. For 2 categories, each expected frequency E_i must be at least 5. For 3+ categories, each E_i must be at least 1 and no more than 20% of all E_i may be smaller than 5.
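The expected-frequency rule can be checked mechanically before running the test. The following is a minimal sketch of that rule as stated above; the function name `expected_counts_ok` is hypothetical and chosen only for illustration.

```python
def expected_counts_ok(expected):
    """Check the expected-frequency assumption of the chi-square GOF test.

    expected: list of expected frequencies E_i, one per category.
    """
    n = len(expected)
    if n < 2:
        raise ValueError("at least two categories are required")
    if n == 2:
        # Two categories: every expected frequency must be at least 5.
        return all(e >= 5 for e in expected)
    # Three or more categories: every E_i >= 1,
    # and no more than 20% of the E_i may be smaller than 5.
    share_small = sum(1 for e in expected if e < 5) / n
    return min(expected) >= 1 and share_small <= 0.20


print(expected_counts_ok([10, 10, 10, 10, 4]))  # one of five cells is below 5
print(expected_counts_ok([10, 10, 4]))          # one of three cells is below 5
```

If the check fails, a common remedy is to merge sparse categories before testing, although whether merging is sensible depends on the data.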


Structure of a Chi-Square Goodness of Fit Test

The goodness of fit test is structured in cells, one per category, and each cell holds the observed frequency for that category. The distribution you are trying to match supplies a theoretical (expected) frequency for each cell. The chi-square contributions are then summed across all cells.

The chi-square test statistic is calculated from these cell values. The degrees of freedom vary according to the distribution being tested, because each parameter estimated from the sample (for example, the mean and standard deviation of a normal distribution) removes one additional degree of freedom.

GOF Distribution | Degrees of Freedom
Normal | # cells – 3
Poisson | # cells – 2
Binomial | # cells – 2
Uniform | # cells – 1
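The table follows one pattern: start from the number of cells, subtract 1, then subtract one more for each distribution parameter estimated from the sample. The sketch below encodes that pattern. The parameter counts are assumptions (mean and standard deviation estimated for the normal, the rate for the Poisson, the success probability for the binomial, nothing for the uniform), and the names `ESTIMATED_PARAMETERS` and `gof_degrees_of_freedom` are hypothetical.

```python
# Assumed number of parameters estimated from the sample for each distribution.
ESTIMATED_PARAMETERS = {
    "normal": 2,    # mean and standard deviation
    "poisson": 1,   # rate
    "binomial": 1,  # success probability
    "uniform": 0,   # nothing estimated
}


def gof_degrees_of_freedom(n_cells, distribution):
    """Degrees of freedom = cells - 1 - number of estimated parameters."""
    k = ESTIMATED_PARAMETERS[distribution.lower()]
    dof = n_cells - 1 - k
    if dof < 1:
        raise ValueError("too few cells for this distribution")
    return dof


print(gof_degrees_of_freedom(5, "normal"))   # 5 - 3 = 2
print(gof_degrees_of_freedom(4, "uniform"))  # 4 - 1 = 3
```

If the parameters are specified in advance rather than estimated from the data, the corresponding subtraction does not apply, so the dictionary values would need to be adjusted for that case.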

Steps to perform Chi-Square goodness of fit

Step 1:

Firstly, define the null hypothesis and the alternative hypothesis.

  • Null hypothesis (H0): There is no difference between the observed values and the expected values.
  • Alternative hypothesis (H1): There is a significant difference between the observed values and the expected values.

Step 2:

Secondly, specify the level of significance.

Step 3:

Thirdly, compute the χ² statistic, χ² = Σ (O − E)² / E, summed over all categories, where:

  • O is the observed value
  • E is the expected value
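The statistic in Step 3 sums (O − E)² / E over all categories. A minimal sketch of that computation follows; the die-roll counts are hypothetical illustration data, not taken from this article, and `chi_square_statistic` is a name chosen here.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, expected to be fair (10 per face).
observed = [8, 12, 9, 11, 10, 10]
expected = [10, 10, 10, 10, 10, 10]

print(round(chi_square_statistic(observed, expected), 4))
```

Cells where the observed count equals the expected count contribute nothing, so the statistic grows only with the size of the mismatches relative to E.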

Step 4:

Fourthly, calculate the degrees of freedom.

The degrees of freedom in a chi-square test depend on the distribution being tested.

Step 5:

Then, find the critical value based on the degrees of freedom and the level of significance.
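Critical values are normally read from a chi-square table, but they can also be computed. The sketch below is one possible approach using only the Python standard library: it evaluates the upper-tail probability through the recurrence for the regularized incomplete gamma function at integer and half-integer arguments, then finds the critical value by bisection. The function names `chi2_sf` and `chi2_critical_value` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    # Closed-form starting points: df = 1 uses erfc, df = 2 uses exp.
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(half_x))
    else:
        a = 1.0
        q = math.exp(-half_x)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_critical_value(alpha, df):
    """Value c such that P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # expand until the tail is small enough
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical_value(0.05, 3), 3))  # 7.815
```

For 3 degrees of freedom and α = 0.05 this agrees with the tabulated value of 7.815. In practice a statistics library would usually be preferred over a hand-written routine like this one.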

Step 6:

Finally, draw the statistical conclusion:

If the test statistic value is greater than the critical value, reject the null hypothesis and conclude that there is a significant difference between the observed values and the expected values. Otherwise, fail to reject the null hypothesis.
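The six steps can be sketched end to end as follows. The counts and proportions are hypothetical illustration data, not the airline figures from the example in this article. The critical value 7.815 is the tabulated value for 3 degrees of freedom at α = 0.05, and `goodness_of_fit_decision` is a name chosen here.

```python
def goodness_of_fit_decision(observed, expected_proportions, critical_value):
    """Return (chi-square statistic, True if H0 is rejected)."""
    total = sum(observed)
    # Expected counts under H0: hypothesized proportion times sample size.
    expected = [p * total for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Decision rule: reject H0 when the statistic exceeds the critical value.
    return statistic, statistic > critical_value


# Hypothetical sample of 200 observations in 4 categories.
observed = [32, 70, 63, 35]
proportions = [0.10, 0.40, 0.30, 0.20]  # hypothesized distribution under H0

# 4 categories, no estimated parameters: 4 - 1 = 3 degrees of freedom.
statistic, reject = goodness_of_fit_decision(observed, proportions, critical_value=7.815)
print(f"chi-square = {statistic:.3f}, reject H0: {reject}")
```

With these made-up counts the statistic is roughly 9.2, which exceeds 7.815, so the null hypothesis would be rejected at the 5% level.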

Chi-Square goodness of fit test Example 1: Did the Distribution Change?

Ten years ago, US airlines categorized the priority customers (those who completed 10,000 miles traveled in a year) based specifically on age:

Similarly, in 2020, 500 priority passengers were sampled, and below are the results:

At a 95% confidence level, would you conclude that the population distribution of priority customers changed in the last 10 years?

  • Null hypothesis (H0): The sample data follow the expected distribution.
  • Alternative hypothesis (H1): The sample data do not follow the expected distribution.

Level of significance: α = 0.05

Number of categories (n) = 4

Degrees of freedom = n − 1 = 3

Chi-square critical value for 3 degrees of freedom = 7.815

The test statistic value is greater than the critical value; hence, we can reject the null hypothesis.

So, we can conclude that the distribution of priority customers in 2020 differs from the one expected based on the 2010 population.


Comments (4)

I think there is an error in the DOF calculation in the Chi Square Normality Test “Degrees of freedom = No of categories -3 =5-3 =2”.

DOF = (rows – 1) x (cols – 1)
DOF = (5-1) x (2 – 1)
DOF = 4 x 1
DOF = 4

Hello Greg Tilson,
If the mean and standard deviation are not given, DF for the chi-square normality test = number of categories − 3.

Thanks
