Goodness of fit tests

The chi-square goodness of fit test tells how well the distribution of a categorical (nominal or ordinal) sample fits a hypothesized distribution. In other words, it checks whether the sample data are consistent with a hypothesized distribution of the population. Because the test measures how closely the observed frequencies match, or fit, the expected frequencies, it is called a goodness of fit test.

The goodness of fit test is a non-parametric test: it does not rely on estimates of population parameters such as the mean or variance to draw inferences about the characteristics of the population.

When to use Chi Square Goodness of Fit?

A chi-square goodness-of-fit test is typically conducted when there is one categorical variable with more than two levels. With exactly two categories, a one-proportion z-test may be conducted instead.

The chi-square goodness-of-fit test requires two assumptions [2, 3]:

1. Observations are independent.

2. Expected frequencies are large enough: for 2 categories, each expected frequency E_i must be at least 5; for 3 or more categories, each E_i must be at least 1 and no more than 20% of all E_i may be smaller than 5.
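The expected-frequency rule above can be checked mechanically before running the test. The following is a minimal sketch; the function name `expected_counts_ok` and the sample counts are hypothetical and only illustrate the rule as stated.

```python
def expected_counts_ok(expected):
    """Return True if the expected counts satisfy the rule of thumb above."""
    k = len(expected)
    if k < 2:
        return False
    if k == 2:
        # Two categories: every expected count must be at least 5.
        return all(e >= 5 for e in expected)
    # Three or more categories: none below 1, and at most 20% below 5.
    small = sum(1 for e in expected if e < 5)
    return min(expected) >= 1 and 5 * small <= k


# Hypothetical expected counts, for illustration only.
print(expected_counts_ok([10, 20, 30, 40]))      # all counts are large
print(expected_counts_ok([0.5, 20, 30, 49.5]))   # one count is below 1
```

When the check fails, a common remedy is to merge neighbouring categories until the expected counts are large enough.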

Structure of a Chi Square Goodness of Fit Test

A goodness of fit test is organized into cells, one per category. Each cell holds an observed frequency and the theoretical (expected) frequency under the distribution you are trying to match. The chi-square contribution is computed for each cell and then summed across all cells.

The test therefore requires a calculated chi-square test statistic. Its degrees of freedom vary with the hypothesized distribution, because each parameter estimated from the sample removes one further degree of freedom.

GOF Distribution | Degrees of Freedom
Normal | # cells – 3
Poisson | # cells – 2
Binomial | # cells – 2
Uniform | # cells – 1
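The table above follows one pattern: number of cells, minus 1, minus the number of parameters estimated from the sample. A small sketch of that rule is shown here. The dictionary and function names are hypothetical, and the parameter counts assume that every parameter of the distribution is estimated from the data rather than given.

```python
# Parameters estimated from the sample for each hypothesized distribution
# (assumption: none of them is specified in advance).
ESTIMATED_PARAMS = {"normal": 2, "poisson": 1, "binomial": 1, "uniform": 0}


def gof_degrees_of_freedom(n_cells, distribution):
    """Degrees of freedom = cells - 1 - estimated parameters."""
    df = n_cells - 1 - ESTIMATED_PARAMS[distribution]
    if df < 1:
        raise ValueError("too few cells for this distribution")
    return df


print(gof_degrees_of_freedom(5, "normal"))   # 5 cells - 3
print(gof_degrees_of_freedom(4, "uniform"))  # 4 cells - 1
```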

Steps to perform Chi Square goodness of fit

Step 1:

Firstly, define the null hypothesis and alternative hypothesis

  • Null hypothesis (H0): There is no difference between the observed value and the expected value
  • Alternative hypothesis (H1): There is a significant difference between the observed value and the expected value

Step 2:

Secondly, specify the level of significance

Step 3:

Thirdly, compute the χ² statistic:

χ² = Σ [ (O – E)² / E ]

  • O is the observed value
  • E is the expected value
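The statistic in Step 3 is the sum over all cells of (O – E)² / E. As a minimal sketch, it can be computed as follows; the function name and the die-roll counts are made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, with 10 expected per face if it is fair.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
stat = chi_square_statistic(observed, expected)
print(round(stat, 3))  # 4.2
```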

Step 4:

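Fourthly, calculate the degrees of freedom:

The degrees of freedom in a chi-square goodness of fit test depend on the hypothesized distribution: the number of cells minus 1, minus the number of parameters estimated from the sample.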

Step 5:

Then, find the critical value, based on the degrees of freedom and the level of significance

Step 6:

Finally, draw the statistical conclusion:

If the test statistic is greater than the critical value, reject the null hypothesis and conclude that there is a significant difference between the observed value and the expected value. Otherwise, fail to reject the null hypothesis.
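Steps 5 and 6 are usually done with a printed chi-square table. As a sketch of what the table encodes, the code below computes the upper-tail probability of a chi-square distribution with integer degrees of freedom using only the Python standard library, then finds the critical value by bisection. The helper names (`chi2_sf`, `chi2_critical_value`, `reject_null`) are hypothetical, not functions from any library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # exact tail for df = 1
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_critical_value(alpha, df):
    """Value c such that P(X >= c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def reject_null(statistic, df, alpha=0.05):
    """Step 6: reject H0 when the statistic exceeds the critical value."""
    return statistic > chi2_critical_value(alpha, df)


print(round(chi2_critical_value(0.05, 3), 3))
print(reject_null(4.2, df=5))  # hypothetical statistic with 5 degrees of freedom
```

Comparing the statistic with the critical value is equivalent to comparing `chi2_sf(statistic, df)` (the p-value) with the level of significance.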

Chi Square goodness of fit test Example 1: Did the Distribution Change?

Ten years ago, US airlines categorized the priority customers (those who completed 10,000 miles traveled in a year) based specifically on age

Similarly, in 2020, 500 priority passengers are sampled, and below are the results

At a 95% confidence level, would you conclude that the population distribution of priority customers changed in the last 10 years?

  • Null hypothesis (H0): The sample data follow the expected distribution.
  • Alternative hypothesis (H1): The sample data do not follow the expected distribution.

Level of significance: α=0.05

Number of categories (n) = 4

Degrees of freedom = n – 1 = 3

Chi-square critical value for 3 degrees of freedom at α = 0.05 = 7.815

The test statistic value is greater than the critical value; hence, we can reject the null hypothesis.

So, we can conclude that the distribution of priority customers in 2020 differs from the one expected based on the 2010 population.

Chi Square Normality Test

Many statistical techniques (regression, ANOVA, t-tests, etc.) rely on the assumption that the data are normally distributed. The chi-square goodness of fit test is one option for checking whether data follow a normal distribution.

It is an alternative to the Anderson-Darling, Kolmogorov-Smirnov (K-S), and Shapiro-Wilk tests of normality.

As with other statistical hypothesis tests, you compute the test statistic and find the critical value for the given degrees of freedom and confidence level. If the chi-square value is greater than the critical value, reject the null hypothesis.

How to conduct a Chi Square goodness of fit normality test

  • Determine the null hypothesis (the data are sampled from a normal distribution) and the alternative hypothesis (they are not).
  • Compute the sample mean and the standard deviation.
  • Define the confidence level.
  • Bin the data: determine non-overlapping bins, then count the values in each bin.
  • Find the cumulative probability at each bin endpoint.
  • Compute the probability that a randomly selected value falls in each bin.
  • Find the expected count for each bin: the probability that an observation falls in the bin multiplied by the sample size (n).
  • Compute the chi-square statistic: χ² = Σ [ (Observed frequency – Expected frequency)² / Expected frequency ]
  • Find the degrees of freedom from the number of categories or bins (k). When the mean and standard deviation are not given and must be estimated from the sample, the degrees of freedom are k – 1 – 2 = k – 3, where the 2 accounts for the two estimated parameters.
  • Find the critical value, based on the degrees of freedom.
  • Finally, draw the statistical conclusion: if the test statistic is greater than the critical value, reject the null hypothesis.
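The procedure above can be sketched in Python with the standard library's `statistics.NormalDist`. The function name `chi_square_normality`, the 28 weights, and the cut points are all hypothetical and chosen only to show the mechanics. Bins are treated as half-open, so a value equal to a cut point falls in the higher bin, which is an assumption about how the bin edges are handled.

```python
from statistics import NormalDist, mean, stdev


def chi_square_normality(data, edges):
    """Chi-square normality statistic for bins (-inf, e1), [e1, e2), ..., [ek, +inf)."""
    n = len(data)
    # Fit a normal distribution using the sample mean and sample standard deviation.
    fitted = NormalDist(mean(data), stdev(data))
    # Cumulative probability at each endpoint; the last bin is open-ended, so it ends at 1.
    cum = [fitted.cdf(e) for e in edges] + [1.0]
    # Probability of each bin = difference of consecutive cumulative probabilities.
    probs = [cum[0]] + [cum[i] - cum[i - 1] for i in range(1, len(cum))]
    expected = [p * n for p in probs]
    # Count the observed values in each bin.
    observed = [0] * len(probs)
    for x in data:
        observed[sum(1 for e in edges if x >= e)] += 1
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(probs) - 3  # bins - 1 - 2 estimated parameters
    return stat, df, observed, expected


# Hypothetical weights in kg (not the data from the worked example).
weights = [18, 19, 21, 22, 22, 23, 24, 24, 25, 26, 26, 27, 27, 28,
           28, 29, 29, 30, 31, 31, 32, 33, 33, 34, 35, 36, 38, 40]
stat, df, observed, expected = chi_square_normality(weights, [20, 25, 30, 35])
print(observed, df)
print(round(stat, 4))
```

The resulting statistic is then compared with the chi-square critical value for `df` degrees of freedom, as in the final two steps of the list.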

Chi Square normality test example

Example: The weights (in kg) of 28 students are collected at ABC school. Test whether the data are normally distributed at a 95% confidence level.

  • H0: Students’ weights in ABC school follow a normal distribution
  • H1: Students’ weights in ABC school do not follow a normal distribution
Firstly, compute the sample mean and standard deviation.
  • Select all the data and enter “=AVERAGE(B2:H5)” for the mean and “=STDEV(B2:H5)” for the standard deviation
  • Confidence level = 95%
Secondly, bin (categorize) the data and count the values in each bin
Thirdly, find the cumulative probability for each category endpoint (here L2 and L4 are the cells holding the sample mean and standard deviation).
  • In cell H11, type =NORMDIST(20, L2, L4, TRUE)
  • Similarly, H12 = NORMDIST(25, L2, L4, TRUE)
  • H13 = NORMDIST(30, L2, L4, TRUE)
  • H14 = NORMDIST(35, L2, L4, TRUE)
  • H15 = NORMDIST(∞, L2, L4, TRUE), which equals 1 because the last bin is open-ended
Fourthly, compute the probability that a randomly selected value would fall in each category
  • I11 (values below 20) = H11 = 0.118431892
  • I12 (values between 20 and 25) = H12-H11 = 0.350912244-0.118431892 = 0.232480352
  • Similarly, I13 = H13-H12 = 0.310801442
  • I14 = H14-H13 = 0.22651236
  • I15 = H15-H14 = 0.111773954
Then, find the expected count for each bin: the probability that an observation falls in the bin multiplied by the sample size (n) = 28
  • J11 = I11*$G$16 = 3.316092984
  • Similarly, J12 = I12*$G$16 = 6.509449848
  • J13 = I13*$G$16 = 8.702440383
  • J14 = I14*$G$16 = 6.342346084
  • J15 = I15*$G$16 = 3.129670701
Finally, find the chi-square statistic: χ² = Σ [ (Observed frequency – Expected frequency)² / Expected frequency ]
  • K11 = (G11-J11)^2/J11 = 0.522331777
  • Similarly, K12 = (G12-J12)^2/J12 = 1.871731198
  • K13 = (G13-J13)^2/J13 = 0.056699325
  • K14 = (G14-J14)^2/J14 = 0.865071869
  • K15 = (G15-J15)^2/J15 = 0.242029645
  • Then, sum the chi-square values: K16 = SUM(K11:K15) = 3.55786
  • Degrees of freedom = number of categories – 3 = 5 – 3 = 2
  • Critical value (α = 0.05, 2 degrees of freedom) = 5.991
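The arithmetic of this example can be re-checked outside the spreadsheet. The sketch below takes the expected counts from column J as given. The observed counts are not listed in the text, so the values used here (2, 10, 8, 4, 4) are an assumption, back-solved from the column K contributions. For exactly 2 degrees of freedom the chi-square upper tail is exp(–x/2), so the critical value is –2·ln(α).

```python
import math

# Expected counts from column J of the worked example.
expected = [3.316092984, 6.509449848, 8.702440383, 6.342346084, 3.129670701]
# ASSUMPTION: observed counts back-solved from column K; they are not given in the text.
observed = [2, 10, 8, 4, 4]

# Per-bin contributions (column K) and their sum (K16).
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

df = len(expected) - 3             # 5 bins - 1 - 2 estimated parameters
critical = -2.0 * math.log(0.05)   # closed form, valid only for df = 2

print(round(chi_square, 3), df, round(critical, 3))
print("reject H0" if chi_square > critical else "fail to reject H0")
```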

Conclusion: The chi-square value of 3.558 is less than the critical value of 5.991. In hypothesis testing, a critical value is the point on the test distribution against which the test statistic is compared to decide whether to reject the null hypothesis. Because the calculated chi-square is smaller than the critical value, we fail to reject the null hypothesis: the data are consistent with the model, so there is no evidence that students’ weights in ABC school depart from a normal distribution.

Comments (4)

I think there is an error in the DOF calculation in the Chi Square Normality Test “Degrees of freedom = No of categories -3 =5-3 =2”.

DOF = (rows – 1) x (cols – 1)
DOF = (5-1) x (2 – 1)
DOF = 4 x 1
DOF = 4

Hello Greg Tilson,
If the mean and standard deviation are not given, DF for the chi-square normality test = No of categories – 3

Thanks
