The chi-square goodness of fit test tells how well the distribution of a categorical (nominal or ordinal) sample fits a hypothesized distribution. In other words, it checks whether the sample data are consistent with a hypothesized distribution of the population. The test determines whether the observed sample frequencies match, or fit, the expected values; hence the term goodness of fit.

Goodness of fit is a non-parametric test because it does not rely on estimates of a population parameter like mean or variance to make an inference on the characteristics of the population.

## When to use Chi Square Goodness of Fit?

A chi-square goodness-of-fit test can be conducted when there is one categorical variable with more than two levels. If there are exactly two categories, then a one proportion z test may be conducted.

The chi-square goodness-of-fit test requires 2 assumptions^{2,3}:

1. Independent observations.
2. Sufficiently large expected frequencies E_{i}: for 2 categories, each E_{i} must be at least 5. For 3+ categories, each E_{i} must be at least 1 and no more than 20% of all E_{i} may be smaller than 5.
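The expected-frequency rule above can be checked mechanically. The following is a minimal sketch in Python; the function name `expected_counts_ok` and the sample values are hypothetical and only illustrate the rule as stated here.

```python
def expected_counts_ok(expected):
    """Return True if the expected frequencies satisfy the rule of thumb above."""
    k = len(expected)
    if k < 2:
        raise ValueError("need at least two categories")
    if k == 2:
        # Two categories: every expected frequency must be at least 5.
        return all(e >= 5 for e in expected)
    # Three or more categories: every expected frequency must be at least 1,
    # and no more than 20% of them may be smaller than 5.
    small = sum(1 for e in expected if e < 5)
    return all(e >= 1 for e in expected) and small / k <= 0.20


# Hypothetical expected frequencies, for illustration only.
print(expected_counts_ok([12.0, 8.0]))                  # two categories, both at least 5
print(expected_counts_ok([10.0, 9.0, 7.5, 6.0, 4.0]))   # one of five (20%) below 5
print(expected_counts_ok([10.0, 4.0, 3.0, 0.5]))        # one value below 1
```

The independence assumption cannot be verified from the counts alone; it depends on how the sample was collected.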


## Structure of a Chi Square Goodness of Fit Test

Goodness of fit tests are structured in cells, one per category. The observed frequency goes in each cell, and the distribution you are trying to match supplies a theoretical (expected) frequency for that cell. The chi-square statistic is summed across all cells.

The test uses the data values structured into cells and a chi-square test statistic calculated from them. The degrees of freedom vary according to the distribution being tested.

| GOF Distribution | Degrees of Freedom |
|---|---|
| Normal | # cells – 3 |
| Poisson | # cells – 2 |
| Binomial | # cells – 2 |
| Uniform | # cells – 1 |
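One way to read the table is that one degree of freedom is lost because the cell counts must add up to the sample size, plus one more for each distribution parameter estimated from the sample. The sketch below encodes that reading; the helper name `gof_degrees_of_freedom` and the parameter counts in the dictionary are assumptions chosen to reproduce the table, not part of the original text.

```python
# Number of parameters assumed to be estimated from the sample
# (an interpretation that reproduces the table above).
ESTIMATED_PARAMETERS = {
    "normal": 2,    # mean and standard deviation
    "poisson": 1,   # mean
    "binomial": 1,  # probability of success
    "uniform": 0,   # nothing estimated
}


def gof_degrees_of_freedom(cells, distribution):
    """Degrees of freedom = cells - 1 - number of estimated parameters."""
    estimated = ESTIMATED_PARAMETERS[distribution.lower()]
    df = cells - 1 - estimated
    if df < 1:
        raise ValueError("too few cells for this distribution")
    return df


for name in ("Normal", "Poisson", "Binomial", "Uniform"):
    print(name, gof_degrees_of_freedom(6, name))
```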

**Steps to perform Chi Square goodness of fit**

Step 1: Define the null hypothesis and alternative hypothesis

- Null hypothesis (H_{0}): There is no difference between the observed values and the expected values
- Alternative hypothesis (H_{1}): There is a significant difference between the observed values and the expected values

Step 2: Specify the level of significance

Step 3: Compute the χ^{2} statistic

χ^{2} = **Σ** [ (O – E)^{2} / E ]

- O is the observed value
- E is the expected value

Step 4: Calculate the degrees of freedom:

The degrees of freedom in the chi-square test depend on the hypothesized distribution

Step 5: Find the critical value, based on degrees of freedom

Step 6: Finally, draw the statistical conclusion:

If the test statistic value is greater than the critical value, reject the null hypothesis and conclude that there is a significant difference between the observed and expected values. Otherwise, fail to reject the null hypothesis.
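The six steps can be sketched in a few lines of Python. The function names and the observed and expected counts below are hypothetical; the critical value 5.991 is the standard chi-square value for 2 degrees of freedom at α = 0.05.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E across all cells (Step 3)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_null(statistic, critical_value):
    """Step 6: reject H0 only if the statistic exceeds the critical value."""
    return statistic > critical_value


# Hypothetical counts for three categories, for illustration only.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

df = len(observed) - 1          # Step 4: no parameters estimated here
CRITICAL_DF2_ALPHA05 = 5.991    # Step 5: chi-square critical value, 2 df, alpha = 0.05

stat = chi_square_statistic(observed, expected)
print(df, round(stat, 3), reject_null(stat, CRITICAL_DF2_ALPHA05))
```

With these made-up counts the statistic is small relative to the critical value, so the null hypothesis is not rejected.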

## Chi Square Goodness of Fit Test Example 1: Did the Distribution Change?

Ten years ago, US airlines categorized their priority customers (those who complete 10,000 miles travelled in a year) based on age.

In 2020, 500 priority passengers were sampled; below are the results.

At the 95% confidence level, would you conclude that the population distribution of priority customers has changed in the last 10 years?

- Null hypothesis (H_{0}): The sample data meet the expected distribution.
- Alternative hypothesis (H_{1}): The sample data do not meet the expected distribution.

Level of significance: α=0.05

Number of categories (n) = 4

Degrees of freedom = n – 1 = 3

Chi-square critical value for 3 degrees of freedom = 7.815

The test statistic value is greater than the critical value, hence we can reject the null hypothesis.

So, we can conclude that the distribution of priority customers in 2020 differs from the one expected based on the 2010 population.
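The calculation behind this example can be sketched as follows. The age-group figures are not reproduced in the text, so the historic shares and the 2020 counts below are invented placeholders and are not the article's data. Only the sample size (500), the four categories, and the critical value (7.815) come from the example.

```python
SAMPLE_SIZE = 500        # priority passengers sampled in 2020 (from the example)
CRITICAL_VALUE = 7.815   # chi-square critical value, 3 df, alpha = 0.05 (from the example)

# HYPOTHETICAL placeholders, not the article's figures:
historic_share = [0.10, 0.40, 0.30, 0.20]   # share of each age group ten years ago
observed_2020 = [30, 180, 170, 120]         # counts per age group in the 2020 sample

# Expected count = historic share x sample size.
expected = [share * SAMPLE_SIZE for share in historic_share]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed_2020, expected))
df = len(observed_2020) - 1

print(df, round(statistic, 3), statistic > CRITICAL_VALUE)
```

The key step is converting the old distribution into expected counts for the new sample size before comparing it with the observed counts.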


## Chi Square Normality Test

Many statistical techniques (regression, ANOVA, t-tests, etc.) rely on the assumption that the data are normally distributed. The chi-square goodness of fit test is one option for checking whether data follow a normal distribution.

Furthermore, the chi-square goodness of fit test is an alternative to the Anderson-Darling, Kolmogorov-Smirnov (K-S), and Shapiro-Wilk tests for normality.

Similar to other statistical hypothesis tests, the chi-square goodness of fit test requires computing the test statistic and finding the critical value for a given number of degrees of freedom and confidence level. If the chi-square value is greater than the critical value, reject the null hypothesis.

## How to conduct Chi Square goodness of fit normality test

- Determine the null hypothesis (the data are sampled from a normal distribution) and the alternative hypothesis.
- Compute the sample mean and standard deviation.
- Define the confidence level.
- Bin the data: determine non-overlapping bins, and then count the values in each bin.
- Find the cumulative probability for each category endpoint.
- Compute the probability that a randomly selected value would fall in each category.
- Find the expected number of observations in each bin, which is the product of the probability that an observation falls in the bin and the sample size (n).
- Compute the chi-square statistic: χ^{2} = **Σ** [ (Observed frequency – Exp frequency)^{2} / Exp frequency ]
- Find the degrees of freedom based on the number of categories or bins. When the mean and standard deviation are estimated from the sample, the degrees of freedom are the number of categories – 1 – 2 (two estimated parameters, mean and standard deviation) = k – 3. If the mean and standard deviation are given, the degrees of freedom are k – 1.
- Find the critical value, based on degrees of freedom.
- Finally, draw the statistical conclusion: if the test statistic value is greater than the critical value, reject the null hypothesis.
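The steps above can be sketched with Python's standard library. This is only an illustration under stated assumptions: the function name `normality_gof`, the sample values, and the convention that a value equal to a bin boundary goes into the upper bin are all hypothetical choices, and the first and last bins are treated as open-ended.

```python
from statistics import NormalDist, mean, stdev


def normality_gof(data, inner_edges):
    """Chi-square goodness-of-fit statistic against a normal fitted to the data.

    inner_edges are the boundaries between bins; the first and last bins are
    open-ended (below the first edge, at or above the last edge).
    """
    n = len(data)
    # Mean and (sample) standard deviation are estimated from the data.
    fitted = NormalDist(mean(data), stdev(data))

    # Cumulative probability at each endpoint; the final endpoint is +infinity, so 1.0.
    cumulative = [fitted.cdf(edge) for edge in inner_edges] + [1.0]
    # Probability of each bin = difference of consecutive cumulative probabilities.
    probabilities = [c - p for p, c in zip([0.0] + cumulative[:-1], cumulative)]
    expected = [p * n for p in probabilities]

    # Observed counts: a value lands in the bin indexed by how many edges it reaches.
    observed = [0] * len(probabilities)
    for x in data:
        observed[sum(1 for edge in inner_edges if x >= edge)] += 1

    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(probabilities) - 3   # k - 1 - 2 estimated parameters
    return observed, expected, statistic, df


# Hypothetical sample, for illustration only.
weights = [18, 21, 22, 24, 26, 27, 27, 28, 29, 31, 32, 33, 36, 38]
observed, expected, statistic, df = normality_gof(weights, [20, 25, 30, 35])
print(observed, df, round(statistic, 3))
```

The resulting statistic is then compared with the chi-square critical value for `df` degrees of freedom, as in the last two steps of the list.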

## Chi Square normality test example

**Example**: The weights (in kg) of 28 students are collected at ABC school. Test whether the data are normally distributed at a 95% confidence level.

- H_{0}: Students' weights at ABC school follow a normal distribution
- H_{1}: Students' weights at ABC school do not follow a normal distribution

##### Compute sample mean and standard deviation.

- Select all the data and enter "=AVERAGE(B2:H5)" for the mean and "=STDEV(B2:H5)" for the standard deviation
- Confidence level = 95%

##### Bin or categorize the data and count the values

##### Find cumulative probability for each category end point.

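- In cell H11 type =NORMDIST(20, L2, L4, TRUE), where L2 holds the sample mean and L4 the sample standard deviation
- Similarly, H12 = NORMDIST(25, L2, L4, TRUE)
- H13 = NORMDIST(30, L2, L4, TRUE)
- H14 = NORMDIST(35, L2, L4, TRUE)
- H15 = 1, the cumulative probability at ∞ (Excel has no ∞ value to pass to NORMDIST)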

##### Compute probability that a randomly selected value would go in each category

- For cell I11, i.e. less than 20, copy the value from H11, i.e. 0.118431892
- In cell I12, i.e. values between 20 and 25, = H12-H11 = 0.350912244-0.118431892 = 0.232480352
- Similarly, I13 = H13-H12 = 0.310801442
- I14 = H14-H13 = 0.226512360
- I15 = H15-H14 = 0.111773954
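The differencing in this step can be sketched in Python. The cumulative probabilities below are rounded to four decimals. The first two are the values shown for H11 and H12; the third and fourth are not printed in the text and are reconstructed here from the listed differences, so treat them as an assumption.

```python
# Cumulative probabilities at the endpoints 20, 25, 30, 35 and +infinity (rounded).
cumulative = [0.1184, 0.3509, 0.6617, 0.8882, 1.0]

# Each bin probability is the cumulative value at its upper endpoint minus the
# cumulative value at its lower endpoint (0.0 for the first, open-ended bin).
bin_probabilities = [
    upper - lower
    for lower, upper in zip([0.0] + cumulative[:-1], cumulative)
]

print([round(p, 4) for p in bin_probabilities])
print(round(sum(bin_probabilities), 4))  # the bins cover the whole distribution
```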

##### Find the expected observations in each bin: the product of the probability that an observation falls in the bin and the sample size (n) = 28

- J11 = I11*$G$16 = 3.316092984
- Similarly, J12=I12*$G$16=6.509449848
- J13=I13*$G$16=8.702440383
- J14=I14*$G$16=6.342346084
- J15= I15*$G$16=3.129670701
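The same multiplication can be sketched in Python, using the bin probabilities rounded to four decimals (so the results agree with the spreadsheet values only to about two decimals). The sketch also counts how many expected frequencies fall below 5, since the assumptions listed earlier allow no more than 20% of them to be that small; the variable names are hypothetical.

```python
SAMPLE_SIZE = 28  # number of students in the example

# Bin probabilities from the example, rounded to four decimals.
bin_probabilities = [0.1184, 0.2325, 0.3108, 0.2265, 0.1118]

# Expected count in each bin = bin probability x sample size.
expected = [p * SAMPLE_SIZE for p in bin_probabilities]

# Share of bins whose expected count is below 5.
below_five = [e for e in expected if e < 5]
share_below_five = len(below_five) / len(expected)

print([round(e, 2) for e in expected])
print(f"{share_below_five:.0%} of bins have an expected count below 5")
```

In this data set the first and last bins have expected counts of roughly 3.3 and 3.1, so two of the five bins are below 5, which is worth keeping in mind when reading the result.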

##### Find the chi-square statistic. χ^{2} = **Σ** [ (Observed frequency – Exp frequency)^{2} / Exp frequency ]

- K11 = (G11-J11)^{2}/J11 = 0.522331777
- Similarly, K12 = (G12-J12)^{2}/J12 = 1.871731198
- K13 = (G13-J13)^{2}/J13 = 0.056699325
- K14 = (G14-J14)^{2}/J14 = 0.865071869
- K15 = (G15-J15)^{2}/J15 = 0.242029645
- Find the sum of the chi-square values = K16 = SUM(K11:K15) = 3.55786

- Degrees of freedom = No of categories -3 =5-3 =2
- Critical value =5.991
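The statistic and the critical value can be cross-checked in Python. The observed counts are not listed in the text; the values below are back-computed from the per-bin contributions and should be treated as an assumption. The expected counts are the J11 to J15 values. For exactly 2 degrees of freedom the chi-square survival function is exp(-x/2), which gives a closed form for the critical value and p-value; this shortcut does not hold for other degrees of freedom.

```python
import math

# ASSUMPTION: observed counts back-computed from the K11..K15 contributions.
observed = [2, 10, 8, 4, 4]
# Expected counts J11..J15 from the example.
expected = [3.316092984, 6.509449848, 8.702440383, 6.342346084, 3.129670701]

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)

alpha = 0.05
df = len(observed) - 3                   # 5 categories - 1 - 2 estimated parameters
# Valid only because df == 2: P(X > x) = exp(-x / 2).
critical_value = -2 * math.log(alpha)
p_value = math.exp(-statistic / 2)

print(round(statistic, 5), round(critical_value, 3), round(p_value, 3))
print("reject H0" if statistic > critical_value else "fail to reject H0")
```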

Conclusion: The chi-square value of 3.558 is less than the critical value of 5.991. In hypothesis testing, a critical value is a point on the test distribution that is compared to the test statistic to determine whether to reject the null hypothesis. Because the calculated chi-square value is smaller than the critical value, we fail to reject the null hypothesis: the data are consistent with the model. So, the students' weights at ABC school can be treated as following a normal distribution.

## Comments (4)

I think there is an error in the DOF calculation in the Chi Square Normality Test “Degrees of freedom = No of categories -3 =5-3 =2”.

DOF = (rows – 1) x (cols – 1)

DOF = (5-1) x (2 – 1)

DOF = 4 x 1

DOF = 4

Hello Greg Tilson,

If the mean and standard deviation are not given, DF for the chi-square normality test = No of categories – 3

Thanks

If I may, what is the formula if the mean and Std Dev are given?

Hello Ashwin,

df = number of intervals – 1, since the mean and standard deviation are given

Thanks