What is the F Distribution?
The F-distribution, also known as the Fisher–Snedecor distribution, is extensively used to test for equality of variances between two normal populations. It is named after R.A. Fisher, who initially developed the concept in the 1920s. It is the probability distribution of an F-statistic.
The F-distribution is generally a skewed distribution and is related to the chi-squared distribution. It is the distribution of the ratio of two chi-square random variables, X1 with degrees of freedom ϑ1 and X2 with degrees of freedom ϑ2, each divided by its own degrees of freedom: F = (X1/ϑ1) / (X2/ϑ2).
The shape of the distribution depends on the degrees of freedom of the numerator, ϑ1, and of the denominator, ϑ2.
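This definition can be checked by simulation: the ratio of two independent chi-square variables, each divided by its degrees of freedom, matches the F distribution. A minimal sketch using scipy and numpy (the degrees of freedom 5 and 10 are arbitrary illustrative values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
df1, df2 = 5, 10  # illustrative degrees of freedom

# Ratio of two independent chi-square variables, each divided by its df
x1 = rng.chisquare(df1, size=100_000) / df1
x2 = rng.chisquare(df2, size=100_000) / df2
f_samples = x1 / x2

# The simulated mean should be close to the theoretical F mean, df2/(df2 - 2)
print(f_samples.mean())
print(stats.f.mean(df1, df2))  # 1.25 for df2 = 10
```

The theoretical mean df2/(df2 − 2) depends only on the denominator degrees of freedom, which the simulation reproduces.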
What are the properties of an F Distribution?
- The F distribution curve is positively skewed (skewed to the right), with a range of 0 to ∞
- The value of F is always positive or zero; there are no negative values
- The shape of the distribution depends on the degrees of freedom of the numerator, ϑ1, and of the denominator, ϑ2
- The degree of skewness decreases as the degrees of freedom of the numerator and denominator increase
- The F distribution curve is never symmetrical, but as the degrees of freedom increase it approaches a more symmetrical shape
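The decreasing-skewness property can be verified numerically with scipy (the degrees-of-freedom values below are arbitrary, chosen so the skewness is defined, which requires the denominator df to exceed 6):

```python
from scipy import stats

# Skewness of the F distribution for increasing (equal) degrees of freedom
for df in (8, 20, 50, 200):
    skew = float(stats.f.stats(df, df, moments="s"))
    print(df, skew)  # skewness shrinks toward 0 as df grows
```

The printed skewness stays positive (right-skewed) but falls steadily as the degrees of freedom increase, matching the property above.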
When would you use the F Distribution?
The F-test compares multiple groups, typically across the levels of an independent variable, and uses the F distribution. It is most commonly applied in ANOVA calculations; use the F-test whenever more than two groups are compared.
Example: In a manufacturing unit, torque values are a key parameter in terminal squeeze welding. To check for a significant effect of various torque values on squeeze welding, an operator set up trials at 5 Nm, 8 Nm, 10 Nm, and 12 Nm on four randomly selected batches of 30 terminals. ANOVA can determine whether the means of these 4 trials are different; ANOVA uses F-tests to statistically test the equality of means.
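An analysis like the torque example can be run with scipy's one-way ANOVA. The weld-response numbers below are hypothetical, generated only to show the mechanics; the real experiment would supply the 30 measurements per batch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical response measurements for the four torque settings
# (30 parts per batch; means and spread are illustrative, not real data)
batch_5nm = rng.normal(50, 4, 30)
batch_8nm = rng.normal(52, 4, 30)
batch_10nm = rng.normal(55, 4, 30)
batch_12nm = rng.normal(54, 4, 30)

# One-way ANOVA: F statistic and p-value for equality of the four means
f_stat, p_value = stats.f_oneway(batch_5nm, batch_8nm, batch_10nm, batch_12nm)
print(f_stat, p_value)
```

If the p-value falls below the chosen significance level, at least one torque setting produces a different mean response.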
Assumptions of F distribution
- Both populations are assumed to be normally distributed
- The two populations are independent of each other
- The larger sample variance always goes in the numerator, which makes the test right-tailed; right-tailed tests are easier to calculate
What is an F Test?
The F-test is used to find out whether two independent estimates of population variance differ significantly, or to find out whether two samples drawn from normal populations have the same variance. In both cases the F ratio is

F = S₁² / S₂²

where σ₁² > σ₂² and S₁² > S₂²; in other words, the larger estimate of variance always goes in the numerator and the smaller estimate in the denominator.
Degrees of freedom (ϑ)
- DF of the larger variance (numerator): ϑ1 = n1 − 1
- DF of the smaller variance (denominator): ϑ2 = n2 − 1
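Putting the ratio and the degrees of freedom together, a minimal sketch (the sample variances and sizes are hypothetical):

```python
# Hypothetical variance estimates from two samples
s1_sq, n1 = 9.5, 16   # larger sample variance -> numerator
s2_sq, n2 = 4.2, 12   # smaller sample variance -> denominator

# F ratio and the degrees of freedom for numerator and denominator
F = s1_sq / s2_sq
df_num, df_den = n1 - 1, n2 - 1
print(F, df_num, df_den)
```

The pair (df_num, df_den) is what indexes the F table (or software lookup) for the critical value.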
What is an F Statistic?
The F statistic, also known as the F value, is used in ANOVA and regression analysis to identify whether the means of two or more populations are significantly different. In other words, the F statistic is a ratio of two variances (variance is a measure of dispersion; it tells how far the data are spread from the mean). The F statistic accounts for the corresponding degrees of freedom when estimating the population variances.
The F statistic is similar to the t statistic. A t-test tells whether a single variable is statistically significant, whereas an F-test tells whether a group of variables is jointly significant.
F statistics are based on the ratio of mean squares: the F statistic is the ratio of the mean square for treatment (between groups) to the mean square for error (within groups).
F = MS Between / MS Within
If the calculated F value is greater than the appropriate F critical value (found in a table or provided by software), then the null hypothesis can be rejected. (This is how ANOVA reaches its conclusion.)
The calculated F-statistic for a known source of variation is found by dividing the mean square of the known source of variation by the mean square of the unknown source of variation.
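The mean-square ratio F = MS Between / MS Within can be computed by hand and checked against scipy's one-way ANOVA. The three small groups below are illustrative values chosen for easy arithmetic:

```python
import numpy as np
from scipy import stats

# Three small illustrative groups
groups = [np.array([4., 5., 6.]), np.array([6., 7., 8.]), np.array([9., 10., 11.])]

k = len(groups)                   # number of groups
n = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Sum of squares between groups and within groups
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)   # df = k - 1
ms_within = ss_within / (n - k)     # df = n - k
F = ms_between / ms_within

# scipy's one-way ANOVA yields the same F statistic
f_scipy, _ = stats.f_oneway(*groups)
print(F, f_scipy)  # both 19.0 for this data
```

The agreement between the hand calculation and `f_oneway` confirms that the ANOVA F statistic really is just this ratio of mean squares.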
When would you use an F Test?
Different types of F-tests exist for different purposes.
- In statistics, an F-test of equality of variances tests the null hypothesis that two normal populations have the same variance.
- The F-test is also used to test the equality of several means; this is what ANOVA does.
- The F-test for a linear regression model tests whether any of the independent variables in a multiple linear regression is significant. It indicates a linear relationship between the dependent variable and at least one of the independent variables.
Steps to conduct F test
- Choose the test: Note down the independent variables and dependent variable and also assume the samples are normally distributed
- Calculate the F statistic: put the larger variance in the numerator and the smaller variance in the denominator, each with degrees of freedom (n − 1)
- Determine the statistical hypothesis
- State the level of significance
- Compute the critical F value from F table. (use α/2 for two tailed test)
- Calculate the test statistic
- Finally, draw the statistical conclusion: if Fcalc > Fcritical, reject the null hypothesis; if Fcalc < Fcritical, fail to reject the null hypothesis
What is an Example of an F Test in DMAIC?
In the Measure and Analyze phases of DMAIC, the F-test is used to find out whether two independent estimates of population variance differ significantly, or whether two samples drawn from normal populations have the same variance.
Example: A botanical research team wants to study the growth of plants with the use of urea. The team conducted 8 tests with a variance of 600 in the initial state; after 6 months, 6 tests were conducted with a variance of 400. The purpose of the experiment is to find out whether there is any improvement in plant growth after 6 months, at the 95% confidence level.
- Degrees of freedom: ϑ1 = 8 − 1 = 7 (highest variance in the numerator)
- ϑ2 = 6 − 1 = 5
- Statistical hypothesis:
- Null hypothesis H0: σ₁² ≤ σ₂²
- Alternative hypothesis H1: σ₁² > σ₂²
- Since the team wants to see an improvement, it is a one-tailed (right) test
- Level of significance α= 0.05
- Compute the critical F value from the table: F(0.05; 7, 5) = 4.88
- Reject the null hypothesis if the calculated F value is greater than or equal to 4.88
- Calculate the F value: F = S₁² / S₂² = 600/400 = 1.5
- Fcalc < Fcritical, hence fail to reject the null hypothesis
From the F table we can find critical values that leave a given area to the right. The area to the right of 4.88 is 0.05 and the area to the right of 3.37 is 0.100, so the area to the right of 1.5 must be more than 0.100. The exact p-value can be found very easily with any statistical tool or Excel.
- Statistical conclusion: the calculated value does not lie in the critical region, hence we fail to reject the null hypothesis at the 95% confidence level
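The urea example can be reproduced in a few lines of scipy, which also gives the exact p-value that the table can only bracket:

```python
from scipy import stats

# Urea example: variance 600 from 8 tests vs variance 400 from 6 tests
F_calc = 600 / 400                 # larger variance in the numerator
df1, df2 = 8 - 1, 6 - 1

F_crit = stats.f.ppf(0.95, df1, df2)   # one-tailed critical value at alpha = 0.05
p_value = stats.f.sf(F_calc, df1, df2) # exact right-tail p-value

print(round(F_crit, 2))   # 4.88, matching the table value
print(F_calc)             # 1.5
print(round(p_value, 3))  # well above 0.05: fail to reject H0
```

Since 1.5 < 4.88 (equivalently, the p-value exceeds 0.05), the conclusion matches the table-based one above.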
F Test Sample Questions
In a manufacturing facility, 2 Six Sigma Green Belts are monitoring parts that run on 2 different stamping presses. Each press runs the same progressive die. Student A says that he is 90% confident that the stamping presses have the same variance, while Student B says that at the 90% confidence level the variances are different. Which student is right? Press 1: s = 0.035, n = 16; Press 2: s = 0.057, n = 10
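One way to work the question above with scipy. At 90% confidence the equality-of-variances test is two-tailed with α = 0.10, so α/2 = 0.05 goes in the right tail; the larger standard deviation (Press 2) supplies the numerator:

```python
from scipy import stats

# Press data from the question
s1, n1 = 0.057, 10   # Press 2: larger standard deviation -> numerator
s2, n2 = 0.035, 16   # Press 1

F_calc = s1**2 / s2**2
df1, df2 = n1 - 1, n2 - 1

# Two-tailed test at 90% confidence: alpha = 0.10, alpha/2 = 0.05 in the right tail
F_crit = stats.f.ppf(1 - 0.05, df1, df2)

print(round(F_calc, 2), round(F_crit, 2))
```

Because F_calc exceeds F_crit, the null hypothesis of equal variances is rejected at the 90% confidence level, which supports Student B.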