The normal distribution is used to analyze continuous data whose histogram fits a bell curve, where a value is equally likely to fall above or below the mean. Statisticians also call the normal curve the Gaussian probability distribution, named after Carl Friedrich Gauss.
Entertainingly, when students ask for a professor to grade on a curve, they probably don’t know that would mean 50% of the students would receive below a 50, or less than a D!
- Normal Distribution is the most widely known symmetric distribution for continuous data.
- Symmetrical distribution about the mean (bell-shaped curve)
- Real data will never fit the curve perfectly unless you have an infinite data set.
- Commonly used in inferential statistics
- The most commonly used distribution in Six Sigma.
- Family of distributions characterized by the mean (µ) and the standard deviation (σ)
- The peak of the normal curve is an indication of the average, which is the center of process variation. An average of a group of numbers is an indication of the central tendency.
- If there is a normal curve, nothing is unduly influencing the process.
- Is symmetric
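Because the family is characterized by the mean and standard deviation alone, any normal variable can be rescaled to the standard normal (mean 0, standard deviation 1) with z = (x - µ) / σ. The sketch below illustrates this, along with the symmetry property, using `statistics.NormalDist` from Python's standard library. The mean of 100 and standard deviation of 15 are made-up values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical process: mean 100, standard deviation 15 (illustrative values).
process = NormalDist(mu=100, sigma=15)
standard = NormalDist()  # mean 0, standard deviation 1

x = 115
z = (x - process.mean) / process.stdev  # distance from the mean, in sigmas

# The probability of falling below x is the same on either scale.
p_process = process.cdf(x)
p_standard = standard.cdf(z)

# Symmetry: the tail below (mean - d) matches the tail above (mean + d).
d = 20
lower_tail = process.cdf(process.mean - d)
upper_tail = 1 - process.cdf(process.mean + d)

print(f"z = {z:.2f}, P(X < {x}) = {p_process:.4f}")
print(f"lower tail = {lower_tail:.4f}, upper tail = {upper_tail:.4f}")
```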
When to Use Normal Distribution
- When data is grouped around the mean and there is an equal probability of being above or below the mean (50% above & 50% below the average).
- If we can transform data to behave like a normal distribution, then do it! Data in this shape is much easier to work with.
- Ex. Take the log of the values, subtract a number, or perform some other operation on the data if that makes it behave normally.
- Use when the histogram fits a bell curve.
- Use when a goodness-of-fit test does not reject normality, that is, when its p-value is greater than the selected significance level (usually 0.05).
- Normal distribution is used to test population means from sample data
- Use a histogram to determine if data are normally distributed.
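- Note: probabilistic assessments of the time between independent events occurring at a constant rate call for the exponential distribution rather than the normal distribution.
- Likewise, failure rates that are constant as a function of usage are described by the exponential distribution, not the normal.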
- The standard normal or t-distributions are most likely used to compare two process means.
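As a rough illustration of the transformation advice in the list above, the following sketch simulates right-skewed data and shows how taking the log pulls it toward a symmetric, bell-like shape. The data are simulated (lognormal), the `skewness` helper is a hypothetical name written for this example, and skewness is used here only as a quick symmetry check, not as a formal normality test.

```python
import math
import random
from statistics import fmean, pstdev

def skewness(values):
    """Population skewness: near 0 for symmetric data, positive for a long right tail."""
    mean = fmean(values)
    sd = pstdev(values, mu=mean)
    return fmean([((v - mean) / sd) ** 3 for v in values])

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed measurements (simulated lognormal data).
raw = [random.lognormvariate(0.0, 0.75) for _ in range(5000)]

# Taking the log of each value is one of the transformations mentioned above.
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_logged:.2f}")
```

A skewness close to zero after the transformation suggests the histogram is now roughly symmetric. A histogram or a goodness-of-fit test would still be needed before treating the data as normal.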
Formulas for Standard Normal Distribution
In a normal distribution, about 68% of the data will occur within +/- 1 standard deviation of the mean.
- e = Euler's number (approximately 2.71828), the base of the natural logarithm
- x = control variable – (data being studied)
- µ = population mean
- σ = population standard deviation
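The symbols listed above are the ones that appear in the usual normal probability density function, f(x) = 1 / (σ * sqrt(2π)) * e^(-(x - µ)^2 / (2σ^2)). Assuming that is the formula intended here, a minimal Python sketch of it follows. The function name `normal_pdf` is illustrative, and `statistics.NormalDist` from the standard library is used only as a cross-check and to compute the share of values within one standard deviation.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: 1 / (sigma * sqrt(2*pi)) * e ** (-(x - mu)**2 / (2 * sigma**2))."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.e ** exponent

# Cross-check the hand-written formula against the standard library.
peak = normal_pdf(0.0)
library_peak = NormalDist().pdf(0.0)

# Share of values within +/- 1 standard deviation of the mean.
within_one_sigma = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

print(f"peak height = {peak:.4f}")
print(f"within 1 sigma = {within_one_sigma:.4f}")
```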
Formulas for Population mean, Variance, Standard Deviation.
N (capital N) refers to Population.
In this case, σ^2 is the variance.
Formulas for Sample mean, Variance, Standard Deviation.
n (lower case n) refers to sample size.
In this case, s^2 is the variance.
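The population equations differ from the sample equations because a sample only estimates the population. The sample mean is calculated from the same data, which uses up one “degree of freedom”, so the sample variance divides by n - 1 instead of N. This correction keeps the sample variance from underestimating the population variance.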
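Assuming the formulas intended here are the standard ones (population variance σ^2 = Σ(x - µ)^2 / N, sample variance s^2 = Σ(x - x̄)^2 / (n - 1), with the standard deviation as the square root of each), the difference can be seen with Python's standard-library `statistics` module. The data values below are made up for illustration.

```python
from statistics import fmean, pstdev, pvariance, stdev, variance

# Made-up measurements, used only to illustrate the two sets of formulas.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = fmean(data)

# Treating the data as the whole population: divide by N.
pop_var = pvariance(data)
pop_sd = pstdev(data)

# Treating the data as a sample: divide by n - 1.
sample_var = variance(data)
sample_sd = stdev(data)

print(f"mean = {mean}")
print(f"population: variance = {pop_var}, standard deviation = {pop_sd}")
print(f"sample:     variance = {sample_var:.3f}, standard deviation = {sample_sd:.3f}")
```

The sample figures come out slightly larger than the population figures for the same numbers, which is the effect of dividing by n - 1.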
Variation and Bell Shape
Adjusting to Center
You do not want to adjust an ongoing process to “center it”. Each adjustment reacts to random variation, so it adds variation instead of removing it. The more you do this, the more the operator is unduly influencing the process and the less the distribution will be shaped like a bell. See the Quincunx demonstration.
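A small simulation can make the point concrete. The sketch below compares a process that is left alone with one that is "re-centered" after every output. The target, the noise level, and the compensation rule (shift the setting by the opposite of the last deviation) are all assumptions made for illustration; real operator behavior will differ.

```python
import random
from statistics import pstdev

random.seed(7)  # fixed seed so the sketch is repeatable

TARGET = 10.0   # hypothetical process target
SIGMA = 1.0     # hypothetical random (common-cause) variation
RUNS = 10_000

# Process left alone: output is the target plus random noise.
untouched = [TARGET + random.gauss(0.0, SIGMA) for _ in range(RUNS)]

# Process "re-centered" after every part: the operator shifts the setting
# by the opposite of the last observed deviation from target.
adjusted = []
setting = TARGET
for _ in range(RUNS):
    output = setting + random.gauss(0.0, SIGMA)
    adjusted.append(output)
    setting -= output - TARGET  # compensate for the last deviation

sd_untouched = pstdev(untouched)
sd_adjusted = pstdev(adjusted)
print(f"left alone: {sd_untouched:.2f}, adjusted every run: {sd_adjusted:.2f}")
```

Under this rule each output carries its own random noise plus the correction for the previous noise, so the spread of the adjusted process is roughly 1.4 times (the square root of 2) that of the untouched one.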
Center of Process Variation
The peak of the normal curve is an indication of the average, which is the center of process variation.
5Ms & 1 P
When you have a bell-shaped curve, none of the 5 Ms or the 1 P is unduly influencing the process.
- Is your process following a normal distribution?
- How to transform data into a normal distribution.
- Process control for non-normal distributions.
ASQ Six Sigma Black Belt Certification Normal Distribution Questions:
Answer: Half a sigma on one side of the mean covers 19.1% of values; 1 sigma covers 19.1% + 15% = 34.1%; and 2 sigmas cover 19.1% + 15% + 9.2% + 4.4% = 47.7%.
Now double it to account for the other side of the mean: 47.7% * 2 = 95.4%.
So, a process that follows the normal distribution can expect to have 95.4% of its values fall within 2 sigmas of the mean.
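The band percentages in the answer above can be checked against the standard normal cumulative distribution. This is a minimal sketch using `statistics.NormalDist` from Python's standard library; the half-sigma band edges are taken from the answer, and the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of values in each half-sigma band on one side of the mean.
edges = [0.0, 0.5, 1.0, 1.5, 2.0]
bands = [z.cdf(hi) - z.cdf(lo) for lo, hi in zip(edges, edges[1:])]

one_side = sum(bands)        # from the mean out to 2 sigmas on one side
both_sides = 2 * one_side    # within 2 sigmas on either side of the mean

for lo, hi, share in zip(edges, edges[1:], bands):
    print(f"{lo:.1f} to {hi:.1f} sigma: {share:.1%}")
print(f"one side: {one_side:.1%}, both sides: {both_sides:.1%}")
```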
ASQ Six Sigma Green Belt Certification Normal Distribution Questions:
Question: For a normal distribution, two standard deviations on each side of the mean would include what percentage of the total population?
Answer: A (95%). For this question you need to remember that nearly all of a process’s outputs will fall within a spread of 6 sigmas (6 standard deviations), that is, 3 on each side of the mean. 2 standard deviations on each side of the mean would include about 95% of all outcomes.