When we use a sample group to gain insight into an entire population, whether we're talking about people or a product built in a factory, we risk the sample not fully reflecting the whole population. That is why we need confidence intervals.

We use confidence intervals to express how confident we can be that an estimate from the sample, such as its mean, reflects the entire population from which the sample was drawn. Another way of thinking about it: if you drew a sample of the same size hundreds of times, performed the same measurements, and calculated a confidence interval from each sample, a known percentage of those intervals would contain the population mean.
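To see the repeated-sampling interpretation in action, here is a small Python simulation. It is a sketch under assumed values: a normal population with a mean of 50 and a standard deviation of 10 (both arbitrary), samples of 30, and the 95% Z-interval with a known σ. Roughly 95% of the intervals should contain the true mean.

```python
import random
import statistics

def coverage(trials=2000, n=30, mu=50.0, sigma=10.0, z=1.96, seed=1):
    """Fraction of 95% Z-intervals (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    # With sigma known, the margin of error is the same for every sample
    half_width = z * sigma / n ** 0.5
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage()
print(f"{rate:.3f} of intervals contained the population mean")
```

Any single interval either contains the mean or it does not; the 95% describes the long-run behaviour of the method.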

A confidence interval is a range of values calculated from a sample. The confidence level attached to it tells you how often intervals built this way will contain the population mean.

## Confidence Interval vs Confidence Level

A confidence *interval* is a range of values that probably contains the population mean.

A confidence *level* is the percentage of samples (for example, 95%) for which a confidence interval calculated the same way will contain the population mean.

## Confidence Interval vs Prediction Interval

Prediction intervals estimate the range in which the next individual data point is likely to fall. They tell you about the distribution of individual data values, whereas confidence intervals tell you about the likely value of the population mean.

## Estimating Population Mean and Standard Deviation

In most cases, we won’t know the mean and standard deviation for an entire population. However, we can calculate the mean and standard deviation for each of our samples. So we can use the sample figures to estimate those of the whole population. This is what we call a *point estimate*.

Once we have a point estimate, we also need to know how accurate it is. The Central Limit Theorem helps here: for a large enough sample (a common rule of thumb is 30 or more), the sample mean is approximately normally distributed around the population mean, with a standard deviation of σ/√n, and the sample standard deviation is usually a reasonable estimate of σ.
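The following Python sketch illustrates the Central Limit Theorem with an assumed, deliberately skewed population: an exponential distribution with mean 1 and standard deviation 1. It draws 5,000 samples of 30 and checks that the sample means centre on 1 with a spread close to 1/√30.

```python
import random
import statistics

rng = random.Random(42)
n = 30       # sample size (the rule-of-thumb threshold)
reps = 5000  # number of repeated samples

# Exponential population with rate 1: heavily skewed, mean 1, standard deviation 1
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(round(statistics.fmean(means), 3))  # close to the population mean, 1
print(round(statistics.stdev(means), 3))  # close to sigma / sqrt(n) = 1 / sqrt(30), about 0.183
```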

## Margin of Error

Margin of error is the maximum expected difference between the actual population parameter and a sample estimate of that parameter, at a given confidence level. In other words, it is the amount added to and subtracted from the sample statistic to form the confidence interval.

Margin of error is widely used in surveys to express how much uncertainty the survey results carry.

For survey proportions at a 95% confidence level, a quick rule of thumb is:

Margin of error ≈ 1/√n

Where n = sample size
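The 1/√n shortcut comes from the worst case for a proportion (p = 0.5) at 95% confidence: 1.96 × √(0.25/n) = 0.98/√n, which rounds up to 1/√n. This Python sketch compares the two for a few sample sizes; the function names are illustrative only.

```python
import math

def moe_rule_of_thumb(n):
    """Quick survey approximation: margin of error is about 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def moe_worst_case_95(n):
    """95% margin of error for a proportion at p = 0.5 (the widest case)."""
    return 1.96 * math.sqrt(0.5 * 0.5 / n)

for n in (100, 400, 1000):
    print(n, round(moe_rule_of_thumb(n), 4), round(moe_worst_case_95(n), 4))
```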

Margin of error is the product of the critical value and the standard error. For a fixed sample size, raising the confidence level raises the critical value, so the margin of error (and the interval) gets wider; lowering the confidence level makes it narrower.

For example, a confidence interval can be written as X̅ ± E, where E is the margin of error. When the population standard deviation is known:

$$\text{Confidence interval} = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

- Where σ = population standard deviation
- $Z_{\alpha/2}$ = normal distribution (Z) value for the desired confidence level
- n = sample size
- X̅ = sample mean

## Z Scores

A Z score is the number of standard deviations between a data point and the mean. For a confidence interval, you need the critical value $Z_{\alpha/2}$: look it up in a Z-score table for common confidence levels, or first calculate the α value using this formula:

α = 1 - confidence level

If your desired confidence level is 95%, then your calculation would look like this:

$$
\begin{aligned}
\alpha &= 1 - 0.95 = 0.05\\
\alpha/2 &= (1 - 0.95)/2 = 0.025\\
Z_{\alpha/2} &= Z_{0.025} = 1.96
\end{aligned}
$$
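Instead of a table, the critical value can be computed from the standard normal distribution. This Python sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later); `z_critical` is a hypothetical helper name. It reproduces the common values 1.645, 1.96 and 2.576, and shows that higher confidence levels need larger critical values.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical value Z_(alpha/2) for a confidence level such as 0.95."""
    alpha = 1 - confidence_level                 # e.g. 1 - 0.95 = 0.05
    return NormalDist().inv_cdf(1 - alpha / 2)   # leaves alpha/2 in the upper tail

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
```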

### Common confidence levels and corresponding Z scores

| Desired confidence level | Z score |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.576 |

## Confidence Interval Question Using Z-Score

**Question:** We conduct a random survey of 500 newly enrolled university students. We know that the population standard deviation for university enrollment age is 8 years. The mean age of our sample is 24. Calculate, with a 99% confidence level and to 3 decimal places, the confidence interval for the mean age of all first-year university students.

**Calculation:** The first step is to consult a Z-score table. A 99% confidence level requires a Z-score of 2.576.

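$Z_{\alpha/2} = 2.576$, $\bar{X} = 24$, $\sigma = 8$ and $n = 500$

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \frac{2.576 \times 8}{\sqrt{500}} = 0.922$$

$$\text{Confidence interval} = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 24 \pm 0.922$$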

**Answer:** Hence, the confidence interval for the mean age of first-year university students is 23.078 to 24.922 years, at a 99% confidence level.
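A short Python sketch of the same calculation is a handy way to check the arithmetic; `z_interval` is a hypothetical helper name, and the denominator uses the sample size n = 500.

```python
import math

def z_interval(xbar, sigma, n, z):
    """Confidence interval for a mean when the population standard deviation is known."""
    margin = z * sigma / math.sqrt(n)  # E = Z_(alpha/2) * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin

low, high, e = z_interval(xbar=24, sigma=8, n=500, z=2.576)
print(f"E = {e:.3f}, interval = {low:.3f} to {high:.3f}")
```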

## Confidence Interval Question Using T-Score

**Question:** A factory produces tennis balls. A sample of 19 balls is taken from one day's production. The mean weight of the sample balls is 58.2 g, and the sample standard deviation is 0.4 g. Calculate the confidence interval for the mean weight with a confidence level of 95%.

**Calculation:** The sample size is small and the population standard deviation is unknown, so a Z-score is not appropriate. Instead, use a T-score, which comes from the t-distribution. A confidence interval for a mean is two-sided, so use the two-tailed critical value.

You'll need an alpha value. To calculate it, use this simple equation:

$$\alpha = 100\% - \text{confidence level} = 100\% - 95\% = 5\%$$

You also need the degrees of freedom (df), which is the sample size minus one. In equation form:

df = n - 1

df for this question is 18.

So, use a T-table to look up the T-score needed for a two-tailed test with an α of 5% and a df of 18: the answer is 2.101.

$t_{\alpha/2} = 2.101$, $\bar{X} = 58.2$, $s = 0.4$ and $n = 19$

$$E = t_{\alpha/2}\frac{s}{\sqrt{n}} = \frac{2.101 \times 0.4}{\sqrt{19}} = 0.193$$

$$\text{Confidence interval} = \bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 58.2 \pm 0.193$$

**Answer:** Hence, the 95% confidence interval for the mean weight of tennis balls produced in the factory is 58.007 to 58.393 g.
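The t-based calculation can be checked the same way. This Python sketch hard-codes the table value 2.101 rather than computing it, because the standard library has no t-distribution; if SciPy is installed, `scipy.stats.t.ppf(0.975, 18)` returns the same critical value. `t_interval` is a hypothetical helper name.

```python
import math

# Two-tailed t critical value for alpha = 0.05 and df = 18, copied from a t-table
T_CRIT = 2.101

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for a mean using the sample standard deviation s."""
    margin = t_crit * s / math.sqrt(n)  # E = t_(alpha/2) * s / sqrt(n)
    return xbar - margin, xbar + margin, margin

low, high, e = t_interval(xbar=58.2, s=0.4, n=19, t_crit=T_CRIT)
print(f"E = {e:.3f} g, interval = {low:.3f} to {high:.3f} g")
```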

## Calculating the Confidence Interval Comparing Two Population Means

You can also use confidence intervals to compare two population means, using samples from each population. Use this method to compare two different manufacturing methods, or to look for differences in two groups of people (for example, smokers and non-smokers). You could also use it to decide whether or not it will be acceptable to pool your two population samples into one larger sample.

The confidence interval for a comparison between two means is a range of values in which the *difference* between those two means might lie.

We use a similar equation to the one we use for a confidence interval for a single mean. Instead of a single mean, though, we're estimating the difference between the two population means, μ₁ − μ₂.

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where $\mu_1 - \mu_2$ is the unknown difference between the population means.


### Example

High blood pressure has been causally linked to smoking tobacco products. To test this, you want to compare systolic blood pressure between smokers and non-smokers, using a confidence level of 95%.

- You have 45 smokers and 56 non-smokers, with similar variance (age, gender, health levels) in each group.
- In the sample group of smokers, the mean systolic blood pressure is 138.
- In the sample group of non-smokers, the mean systolic blood pressure is 135.
- The standard deviation for smokers is 16.5.
- The standard deviation for non-smokers is 14.9.
- The Z-score you need is 1.96.

$n_1 = 45$, $n_2 = 56$, $Z = 1.96$ (found by looking up the 95% confidence level in the table), $s_1 = 16.5$, $s_2 = 14.9$, $\bar{x}_1 = 138$, $\bar{x}_2 = 135$


First, plug the numbers into the equation and solve for the difference in population means, μ₁ − μ₂. Using Z = +1.96 gives the lower limit of the interval:

$$
\begin{aligned}
1.96 &= \frac{(138 - 135) - (\mu_1 - \mu_2)}{\sqrt{16.5^2/45 + 14.9^2/56}}\\
1.96 &= \frac{3 - (\mu_1 - \mu_2)}{\sqrt{272.25/45 + 222.01/56}}\\
1.96 &= \frac{3 - (\mu_1 - \mu_2)}{\sqrt{6.05 + 3.96446}}\\
1.96 &= \frac{3 - (\mu_1 - \mu_2)}{\sqrt{10.01446}} = \frac{3 - (\mu_1 - \mu_2)}{3.16456}\\
1.96 \times 3.16456 &= 3 - (\mu_1 - \mu_2)\\
6.2025 &= 3 - (\mu_1 - \mu_2)\\
\mu_1 - \mu_2 &= 3 - 6.2025 = -3.2025
\end{aligned}
$$

Using Z = −1.96 instead gives the upper limit, μ₁ − μ₂ = 3 + 6.2025 = 9.2025.

To find the margin of error, use:

$$
\begin{aligned}
\text{margin of error} &= Z\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\\
&= 1.96\sqrt{16.5^2/45 + 14.9^2/56}\\
&= 1.96\sqrt{272.25/45 + 222.01/56}\\
&= 1.96\sqrt{6.05 + 3.96446}\\
&= 6.2025
\end{aligned}
$$

So our confidence interval, with a confidence level of 95%, is:

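$$\text{confidence interval} = (\bar{x}_1 - \bar{x}_2) \pm \text{margin of error} = 3 \pm 6.2025 = -3.2025 \text{ to } 9.2025$$

Because this interval includes zero, these samples do not show a statistically significant difference in mean systolic blood pressure at the 95% confidence level.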
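As a cross-check, this Python sketch computes the interval directly as (x̄₁ − x̄₂) ± Z × standard error, which is equivalent to solving the Z equation for both limits. The helper name is hypothetical, and like the example it uses the sample standard deviations with a Z value.

```python
import math

def diff_means_interval(x1, s1, n1, x2, s2, n2, z):
    """Z-based confidence interval for mu1 - mu2 from two independent samples."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of the difference
    margin = z * se
    diff = x1 - x2                           # point estimate of mu1 - mu2
    return diff - margin, diff + margin, margin

low, high, e = diff_means_interval(138, 16.5, 45, 135, 14.9, 56, z=1.96)
print(f"margin = {e:.4f}, interval = {low:.4f} to {high:.4f}")
print("zero inside interval:", low <= 0 <= high)
```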


## Confidence Intervals for Variance

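The point estimate for σ is s, and s² is an unbiased estimate of σ².

Use the chi-square distribution to construct a confidence interval for the variance and standard deviation.

If the random variable x is normally distributed, then for sample size n > 1 the statistic

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

follows a chi-square distribution with n − 1 degrees of freedom. The confidence interval for the variance is:

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

Where n = sample size

s² = point estimate of variance

$\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ = chi-square values with right-tail areas of α/2 and 1 − α/2. Taking the square root of both limits gives the interval for σ.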

**Example:** XYZ pharmaceutical company randomly selected 25 samples of flu medicine. The sample variance is 6 mg². Assuming the weights are normally distributed, construct a 99% confidence interval for the population variance.

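n = 25

Degrees of freedom = n − 1 = 24

s² = 6

$\chi^2_{\alpha/2} = \chi^2_{0.005} = 45.559$

$\chi^2_{1-\alpha/2} = \chi^2_{0.995} = 9.886$

$$\frac{24 \times 6}{45.559} \le \sigma^2 \le \frac{24 \times 6}{9.886} \quad\Rightarrow\quad 3.161 \le \sigma^2 \le 14.566$$

Hence, with 99% confidence, you can say that the population variance is between 3.161 and 14.566 mg².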
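The variance interval can be checked with a few lines of Python. This is a sketch: the chi-square critical values for 24 degrees of freedom at 99% confidence (45.559 and 9.886) are copied from a chi-square table rather than computed, and `variance_interval` is a hypothetical helper. If SciPy is available, `scipy.stats.chi2.ppf(0.995, 24)` and `scipy.stats.chi2.ppf(0.005, 24)` return these two values. The square roots of the limits give the interval for the standard deviation.

```python
import math

def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Chi-square confidence interval for a population variance.

    chi2_upper: chi-square value with right-tail area alpha/2
    chi2_lower: chi-square value with right-tail area 1 - alpha/2
    """
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

# Table values for df = 24 and 99% confidence
low, high = variance_interval(s2=6, n=25, chi2_upper=45.559, chi2_lower=9.886)
print(f"variance: {low:.3f} to {high:.3f} mg^2")
print(f"std dev:  {math.sqrt(low):.3f} to {math.sqrt(high):.3f} mg")
```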

## Confidence intervals for Proportion

The Central Limit Theorem says that for larger samples, the sampling distribution of the sample proportion is approximately normal. When np ≥ 5 and n(1 − p) ≥ 5, the normal distribution can be used to calculate the confidence interval for a proportion.

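The confidence interval for a proportion is:

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Where

n = sample size

p̂ = sample proportion (the point estimate of the population proportion)

$Z_{\alpha/2}$ = critical value for the desired confidence level, from the Z table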

**Example:** In a bulb manufacturing unit, 24 defective bulbs were identified in a sample of 400 bulbs. Calculate the 95% confidence interval for the proportion of defective bulbs.

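n = 400

p̂ = 24/400 = 0.06

1 − p̂ = 0.94

Both np̂ = 24 and n(1 − p̂) = 376 are at least 5, so the normal approximation applies.

$Z_{\alpha/2}$ = 1.96 (95% confidence)

$$0.06 \pm 1.96\sqrt{\frac{0.06 \times 0.94}{400}} = 0.06 \pm 0.0233$$

0.037 ≤ p ≤ 0.083

So, with 95% confidence, the population proportion of defective bulbs is between 0.037 and 0.083.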
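This Python sketch wraps the proportion formula (often called the Wald interval) in a hypothetical helper that also checks the np ≥ 5 and n(1 − p) ≥ 5 condition before applying the normal approximation.

```python
import math

def proportion_interval(successes, n, z):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    # Check the np >= 5 and n(1 - p) >= 5 condition before trusting the approximation
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

low, high, e = proportion_interval(successes=24, n=400, z=1.96)
print(f"margin = {e:.4f}: {low:.3f} <= p <= {high:.3f}")
```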

## What Is the Difference Between Control Limits and Confidence Intervals?

These are two entirely different concepts. Confidence intervals are used in the analysis of a process, while control limits are used in the control of a process.

Control limits depend on the distribution of your population or sample. They are commonly set at the average ± 3 standard deviations. “Control limits are obtained based on the nature of the distribution of data that you collect, if a process is in control doesn’t mean that your process is stable, hence control limits gives the limits at that instant.”

These are different from specification limits, which sometimes also appear on control charts. Specification limits are set by the business, based on what is viable for it. A colleague once described them like this: if the process goes above this level, we all need to update our resumes; if it goes below this other limit, don't bother updating resumes, because no one will ever hire us again!

#### Confidence Interval

Confidence intervals are a statistical tool for when you do not have perfect knowledge of all of the data. For example, imagine you are trying to infer the chance of some event happening by sampling from a population. Take US election polls: CNN can't possibly get all of the voting data, but it can sample the population through exit polls and predict the winner from that sample. The question is, how sure are they that their answer is accurate? Are they 90% certain? 95%? 99.5%? It all depends on the confidence level required. Thus, you get answers like “We are projecting that candidate X has an 80% chance of winning with a 95% level of confidence.”

“While confidence level signifies how confident you are that the population lies within the range one has specified and this range is nothing but the confidence interval, and this has nothing to do with the control limits as control limits keeps changing if a process is not in stable though its in control. “

## Also See:

How to Calculate a Sample Size Given Standard Deviation, Confidence Interval and Margin of Error

## Other Confidence Intervals Problems

Also, you can find some more confidence interval problems, with links to worked answers, here: Finding the Sample Size Needed for a Confidence Interval for a Single Population Mean.

## ASQ Six Sigma Black Belt Exam Confidence Intervals Questions

**Question:** Which of the following describes the 95% confidence interval for a 20% absentee rate in a department with 30 people?

(A) 6% to 34%

(B) 8% to 32%

(C) 13% to 27%

(D) 17% to 23%

**Answer:** (A) 6% to 34%.

This is a confidence interval for a proportion question: $p \pm Z_{\alpha/2}\sqrt{p(1-p)/n}$

- p = 0.2
- α = 5%, so α/2 = 2.5% (use this to look up the Z score in the Z table: 1.96)
- n = 30

$$
\begin{aligned}
p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} &= 0.2 \pm 1.96\sqrt{\frac{0.2 \times 0.8}{30}}\\
&= 0.2 \pm 1.96\sqrt{0.00533}\\
&= 0.2 \pm 0.1431
\end{aligned}
$$

- Upper limit: 0.2 + 0.1431 = 0.3431, or 34.31%, which rounds down to 34%
- Lower limit: 0.2 − 0.1431 = 0.0569, or 5.69%, which rounds up to 6%

Hence, the interval is 6% to 34%.
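The exam arithmetic can be verified in a few lines of Python (a sketch; the variable names are illustrative). Note that np = 6 and n(1 − p) = 24, so the normal approximation condition is met.

```python
import math

p, n, z = 0.20, 30, 1.96  # absentee rate, department size, Z for 95% confidence
margin = z * math.sqrt(p * (1 - p) / n)
low, high = p - margin, p + margin
print(f"margin = {margin:.4f}, interval = {low:.2%} to {high:.2%}")
```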

## Comments (11)

I don’t see a formula displayed for margin of error. In the sample workthroughs, it's just assumed that a person working through it would know what value was the margin of error and how to appropriately apply it.

Alex,

I’ll see what I can add here. Remember that these materials should be supplementary to previous Six Sigma training.

Best, Ted

1.96 corresponds to alpha = 0.025, not 0.05; the calculation has a typo.

Also the formula here is very different from ASQ CSSGB handbook P 287.

Z= (xbar -miu) sqrt n/ sigma

xbar=sample mean

miu=population mean

n-sample size

sigma=population standard deviation

This is getting very confusing

Hello Ming,

We have updated the article with the correct alpha value, and also the confidence interval formula.

Thanks

I posted a question, but it disappeared?

Hi Ming,

Non-members need to have their comments approved by an administrator. So, you may see some delay on comment approval until you join the members program.

Best, Ted.

Are the confidence intervals for variation correct? You use (a/2) for the lower, and (1-a/2) for the upper limit.

According to the Handbook Of Parametric And Nonparametric Statistical Procedures by David J. Sheskin (5th edition, page 217) it should be the other way around: (1-a/2) for the lower, and (a/2) for the upper limit.

Best regards, Daniel

Kind of weird to answer my own question, but here we go 🙂

I found out that Sheskin meant the right-tailed probability of the chi-squared distribution when he wrote “X²”. Is there any naming convention in statistics, which defines which tailed probability one assumes when writing “X²” for Chi-squared?

My confusion was actually caused by Excel, which has an old CHIINV and a new CHISQ.INV function. The old CHIINV function (up to Excel 2007) only returns the right-tailed probability. Using this CHIINV function, one must indeed use (a/2) for the lower, and (1-a/2) for the upper limit.

When I used the formulas on this webpage in Excel 365, utilising the CHISQ.INV function, I got unexpected results. To get correct results with the CHISQ.INV function, I had to use (1-a/2) for the lower, and (a/2) for the upper limit, as described in Sheskin’s book. The reason behind this switch lies in the definition of the CHISQ.INV function as the inverse of the left-tailed probability. The right-tailed probability can be calculated by adding “.RT” to the CHISQ.INV function (CHISQ.INV.RT).

I hope this explains my confusion and helps others to avoid mistakes when doing statistics with Excel 🙂

I appreciate you sharing both your original question and what you found out! Much appreciated!

Good morning Ted. Looks like the math addition is incorrect for the upper end of interval . the lower end figure is correct at 58.008 but the upper end needs to be the mean plus .192 which is 58.332. thanks Ted.

Confidence interval = X̅±tα/2*σ/√n = 58.2±0.192

Answer: Hence, 95% confidence interval for tennis balls produced in the factory is 58.008–58.192g.

The above comment was for the section called “Confidence Interval Question Using T-Score” with the 19 tennis balls.