Sampling is a data collection technique used to draw statistically sound conclusions about a population from a subset of its data. In DMAIC, sampling is most common in the Measure phase.

## Why Use Data Sampling?

Sometimes gathering information on a complete population is simply cost-prohibitive. Think about CNN’s coverage of an election cycle in the United States. It is not possible to ask every voter how they voted, and even if it were, not all would answer. Instead, they use exit polls to derive statistical conclusions about the population as a whole.

## Concerns About Sampling

When taking a sample from a larger population, you must make sure that the sample is an appropriate size and is drawn without bias.

For example, it is very helpful if the sample is large enough for the sample mean to be approximately normally distributed, as this opens the door to a wide array of statistical tools.

## How Large Should a Data Sample Be?

### The calculation for how large a sample data set should be depends on:

- The type of data (continuous or discrete) being measured
- How precise you want your statistical inferences to be.
- The estimate of the standard deviation for the entire population.
- The confidence level desired.

### Sample size needed for hypothesis testing depends on:

- Desired risk (both alpha and beta)
- Minimum difference to be detected between the population means (μ - μ0)
- The variation in the characteristic being measured (S or σ), that is, the standard deviation (the square root of the variance).
- Even parameter shift sensitivity
- (Population size does NOT come into the determination of how big a sample should be.)
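
The factors above can be combined in one common textbook form for a one-sample, two-sided z-test: n = ((Z_α/2 + Z_β) · σ / δ)^2, where δ is the minimum difference to detect. That formula is not given in the list above, so the sketch below is an illustration under that assumption; the function name `sample_size_for_test` and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_test(sigma, delta, alpha=0.05, beta=0.10):
    """Approximate n for a one-sample, two-sided z-test (assumed formula).

    sigma: estimated standard deviation of the characteristic
    delta: minimum difference between means to detect (mu - mu0)
    alpha: risk of a false alarm (Type I error)
    beta:  risk of missing a real shift (Type II error)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha risk
    z_beta = NormalDist().inv_cdf(1 - beta)        # beta risk
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical inputs: sigma = 10, detect a shift of 5 units.
n = sample_size_for_test(sigma=10.0, delta=5.0, alpha=0.05, beta=0.10)
print(n)  # 43
```

Note that population size does not appear anywhere in the calculation, and that halving `delta` roughly quadruples the required sample size.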

### Variable Data Sample Size

$$n = \frac{Z^2 \sigma^2}{E^2}$$

Where n is the sample size, Z is the Z-score for the desired confidence level (risk), σ (sigma) is the standard deviation, and E is the allowable error, that is, the smallest mean shift you want to detect.
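
As a worked illustration of the variable-data formula, here is a minimal Python sketch. It assumes a two-sided confidence level when looking up Z, and it rounds up because a sample size must be a whole number. The function name `sample_size_continuous` and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_continuous(sigma, error, confidence=0.95):
    """n = Z^2 * sigma^2 / E^2, rounded up to the next whole unit."""
    # Two-sided Z score for the chosen confidence level (assumption).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / error) ** 2)


# Hypothetical inputs: sigma = 10, allowable error E = 2, 95% confidence.
n = sample_size_continuous(sigma=10.0, error=2.0, confidence=0.95)
print(n)  # 97
```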

### Binomial Data Sample Size

$$n = \frac{Z^2 \, \bar{p} \, (1 - \bar{p})}{(\Delta p)^2}$$

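Where p bar is the estimated proportion (for example, the expected defect rate), Δp is the desired precision of the proportion estimate (the half-width of the interval), and Z is the Z-score for the desired confidence level.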
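
The binomial formula can be sketched the same way. This is a minimal example that assumes a two-sided confidence level for Z. The function name `sample_size_proportion` and the inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p_bar, delta_p, confidence=0.95):
    """n = Z^2 * p_bar * (1 - p_bar) / delta_p^2, rounded up."""
    # Two-sided Z score for the chosen confidence level (assumption).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p_bar * (1 - p_bar) / delta_p ** 2)


# Hypothetical inputs: expected proportion 0.5, interval of +/- 0.05.
n = sample_size_proportion(p_bar=0.5, delta_p=0.05, confidence=0.95)
print(n)  # 385
```

Because p bar × (1 - p bar) peaks at p bar = 0.5, using 0.5 gives the most conservative (largest) sample size when the true proportion is unknown.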

## Types of Sampling Techniques

It is important to choose the best plan for sampling.

Sampling plans for inspection & auditing consider validity, applicability, and known risks.

Here are a few possibilities:

### Process Sampling

Samples can be taken from a population or a process. If taking from a process, be sure to preserve the time order.

### Random Sampling

Choose units at random so that each data point has an equal chance of selection.
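
A minimal sketch of simple random sampling using Python's standard library. The population of 100 numbered units and the fixed seed are assumptions for illustration only.

```python
import random

# Hypothetical population: 100 units labeled 1..100.
population = list(range(1, 101))

# A seeded generator makes the draw repeatable; omit the seed in practice.
rng = random.Random(42)

# Draw 10 units without replacement; every unit is equally likely to be chosen.
sample = rng.sample(population, k=10)
print(sample)
```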

### Stratified Random Sampling

Divide the population into groups and then take an equal percentage of each group as a sample.

Ex. A vat is suspected of not being homogeneous, so samples are taken from each region of the vat.

Ex. Poll x% of voters in each age range.

Ex. The hypothesis that cargo containers stacked at the end are disproportionately more likely to be damaged should be tested with the stratified method. Other methods could be used, but this one is quicker and easier.
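
The voter-polling example can be sketched as follows. This is an illustration only: the helper `stratified_sample`, the voter records, and the 10% fraction are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, fraction, seed=None):
    """Take the same fraction of units at random from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)  # split the population into groups
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # at least one per group
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical population: 100 voters in three age ranges.
voters = (
    [{"id": i, "age_range": "18-34"} for i in range(60)]
    + [{"id": i, "age_range": "35-54"} for i in range(60, 90)]
    + [{"id": i, "age_range": "55+"} for i in range(90, 100)]
)

sample = stratified_sample(voters, key=lambda v: v["age_range"], fraction=0.10, seed=1)
print(len(sample))  # 10 (6 + 3 + 1 across the three age ranges)
```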

### Systematic Sampling

Choose every Nth unit. Ex. Every 3rd person going through the airport screening process is chosen for a pat-down.
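
A minimal sketch of systematic sampling using list slicing. The helper name `systematic_sample` and the queue of 12 travelers are hypothetical.

```python
def systematic_sample(items, step, start=0):
    """Select every `step`-th unit, beginning at index `start`."""
    return items[start::step]


# Hypothetical queue of 12 travelers, numbered in arrival order.
travelers = list(range(1, 13))

# Every 3rd person: start at index 2 (the 3rd traveler), then step by 3.
selected = systematic_sample(travelers, step=3, start=2)
print(selected)  # [3, 6, 9, 12]
```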

### Subgroup Sampling

Take n samples at a regular time interval. Ex. Measure the chlorine in a pool 3 times every hour and then use the average value. (Also see rational subgroups.)
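
The pool example can be sketched as below. The readings are invented for illustration, and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical chlorine readings (ppm): three measurements per hour.
readings_by_hour = {
    "09:00": [1.5, 2.0, 2.5],
    "10:00": [2.0, 2.5, 3.0],
    "11:00": [1.0, 2.0, 3.0],
}

# One average per subgroup, kept in time order.
subgroup_means = {hour: mean(values) for hour, values in readings_by_hour.items()}

# The range (max - min) within each subgroup is often tracked alongside the mean.
subgroup_ranges = {hour: max(values) - min(values) for hour, values in readings_by_hour.items()}

print(subgroup_means)
print(subgroup_ranges)
```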

### Sequential Sampling

- Often used in auditing.
- Products coming from a production stream.

### Discovery Sampling

- Often used in auditing.

### Skip-lot Sampling

- Products coming from a production stream.

**Sample Variance:** For a set of data, the sum of the squared deviations from the mean divided by n-1 (an average squared deviation that uses n-1 rather than n as the denominator).
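
A short sketch of the sample variance calculation, checked against Python's built-in `statistics.variance`, which also uses the n-1 denominator. The data set and the function name `sample_variance` are hypothetical.

```python
from statistics import variance


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    avg = sum(data) / n
    return sum((x - avg) ** 2 for x in data) / (n - 1)


# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(data))  # 32 / 7, roughly 4.571
print(variance(data))         # the standard library gives the same value
```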

## Sampling from a Controlled Process

- Ranges of the samples should vary.
- Means of the samples should differ slightly from one another but stay consistent with the process average, centering on a common value.