Data collection is a science, not an art form. Be sure to use rigor and thought before assembling the data that you seek to analyze.

## Ensuring Data Accuracy & Integrity

- Data should not be removed from a set without an appropriate statistical test or a logical justification.
- Generally, data should be recorded in time sequence.
- Unnecessary rounding should be avoided. If rounding is done, it should be late in the process.
- Screen the data to remove entry errors.
- Avoid emotional bias.
- Record measurements of items that change over time as quickly as possible after manufacture and again after the stabilization period.
- Record each important classification identifier alongside the data (e.g., time, machine, operator, gage, lab, material, conditions).
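As a rough illustration of the last few points, here is a minimal sketch of a measurement record that carries its classification identifiers with it. The `Measurement` class, its field names, and all sample values are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Measurement:
    """One reading plus the identifiers needed to stratify it later (hypothetical fields)."""
    recorded_at: datetime
    value: float
    machine: str
    operator: str
    gage: str
    material: str


# Made-up readings, deliberately entered out of time order.
records = [
    Measurement(datetime(2024, 1, 8, 9, 15), 0.5542, "M2", "op-B", "G7", "lot-12"),
    Measurement(datetime(2024, 1, 8, 9, 0), 0.5541, "M1", "op-A", "G7", "lot-12"),
]

# Keep the data in time sequence.
in_sequence = sorted(records, key=lambda r: r.recorded_at)

# Because each reading carries its identifiers, it can be grouped later, e.g. by machine.
by_machine = {}
for r in in_sequence:
    by_machine.setdefault(r.machine, []).append(r.value)
```

Storing the identifiers with each reading, rather than in a separate note, is what makes later stratification by machine, operator, or gage possible.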

## Coding Data

Sometimes it is more efficient to code data by adding, subtracting, multiplying or dividing by a factor.

### Types of Data Coding

- **Substitution**: e.g., replace measurements in 1/8ths of an inch with integer deviations from center (+1, -1, and so on).
- **Truncation**: e.g., for a data set of 0.5541, 0.5542, 0.5547, you might drop the common 0.554 portion and record only the remaining digits (1, 2, 7).
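The two coding types above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 2-inch center, and the 1/8-inch unit are hypothetical choices, not part of any standard.

```python
from fractions import Fraction


def code_truncation(values, base=0.5540, scale=10_000):
    """Drop the common leading portion and keep small integers."""
    return [round((v - base) * scale) for v in values]


def decode_truncation(coded, base=0.5540, scale=10_000):
    """Reverse the coding so values can be reported in original units."""
    return [round(base + c / scale, 4) for c in coded]


def code_substitution(values, center, unit=Fraction(1, 8)):
    """Replace fractional measurements with integer steps away from center."""
    coded = []
    for v in values:
        steps = (Fraction(v) - center) / unit
        if steps.denominator != 1:
            raise ValueError(f"{v} is not a whole number of units from center")
        coded.append(int(steps))
    return coded


readings = [0.5541, 0.5542, 0.5547]
coded = code_truncation(readings)      # [1, 2, 7]
restored = decode_truncation(coded)

# Assumed example: a 2-inch nominal dimension measured in eighths of an inch.
eighths = [Fraction(15, 8), Fraction(2), Fraction(17, 8)]
deviations = code_substitution(eighths, center=Fraction(2))
```

In both cases the coded values are short integers that are easier to write on a form, and the original values can be recovered exactly as long as the base, scale, center, and unit are recorded.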

### Problems due to NOT Coding

- The practitioner tries to squeeze too many numbers onto a form, which hurts usability and legibility.
- Data entry errors increase.
- The analysis loses sensitivity because of rounding.

### Effects of Coding Data

- Coding changes the mean, so the mean must be uncoded before it is reported.
- Uncoding the mean is exactly the reverse of coding it (e.g., if X was added, subtract X; if the data were multiplied by X, divide by X).
- The effect of coding on the standard deviation depends on how the data are coded: adding or subtracting a constant leaves it unchanged, while multiplying or dividing by a factor scales it by that same factor.
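A small numeric check can make these effects concrete. The sketch below uses made-up readings and an assumed offset and factor; it relies only on Python's standard `statistics` module.

```python
import math
import statistics

# Made-up readings; offset and factor are assumed coding constants.
raw = [0.5541, 0.5542, 0.5547, 0.5544, 0.5546]
offset, factor = 0.5540, 10_000

# Code: subtract the offset, then multiply by the factor.
coded = [round((x - offset) * factor) for x in raw]  # [1, 2, 7, 4, 6]

coded_mean = statistics.mean(coded)
coded_sd = statistics.stdev(coded)

# Uncode the mean by reversing both steps in the opposite order.
mean_uncoded = coded_mean / factor + offset

# The standard deviation is only affected by the multiplication,
# so it is uncoded by dividing by the factor alone.
sd_uncoded = coded_sd / factor

# Subtracting a constant by itself leaves the standard deviation unchanged.
shifted = [x - offset for x in raw]
sd_shifted = statistics.stdev(shifted)

print(math.isclose(mean_uncoded, statistics.mean(raw)))
print(math.isclose(sd_uncoded, statistics.stdev(raw), rel_tol=1e-9))
print(math.isclose(sd_shifted, statistics.stdev(raw), rel_tol=1e-9))
```

All three comparisons should agree to within floating-point tolerance, which is why the offset matters for uncoding the mean but not for uncoding the standard deviation.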

## Rounding Data

- If done properly, rounding will affect the standard deviation but leave the mean essentially unchanged.
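The following sketch illustrates the point with a small made-up data set. The values were chosen so that the rounding errors cancel exactly; with real data the mean would typically shift slightly rather than stay identical.

```python
import math
import statistics

# Made-up readings recorded to two decimal places.
raw = [10.12, 10.18, 10.23, 10.27, 10.31, 10.39]

# Round each reading to one decimal place.
rounded = [round(x, 1) for x in raw]

mean_raw = statistics.mean(raw)
mean_rounded = statistics.mean(rounded)

sd_raw = statistics.stdev(raw)
sd_rounded = statistics.stdev(rounded)

print(f"mean: raw={mean_raw:.4f} rounded={mean_rounded:.4f}")
print(f"sd:   raw={sd_raw:.4f} rounded={sd_rounded:.4f}")
print("means agree:", math.isclose(mean_raw, mean_rounded))
```

Here the mean survives rounding while the standard deviation does not, which is one reason to postpone rounding until late in the analysis.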

## Also See:

Data Sampling Techniques & Uses

## Data Collection Reading List

Quality Control and Industrial Statistics, Fifth Edition. A very good and comprehensive review of acceptance sampling, control charts, and statistics.

## ASQ Six Sigma Black Belt Exam Data Collection Questions

**Question:** An important aspect of data collection is that the data collector should

(A) determine the dispersion of the data

(B) know how the data are to be used

(C) use a control chart to analyze the data

(D) use a stratified sampling plan

**Answer:** (B) Know how the data are to be used. (See Data Collection) If you are not sure how you are going to use the data, how would you know if you are collecting the right sets or if you are collecting enough of them or in the right manner?

You won’t know about dispersion or be able to create a control chart until after the data is collected, so those are poor choices.

While your sampling methods are important, often what you are going to do with your data dictates how you sample it.

**Question:** A method that changes data without significantly reducing accuracy or precision is known as

(A) bias adjustment

(B) statistical efficiency

(C) blocking

(D) coding

**Answer:** (D) Coding.

A bias adjustment might improve your accuracy if you get it right, but it does not necessarily improve precision.

Statistical efficiency describes how well an estimator uses the data. It is not a method of changing data, so it makes no sense here.

Blocking is a different technique, used in Design of Experiments.

## ASQ Six Sigma Green Belt Exam Data Collection Questions

**Question:** When the sampling method used creates a difference between the result obtained from the sample and the actual population value, the difference is known as

(A) correlation

(B) precision

(C) accuracy

(D) bias

**Answer:** (D) Bias. I flat out do not like this question. Many reference books list accuracy and bias as essentially the same thing, when by strict definition they are two sides of the same coin.

We know that correlation is a poor answer by definition. Also, precision generally refers to getting consistent results repeatedly, or the repeatability of the gage (See Repeatability and Reproducibility).

So how do you choose between accuracy and bias in terms of data collection? Accuracy is achieved when unbiased true values are obtained. Here we are stating that you are only accurate if you are completely true. Any amount off of true is referred to as the bias. Since this question asks about the difference between the sample result and the population value, it is asking about the bias.

Again, I don’t like this question as it is almost splitting hairs. But it is important to know this concept and in practice account for it in your data collection plan and data collection form.
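To make the bias idea concrete, here is a minimal sketch with a made-up population and a deliberately flawed sampling method. The values and the "first four items only" method are assumptions for illustration.

```python
import statistics

# Made-up population of eight part measurements, in production order.
population = [9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5]

# Hypothetical flawed sampling method: measure only the first four parts produced.
sample = population[:4]

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# Bias: the difference between the sample result and the actual population value.
bias = sample_mean - population_mean

print(f"population mean={population_mean:.3f}")
print(f"sample mean={sample_mean:.3f}")
print(f"bias={bias:.3f}")
```

Because the process drifts upward over the run, sampling only the early parts understates the population mean, and that gap is the bias a data collection plan should try to prevent.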