Data collection is a science, not an art form. Be sure to use rigor and thought before assembling the data that you seek to analyze.
Ensuring Data Accuracy & Integrity
- Data should not be removed from a set without an appropriate statistical test or a documented logical reason.
- Generally, data should be recorded in time sequence.
- Unnecessary rounding should be avoided. If rounding is done, it should be late in the process.
- Screen the data to remove entry errors.
- Avoid emotional bias.
- Record measurements of items that change over time as quickly as possible after manufacture and again after the stabilization period.
- Each important classification identifier should be recorded alongside the data (e.g., time, machine, operator, gage, lab, material, conditions).
- Also see data sampling.
Sometimes it is more efficient to code data by adding, subtracting, multiplying or dividing by a factor.
Types of Data Coding
- Substitution – Ex. replace measurements in 1/8ths of an inch with +/- 1 deviations from center, recorded as integers.
- Truncation – Ex. for a data set of 0.5541, 0.5542, 0.5547, you might drop the repeated 0.554 portion and record only the final digits (1, 2, 7).
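The two coding types above can be sketched in a few lines of Python. This is only an illustration: the function names, the 0.0001 unit size, and the 2.0-inch center value are assumptions chosen for the example, not part of any standard.

```python
def code_truncation(values, base=0.554, scale=10_000):
    """Drop the shared leading digits, e.g. 0.5541 -> 1 (units of 0.0001)."""
    return [round((v - base) * scale) for v in values]


def decode_truncation(codes, base=0.554, scale=10_000):
    """Restore the original readings from the truncated codes."""
    return [round(base + c / scale, 4) for c in codes]


def code_substitution(values_in, center_in, step_in=1 / 8):
    """Express each reading as a whole number of 1/8-inch steps from center."""
    return [round((v - center_in) / step_in) for v in values_in]


readings = [0.5541, 0.5542, 0.5547]
coded = code_truncation(readings)
print(coded)                     # [1, 2, 7]
print(decode_truncation(coded))  # [0.5541, 0.5542, 0.5547]

# Hypothetical readings in inches around an assumed center of 2.0 in.
print(code_substitution([2.0, 2.125, 1.875, 2.25], center_in=2.0))  # [0, 1, -1, 2]
```

Recording the short integer codes on the form, and keeping the base and unit size in the form header, is what makes the values recoverable later.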
Problems due to NOT Coding
- The practitioner tries to squeeze too many numbers onto a form, which hurts usability and legibility.
- Data entry errors increase.
- The analysis becomes insensitive because values were rounded.
Effects of Coding Data
- Coding changes the mean, so the mean must be uncoded before it is reported.
- Uncoding the mean is the exact inverse of coding. (Ex. if you added X, subtract X; if you multiplied by X, divide by X.)
- The effect coding has on the standard deviation depends on how the data is coded. Adding or subtracting a constant leaves the standard deviation unchanged; multiplying or dividing by a factor scales it by that same factor.
- Rounding data will affect the standard deviation but not the mean – if done properly.
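As a rough check of how coding affects the mean and standard deviation, the sketch below codes the truncation example data with an assumed offset (0.554) and factor (10,000), then uncodes the summary statistics. The variable names and the choice of Python's `statistics` module are assumptions for illustration.

```python
import math
import statistics as st

raw = [0.5541, 0.5542, 0.5547]
offset, factor = 0.554, 10_000  # assumed coding constants

shifted = [x - offset for x in raw]            # subtract only
scaled = [x * factor for x in raw]             # multiply only
coded = [(x - offset) * factor for x in raw]   # subtract, then multiply

# Uncode the mean by reversing the steps in the opposite order.
mean_from_coded = st.mean(coded) / factor + offset

# Only the multiplication affects the standard deviation, so only it is undone.
stdev_from_coded = st.stdev(coded) / factor

print("means match:", math.isclose(mean_from_coded, st.mean(raw)))
print("shift leaves stdev unchanged:",
      math.isclose(st.stdev(shifted), st.stdev(raw), rel_tol=1e-6))
print("scaling multiplies stdev by the factor:",
      math.isclose(st.stdev(scaled), st.stdev(raw) * factor, rel_tol=1e-6))
```

Note that the offset never appears when uncoding the standard deviation, while the mean needs both steps reversed.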
Data Collection Reading List
Quality Control and Industrial Statistics, Fifth Edition. A very good and comprehensive review of acceptance sampling, control charts, and statistics.
ASQ Six Sigma Black Belt Exam Data Collection Questions
Question: An important aspect of data collection is that the data collector should
(A) determine the dispersion of the data
(B) know how the data are to be used
(C) use a control chart to analyze the data
(D) use a stratified sampling plan
Question: A method that changes data without significantly reducing accuracy or precision is known as
ASQ Six Sigma Green Belt Exam Data Collection Questions
Question: When the sampling method used creates a difference between the result obtained from the sample and the actual population value, the difference is known as