Data collection is a science, not an art form. Be sure to use rigor and thought before seeking the data you wish to analyze.

Data Collection. Photo by Governo do Estadio

Ensuring Data Accuracy & Integrity

Be sure to address these points in your data collection plan in addition to making use of a data collection form.

  • Firstly, data should not be removed from a set without an appropriate statistical test or logic.
  • Generally, data should be recorded in time sequence.
  • Unnecessary rounding should be avoided because it can skew data if done too early.
    • If rounding is necessary, do it late in the process.
  • Next, you will want to screen the data to remove entry errors.
  • Avoid emotional bias even if the stakes are high.
  • Record measurements of items that change over time as quickly as possible after manufacture and again after the stabilization period.
  • Record each important classification identifier concurrently with the data (for instance, time, machine, operator, gauge, lab, material, and conditions).
  • Also, see data sampling.
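
As a rough illustration of the points above, here is a minimal sketch of a data collection record. The class name, field names, readings, and the plausible range of 0.50 to 0.60 are all made up for this example and are not taken from any particular form. It keeps the classification identifiers with each reading, puts the readings in time sequence, and flags likely entry errors for review instead of deleting them.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Measurement:
    """One reading plus the classification identifiers recorded with it (hypothetical fields)."""
    taken_at: datetime
    machine: str
    operator: str
    gauge: str
    value: float


def flag_entry_errors(records, low, high):
    """Return readings outside the plausible range [low, high]; nothing is removed."""
    return [r for r in records if not (low <= r.value <= high)]


# Made-up readings, entered out of order on purpose.
records = [
    Measurement(datetime(2024, 1, 8, 9, 15), "M1", "op_a", "G7", 0.5542),
    Measurement(datetime(2024, 1, 8, 9, 0), "M1", "op_a", "G7", 0.5541),
    Measurement(datetime(2024, 1, 8, 9, 30), "M2", "op_b", "G7", 5.547),  # possibly a misplaced decimal
]

# Keep the data in time sequence, then screen it.
in_sequence = sorted(records, key=lambda r: r.taken_at)
suspects = flag_entry_errors(in_sequence, low=0.50, high=0.60)

for r in suspects:
    print(f"Review: {r.value} from {r.machine} at {r.taken_at:%H:%M}")
```

Flagging rather than deleting keeps the first rule in the list intact: a suspect value only leaves the data set once a check or a statistical test justifies it.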

Concerns for Data Collection

Data collection is one of the critical components of statistical process control (SPC), as it helps the Six Sigma team identify when a process is operating outside its control limits and take appropriate corrective action. However, there are several concerns to keep in mind when collecting data for SPC:

  • Data quality: The quality of the data collected may significantly impact the effectiveness of SPC. Hence, it is crucial to ensure that the data is collected using standardized methods and that the data is accurate, reliable, and consistent.
  • Relevant data: A process usually generates a wide range of data, and every data collection effort costs the organization resources, budget, and time. Hence, it is necessary to understand which data is relevant for SPC and useful for the business.
  • Data analysis: Apart from collecting data, it must also be analyzed to identify trends and patterns that can help improve the process. Therefore, it is essential to have the necessary expertise and efficient tools to analyze the data effectively.
  • Data storage: The storage of data collected for SPC should be carefully managed to ensure that it is secure and accessible only to authorized personnel. Also consider whether your infrastructure can handle large volumes of data.

Coding Data

Sometimes it is quicker to record and analyze data in coded form, that is, after adding, subtracting, multiplying, or dividing by a constant.

Types of Data Coding

  • Substitution – For example, replace measurements taken in 1/8ths of an inch with whole-number deviations from the center (+1, -1, and so on).
  • Truncation – For example, in a data set of 0.5541, 0.5542, and 0.5547, you might drop the common 0.554 portion and record 1, 2, and 7.
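
The truncation example can be sketched in a few lines of Python. The names BASE, SCALE, code_value, and decode_value are chosen here for illustration only; the idea is simply to subtract the common leading portion and scale what is left to a whole number, and to reverse both steps when decoding.

```python
BASE = 0.5540   # common leading portion dropped from every reading
SCALE = 10_000  # turns the remaining fourth decimal place into a whole number


def code_value(x: float) -> int:
    """Code a reading as its whole-number offset from BASE."""
    return round((x - BASE) * SCALE)


def decode_value(c: int) -> float:
    """Reverse the coding: divide by SCALE, then add BASE back."""
    return round(BASE + c / SCALE, 4)


readings = [0.5541, 0.5542, 0.5547]
coded = [code_value(x) for x in readings]
decoded = [decode_value(c) for c in coded]

print(coded)
print(decoded)
```

The coded values are single digits, which are quicker to write on a form and harder to mistype than four-decimal readings.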

Problems due to NOT Coding

  • The practitioner tries to squeeze too many numbers onto a form – leading to a form that is hard to use and read.
  • Data entry issues as a result of human error.
  • Poor analytics due to rounding.

Effects of Coding Data

  • Coding changes the mean, so the mean must be decoded before it is reported.
  • Decoding applies the opposite of the coding operation. (For example, if you added X, subtract X; if you multiplied by X, divide by X.)
  • The effect on the standard deviation depends on how the data is coded: adding or subtracting a constant leaves it unchanged, while multiplying or dividing by a constant scales it by the same factor.
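
To see these effects on a small scale, here is a short sketch using Python's standard statistics module. The readings, OFFSET, and FACTOR are made-up values for illustration. The data is coded by subtracting OFFSET and multiplying by FACTOR, and the summary statistics are then decoded.

```python
import statistics

OFFSET = 10.0  # constant subtracted from every reading (assumed for this example)
FACTOR = 10.0  # constant each shifted reading is multiplied by (assumed)

raw = [10.2, 10.4, 10.1, 10.5, 10.3]
coded = [(x - OFFSET) * FACTOR for x in raw]

coded_mean = statistics.mean(coded)
coded_sd = statistics.stdev(coded)

# The mean needs both steps reversed, in the opposite order.
decoded_mean = coded_mean / FACTOR + OFFSET
# The standard deviation ignores the offset and only needs the scaling reversed.
decoded_sd = coded_sd / FACTOR

print(decoded_mean, statistics.mean(raw))
print(decoded_sd, statistics.stdev(raw))
```

Each printed pair should agree to within floating-point error, which is the check worth running before reporting statistics computed from coded data.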

Rounding Data

  • If done properly, rounding data will affect the standard deviation but not the mean.
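
A small experiment can make this concrete. The sketch below uses made-up readings placed symmetrically around 10.000 on purpose, so it is a best-case illustration and not a general proof. With real data the mean will usually move a little as well, and the size of the change in standard deviation depends on how coarse the rounding is.

```python
import statistics

# Made-up readings, symmetric around 10.000 (illustration only).
raw = [10 + 0.013 * i for i in range(-20, 21)]

# Round every reading to one decimal place.
rounded = [round(x, 1) for x in raw]

mean_shift = statistics.mean(rounded) - statistics.mean(raw)
sd_shift = statistics.stdev(rounded) - statistics.stdev(raw)

print(f"mean shift:    {mean_shift:.6f}")
print(f"std dev shift: {sd_shift:.6f}")
```

In this constructed case the rounding errors cancel in the mean but not in the standard deviation. This is one reason to keep full precision during collection and round only when reporting.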

Tools to gather data

There are various systematic tools available for organizations to collect customer information; some of them include:

Surveys:

Surveys are a structured set of questions designed to collect data or opinions from a group of people. They are usually conducted online, on paper, or through interviews. Surveys are flexible and can put the same questions to a large number of people in a standard way.

Focus Groups:

Focus groups involve a small, diverse group of individuals (typically 4 to 10) who share their opinions and feedback on a particular topic under the guidance of a facilitator. These discussions usually last 1 to 2 hours. They are often more in-depth than surveys and provide qualitative insights into the participants’ thoughts and experiences.

Face-to-Face Interviews:

Face-to-face interviews involve direct, in-person conversations between an interviewer and a participant. This method allows for a more personalized and detailed exploration of responses. An interview usually lasts 30 to 60 minutes.

Satisfaction/Complaint Cards:

Satisfaction and complaint cards are brief forms or cards that individuals can fill out to express their level of satisfaction or dissatisfaction with a product, service, or experience. These cards could function as feedback forms.

Competitive Shoppers:

Competitive shoppers are individuals who actively evaluate a company against its competitors, for example by comparing prices, features, and offerings.

Data Collection Reading List

Quality Control and Industrial Statistics, Fifth Edition, is on the whole a very good and nearly complete review of acceptance sampling, control charts, and statistics.

ASQ Six Sigma Black Belt Exam Data Collection Questions

Question: An important aspect of data collection is that the data collector should

(A) determine the dispersion of the data
(B) know how the data are to be used
(C) use a control chart to analyze the data
(D) use a stratified sampling plan


Bias. I flat-out do not like this question. Many reference books list accuracy and bias as essentially the same thing, when in strict terms they are two sides of the same coin.

We know that correlation is a poor answer due to its definition. Precision generally refers to getting consistent results repeatedly, or the repeatability of the gage (see Repeatability and Reproducibility).

So how do you choose between accuracy and bias in terms of data collection? Accuracy is realized when true, unbiased values are gathered. In other words, you are only accurate if your values are completely true. Any deviation from the truth is referred to as bias. Since this question asks about the difference between the population and the sample, it is asking about bias.

Again, I don’t like this question because it is almost like splitting hairs. But it is important to know this concept and, in practice, account for it in your data collection plan and data collection form.

