Basic Six Sigma statistics is the foundation for Six Sigma projects. It allows us to numerically describe the data that characterizes the process inputs (Xs) and outputs (Ys). Today statistics is an integral part of any organization's day-to-day activities. Data and numbers play a vital role in Six Sigma projects; hence Six Sigma professionals and other stakeholders must have a basic knowledge of statistics.
Data is a set of values of qualitative or quantitative variables. It may be numbers, measurements, observations or even just descriptions of things.
Qualitative data: Qualitative data, also known as non-numeric data, is basically discrete data. It consists of a finite number of possible categories into which each observation may fall.
Types of Qualitative data
- Nominal data: Descriptive data with two or more categories that are names or labels with no inherent order. For example, hair color (black, brown) or gender (male, female).
- Ordinal data: Ordinal data conveys the order of the choices. In other words, it arranges information in a particular order without indicating the size of the difference between items. For example, pass/fail, or customer service rated good or bad.
Quantitative data: It is also known as numerical data. The observations are counts or measurements. Unlike qualitative data, the observations are numbers rather than categories, and the set of possible values may be unbounded. Quantitative data is further divided into:
- Discrete data: The data is discrete if the measurements are integers or counts. For example, the number of customer complaints or weekly defect counts.
- Continuous data: The data is continuous if the measurement can take on any value, usually within some range. For example, stack height, distance, or cycle time.
Basic Types of Statistics used in Six Sigma
Statistics consists of principles and methodologies for collecting, analyzing, interpreting, and presenting data in a meaningful way. Statistics helps us understand how the data behaves, identify improvement opportunities, and predict future process performance.
Types of statistical analysis
Descriptive statistics are values that describe the characteristics of a sample or population. In other words, they provide simple summaries of the sample and its measures.
Types of Descriptive statistics
The measure of central tendency is the statistical measure used to determine the center of the distribution of a data set. Depending on the situation, the measure of central tendency could be the mean, median, or mode.
- Mean: The mean is the total of all data values divided by the number of data points.
- Median: The median is the middle value when the data is arranged in ascending or descending order. If the data set has an even number of values, the median is the average of the middle two values.
- Mode: The mode is the value that occurs most frequently in the data set.
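The three measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `cycle_times` values are made up for illustration and do not come from a real process.

```python
import statistics

# Hypothetical sample: cycle times in minutes (illustrative values only)
cycle_times = [12, 15, 11, 15, 14, 18, 15, 13]

mean = statistics.mean(cycle_times)      # total of all values divided by the number of values
median = statistics.median(cycle_times)  # even count, so the average of the two middle values
mode = statistics.mode(cycle_times)      # the most frequently occurring value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

The mean is pulled toward extreme values, so for skewed data the median is often the more representative center.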
Measure of dispersion
Dispersion is the degree of variation in the data. Dispersion measures the extent to which different items tend to disperse away from the central tendency.
Different types of measure of dispersion
Range: Range is the difference between the maximum and the minimum value.
Variance: Variance measures the dispersion of a set of data points around their mean value.
Standard Deviation: Standard deviation is the most popular measure of dispersion. It is the square root of the variance and measures the amount of variation in a process, data set, or population.
Kurtosis: Kurtosis is a statistical measure used to determine whether the data are heavy-tailed or light-tailed relative to a normal distribution. In other words, kurtosis is a measure of the thickness of the tails of a distribution.
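As a rough sketch, the measures of dispersion listed above can be computed as follows. The data values are hypothetical, and `excess_kurtosis` is an illustrative helper (not a standard library function) that uses the simple population-moment formula, one of several conventions in use. It subtracts 3 so that a normal distribution scores about zero.

```python
import statistics

# Hypothetical sample: cycle times in minutes (illustrative values only)
data = [12, 15, 11, 15, 14, 18, 15, 13]

data_range = max(data) - min(data)           # maximum minus minimum
sample_variance = statistics.variance(data)  # sample variance (divides by n - 1)
sample_std = statistics.stdev(data)          # square root of the sample variance
population_variance = statistics.pvariance(data)  # population variance (divides by N)


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (population-moment convention, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


print("range:", data_range)
print("sample variance:", sample_variance)
print("sample standard deviation:", sample_std)
print("excess kurtosis:", excess_kurtosis(data))
```

A positive result from `excess_kurtosis` suggests heavier tails than a normal distribution, and a negative result suggests lighter tails.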
Inferential statistics are used to draw conclusions or inferences about the characteristics of a population based on data from a sample. In other words, they use probability to make inferences about a population parameter from the information contained in a sample. The t-test, regression analysis, and analysis of variance (ANOVA) are a few examples of inferential statistics.
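To make the idea concrete, here is a minimal sketch of the test statistic behind a one-sample t-test, written with the standard library only. The function name `one_sample_t`, the `fill_weights` values, and the target of 500 are all hypothetical.

```python
import math
import statistics


def one_sample_t(sample, target_mean):
    """t statistic for testing whether the population mean equals target_mean."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_std = statistics.stdev(sample)     # sample standard deviation (n - 1)
    standard_error = sample_std / math.sqrt(n)
    return (sample_mean - target_mean) / standard_error


# Hypothetical sample: fill weights in grams, tested against a target of 500 g
fill_weights = [498, 502, 501, 497, 499, 503, 500, 496]
t_stat = one_sample_t(fill_weights, 500)
print("t statistic:", t_stat)
```

The sketch stops at the statistic. Turning it into a p-value requires the t distribution with n - 1 degrees of freedom, which a statistics table or a package such as SciPy (`scipy.stats.ttest_1samp`) can supply.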
Shape of the distribution
The shape of a data distribution is described by its number of peaks and by its symmetry, skewness, or uniformity. Skewness is a measure of the lack of symmetry. In other words, skewness measures how far the probability distribution of a random variable departs from the symmetry of the normal distribution.
Symmetrical Distribution: Generally, a symmetrical distribution appears as a bell curve. A perfect normal distribution has zero skewness. In a symmetrical, single-peaked distribution such as the normal distribution, the mean, median, and mode occur at the same point, and values occur with equal frequency at equal distances on either side of the center.
Positively Skewed Distribution: A distribution is said to be skewed to the right if it has a long tail that trails toward the right side. The skewness value of a positively skewed distribution is greater than zero.
Negatively Skewed Distribution: A distribution is said to be skewed to the left if it has a long tail that trails toward the left side. The skewness value of a negatively skewed distribution is less than zero.
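The sign convention described above can be checked with a small sketch. The `skewness` helper below is illustrative and uses the simple population-moment formula (third central moment divided by the standard deviation cubed), which is an assumption; statistical packages often apply a sample-size correction.

```python
def skewness(values):
    """Third standardized moment (population-moment convention, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data sets chosen to show each shape
symmetrical = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]      # long tail toward the right
left_skewed = [-10, -2, -1, -1, -1]  # long tail toward the left

print("symmetrical:", skewness(symmetrical))
print("right skewed:", skewness(right_skewed))
print("left skewed:", skewness(left_skewed))
```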
Summarizing the data
Graphical analysis is one of the best ways to summarize the data in Six Sigma projects. Graphical analysis creates pictures of the data, which help us understand the patterns in it and the correlations between process parameters. Often graphical analysis is the starting point for any problem-solving method.
Different graphical analysis methods
Box plot, also known as a box-and-whisker plot, is a pictorial representation of continuous data. A box plot shows the maximum, minimum, median, first and third quartiles (Q1 and Q3), interquartile range, and outliers.
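The values a box plot displays can be computed directly, as in the sketch below. Two choices here are assumptions rather than part of the definition above: quartiles use the "inclusive" method of `statistics.quantiles` (software packages differ slightly in how they compute quartiles), and outliers are flagged with the common 1.5 × IQR rule. The function name `box_plot_summary` and the data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Return the numbers a box plot shows, flagging outliers with the 1.5 * IQR rule."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1                    # interquartile range
    lower_fence = q1 - 1.5 * iqr     # points below this are treated as outliers
    upper_fence = q3 + 1.5 * iqr     # points above this are treated as outliers
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "iqr": iqr,
        "outliers": outliers,
    }


# Hypothetical data with one unusually large value
cycle_times = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_plot_summary(cycle_times)
print(summary)
```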
Run chart, also known as a time series plot, is a line graph of data plotted over time. It helps to identify patterns in the data over time. Because run charts don't use control limits, we cannot judge from them whether the process is stable or not.
Histogram is the graphical representation of a frequency distribution. It takes the form of adjacent rectangles with the class intervals as bases and the corresponding frequencies as heights. In particular, there is no gap between any two successive rectangles.
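The frequency distribution behind a histogram can be built by counting how many values fall into each class interval. The following is a sketch under stated assumptions: equal-width intervals, a hypothetical helper named `histogram_counts`, and made-up cycle-time data.

```python
def histogram_counts(values, lower, width, bins):
    """Count values per equal-width class interval starting at `lower`."""
    upper = lower + width * bins
    counts = [0] * bins
    for value in values:
        if value < lower or value > upper:
            continue  # ignore values outside the covered range
        # A value exactly on the top edge is counted in the last interval
        index = min(int((value - lower) // width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sample: cycle times in minutes, grouped into intervals of width 2
cycle_times = [12, 15, 11, 15, 14, 18, 15, 13]
counts = histogram_counts(cycle_times, lower=10, width=2, bins=5)

# Text rendering: one row per class interval, bar length equals frequency
for i, count in enumerate(counts):
    start = 10 + 2 * i
    print(f"{start:>2}-{start + 2:<2} | {'#' * count}")
```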
Pareto chart is based on the 80-20 rule. It is a combination of a bar chart and a line chart: the bars show the actual data in descending order, and the line shows the cumulative data in ascending order.
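The table behind a Pareto chart can be sketched as follows: sort the categories in descending order of count (the bars) and accumulate the percentage of the total (the line). The helper name `pareto_table` and the complaint categories are hypothetical.

```python
def pareto_table(counts):
    """Return (category, count, cumulative percent) rows in descending order of count."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0
    for category, count in ordered:
        cumulative += count
        rows.append((category, count, 100 * cumulative / total))
    return rows


# Hypothetical complaint counts by category
complaints = {"late delivery": 60, "damaged item": 30, "wrong item": 10}
for category, count, cumulative_percent in pareto_table(complaints):
    print(f"{category:<15} {count:>3} {cumulative_percent:>6.1f}%")
```

Reading down the cumulative column shows how few categories account for most of the total, which is the pattern the 80-20 rule describes.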
Six Sigma Symbols
µ: the population mean, the measure of central tendency for a population
XBar (x̄): the sample mean, a point estimate for the population mean
σ: the population standard deviation, the measure of dispersion in a population
N: the number of data points in a population
n: the number of data points in a sample
x: the individual value
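Using these symbols, the standard textbook definitions of the population mean, the sample mean, and the population standard deviation can be written as follows. The index i, which numbers the individual values, is notation added here for illustration.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
```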