Basic Six Sigma statistics is the foundation for Six Sigma projects. It allows us to numerically describe the data that characterizes the process Xs (inputs) and Ys (outputs). Today, statistics is an integral part of any organization's day-to-day activities. Data and numbers play a vital role in Six Sigma projects; hence, Six Sigma professionals and other stakeholders must have a basic knowledge of statistics.

## Data Types

Data is a set of values of qualitative or quantitative variables. It may be numbers, measurements, observations or even just descriptions of things.

**Qualitative data:** Also known as non-numeric data, qualitative data is essentially discrete. It consists of a finite number of possible categories into which each observation may fall.

Types of Qualitative data

**Nominal data:** Descriptive data whose categories are names or labels with no inherent order. For example, hair color (black, brown) or gender (male, female).

**Ordinal data:** Data whose categories follow a meaningful order. In other words, it arranges information in a particular order without indicating how large the difference between items is. For example, pass/fail results or customer service rated as poor, fair, or good.

**Quantitative data:** Also known as numerical data. The observations are counts or measurements. Unlike qualitative data, the possible values are not limited to a fixed set of categories. Quantitative data is further divided into:

**Discrete data:** The data is discrete if the measurements are integers or counts. For example, the number of customer complaints or weekly defect counts.

**Continuous data:** The data is continuous if the measurement can take on any value, usually within some range. For example, stack height, distance, or cycle time.

## Types of Statistics

Statistics consists of principles and methodologies for collecting, analyzing, interpreting, and presenting data in a meaningful way. Statistics helps us understand how the data behaves, identify improvement opportunities, and predict future process performance.

Types of statistical analysis

**Descriptive statistics**

Descriptive statistics are values that describe the characteristics of a sample or population. In other words, they provide simple summaries of the sample and its measures.

Types of Descriptive statistics

### Central Tendency

Central tendency refers to the statistical measure used to determine the center of the distribution of a data set. Depending on the situation, the measure of central tendency can be the mean, the median, or the mode.

**Mean:** The mean is the total of all data values divided by the number of data points.

**Median:** The median is the middle value when the data is arranged in ascending or descending order. If the data set has an even number of values, the median is the average of the two middle values.

**Mode:** The mode is the value that occurs most frequently in the data set.
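The three measures defined above can be compared on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the cycle times are made-up illustrative values, not data from a real process.

```python
import statistics

# Hypothetical sample: cycle times in minutes for ten orders (illustrative values)
cycle_times = [12, 15, 11, 15, 14, 13, 15, 12, 16, 40]

# Mean: total of all values divided by the number of values
mean_value = statistics.mean(cycle_times)

# Median: middle value of the sorted data; with an even count,
# the average of the two middle values
median_value = statistics.median(cycle_times)

# Mode: the most frequently occurring value
mode_value = statistics.mode(cycle_times)

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

In this sample the single large value (40) pulls the mean above the median, which is one reason the median is often preferred when the data contains extreme values.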

### Measure of dispersion

Dispersion is the degree of variation in the data. Dispersion measures the extent to which different items tend to disperse away from the central tendency.

Different types of measure of dispersion

**Range:** Range is the difference between the maximum and the minimum value.

**Variance:** Variance measures the dispersion of a set of data points around their mean value. It is the average of the squared deviations from the mean.

**Standard Deviation:** Standard deviation is the most popular measure of dispersion and is commonly used to quantify the amount of variation in a process. It is the square root of the variance, so it is expressed in the same units as the data.

**Kurtosis:** Kurtosis is a statistical measure used to determine whether the data is heavy-tailed or light-tailed relative to a normal distribution. In other words, kurtosis is a measure of the thickness of the tails of a distribution.
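The measures listed above can be computed with a few lines of Python. This is a minimal sketch under stated assumptions: the data values are made up, the variance and standard deviation use the sample (n - 1) formulas from the standard `statistics` module, and kurtosis is computed as the simple moment-based excess kurtosis (fourth central moment divided by the squared second central moment, minus 3), which is only one of several conventions in use.

```python
import statistics

# Hypothetical measurements (illustrative values only)
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum minus minimum
data_range = max(data) - min(data)

# Sample variance and standard deviation (divide by n - 1)
sample_variance = statistics.variance(data)
sample_std = statistics.stdev(data)

# Population variance (divide by n), shown for comparison
population_variance = statistics.pvariance(data)

# Moment-based excess kurtosis: m4 / m2**2 - 3
# (0 for a normal distribution, positive for heavier tails)
n = len(data)
mean_value = statistics.mean(data)
m2 = sum((x - mean_value) ** 2 for x in data) / n
m4 = sum((x - mean_value) ** 4 for x in data) / n
excess_kurtosis = m4 / m2 ** 2 - 3

print("range:", data_range)
print("sample variance:", round(sample_variance, 3))
print("sample standard deviation:", round(sample_std, 3))
print("population variance:", population_variance)
print("excess kurtosis:", round(excess_kurtosis, 5))
```

Statistical packages often apply a small-sample correction to kurtosis, so their reported values may differ slightly from this simple formula.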

**Inferential statistics:**

Inferential statistics is used to draw conclusions or inferences about the characteristics of a population based on data from a sample. In other words, it uses probability to make inferences about a population parameter from the information contained in a sample. The t-test, regression analysis, and analysis of variance (ANOVA) are a few examples of inferential statistics.
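As an illustration of the t-test mentioned above, the following minimal sketch computes a one-sample t statistic by hand using only the Python standard library. The measurements and the target mean of 4.8 are hypothetical. The critical value of 2.776 is the two-sided value for a 5% significance level with 4 degrees of freedom, as found in a standard t table; in practice a statistics package would report a p-value instead.

```python
import math
import statistics

# Hypothetical sample of five measurements and a hypothesized population mean
measurements = [5.1, 4.9, 5.0, 5.2, 4.8]
target_mean = 4.8

n = len(measurements)
sample_mean = statistics.mean(measurements)
sample_std = statistics.stdev(measurements)   # sample standard deviation (n - 1)

# Standard error of the mean and the one-sample t statistic
standard_error = sample_std / math.sqrt(n)
t_statistic = (sample_mean - target_mean) / standard_error

# Assumed critical value from a t table: two-sided, alpha = 0.05, df = n - 1 = 4
critical_value = 2.776
reject_null = abs(t_statistic) > critical_value

print("t statistic:", round(t_statistic, 3))
print("reject the hypothesis that the mean equals the target:", reject_null)
```

The sample (five values) is used to make a statement about the population mean, which is the essence of inferential statistics.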

**Shape of the distribution**

The shape of a data distribution is described by its number of peaks and by its symmetry, skewness, or uniformity. Skewness is a measure of the lack of symmetry. In other words, skewness measures how far the probability distribution of a random variable departs from a symmetrical distribution such as the normal distribution.

**Symmetrical Distribution:** A symmetrical distribution often appears as a bell curve. The perfect normal distribution is a probability distribution that has zero skewness. In a symmetrical distribution with a single peak, the mean, median, and mode occur at the same point, and values at equal distances on either side of that point occur with equal frequency.

**Positively Skewed Distribution:** A distribution is said to be skewed to the right if it has a long tail that trails toward the right side. The skewness value of a positively skewed distribution is greater than zero.

**Negatively Skewed Distribution:** A distribution is said to be skewed to the left if it has a long tail that trails toward the left side. The skewness value of a negatively skewed distribution is less than zero.
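The sign of the skewness value can be checked on small samples. The sketch below assumes the simple moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5); the helper name `skewness` and the data sets are illustrative, and statistical packages may apply a sample-size correction.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((x - mean_value) ** 2 for x in values) / n
    m3 = sum((x - mean_value) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data sets (illustrative values only)
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 12]      # long tail toward the right
left_skewed = [-x for x in right_skewed]      # mirror image: long tail toward the left

print("symmetric:", skewness(symmetric))
print("right-skewed:", round(skewness(right_skewed), 3))
print("left-skewed:", round(skewness(left_skewed), 3))
```

The symmetric sample gives a skewness of zero, the right-skewed sample a positive value, and its mirror image a negative value of the same size.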

## Summarizing the data

Graphical analysis is one of the best ways to summarize the data in Six Sigma projects. The graphical analysis creates pictures of the data, which will help understand the patterns and the correlation between process parameters. Often graphical analysis is the starting point for any problem-solving method.

Different graphical analysis methods

A box plot, also known as a box-and-whisker plot, is a pictorial representation of continuous data. It shows the maximum, minimum, median, first and third quartiles (Q1 and Q3, which bound the interquartile range), and any outliers.
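The numbers behind a box plot can be computed directly. The following is a minimal sketch using `statistics.quantiles` from the Python standard library (Python 3.8 or later). The data is made up, the `inclusive` quartile method is one of several conventions, and the 1.5 × IQR rule for flagging outliers is a common convention assumed here rather than something this article prescribes.

```python
import statistics

# Hypothetical continuous measurements (illustrative values only)
data = [7, 9, 10, 10, 11, 12, 12, 13, 14, 30]

# Quartiles: Q1, median (Q2), and Q3
q1, median_value, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

# Assumed outlier convention: points beyond 1.5 * IQR from the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers extend to the most extreme points that are not outliers
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("Q1:", q1, "median:", median_value, "Q3:", q3, "IQR:", iqr)
print("whiskers:", whisker_low, "to", whisker_high)
print("outliers:", outliers)
```

Different tools compute quartiles slightly differently, so the box edges drawn by a given charting package may not match these values exactly.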

A run chart, also known as a time series plot, is a line graph of data plotted over time. It helps to identify patterns in the data over time. Because a run chart does not use control limits, it cannot by itself show whether the process is stable.

A histogram is the graphical representation of a frequency distribution. It consists of rectangles with the class intervals as bases and the corresponding frequencies as heights. There is no gap between any two successive rectangles.
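The frequency counts that a histogram displays can be sketched as follows. The helper `histogram_counts`, the stack-height values, and the choice of four class intervals of width 1 starting at 10 are all illustrative assumptions; charting tools normally choose the intervals automatically.

```python
def histogram_counts(values, start, width, bins):
    """Count how many values fall into each of `bins` equal-width class intervals.

    Intervals are [start, start + width), [start + width, start + 2 * width), ...
    A value exactly on the upper edge of the last interval is counted in it.
    """
    upper_edge = start + width * bins
    counts = [0] * bins
    for value in values:
        if value < start or value > upper_edge:
            continue  # ignore values outside the covered range
        index = min(int((value - start) // width), bins - 1)
        counts[index] += 1
    return counts

# Hypothetical stack heights (illustrative values only)
stack_heights = [10.1, 10.4, 10.5, 10.9, 11.2, 11.3, 11.3, 11.8, 12.0, 12.6, 13.4]
counts = histogram_counts(stack_heights, start=10, width=1, bins=4)

# Text rendering: one row per class interval, bar length equals the frequency
for i, count in enumerate(counts):
    low = 10 + i
    print(f"{low}-{low + 1}: {'#' * count}")
```

Because each interval starts where the previous one ends, the bars of the resulting histogram touch each other with no gaps.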

A Pareto chart is based on the Pareto principle, also known as the 80-20 rule. It combines a bar chart and a line chart: the bars show the actual data in descending order, and the line shows the cumulative total in ascending order.
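The data behind a Pareto chart can be prepared with a short script. This is a minimal sketch with made-up defect categories and counts: the categories are sorted in descending order (the bars) and a running cumulative percentage is computed (the line).

```python
# Hypothetical defect counts by category (illustrative values only)
defect_counts = {
    "Dent": 25,
    "Scratch": 45,
    "Other": 5,
    "Misalignment": 15,
    "Discoloration": 10,
}

# Bars: categories sorted by count, largest first
ordered = sorted(defect_counts.items(), key=lambda item: item[1], reverse=True)

# Line: cumulative percentage of the total after each category
total = sum(defect_counts.values())
cumulative = 0
pareto_table = []
for category, count in ordered:
    cumulative += count
    pareto_table.append((category, count, round(100 * cumulative / total, 1)))

for category, count, cumulative_percent in pareto_table:
    print(f"{category:<15} {count:>3} {cumulative_percent:>6}%")
```

In this example the top three categories account for 85% of all defects, which is the kind of concentration the 80-20 rule describes.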