What is the Box Cox Transformation?
A Box-Cox transformation is a simple calculation that may help your data set follow a normal distribution. It was developed by two British statisticians, George Box and Sir David Cox.
When the assumption that the data are normally distributed is violated, or when the relationship between the dependent and independent variables in a linear model is not linear, a transformation may help the data set follow a normal distribution. Box-Cox is one such transformation method.
The basic assumptions of Box-Cox are that the data must be positive (no zero or negative values) and continuous.
What Does Box Cox have to do with Multiple Regression Analysis?
The Box-Cox transformation is a basic tool in multiple regression analysis. Any linear model assumes that the relationship between the response variable Y and the predictor variables X is linear. This is not always true. When the relationship is not linear and you still wish to fit a linear model to the data, consider a Box-Cox transformation. You transform the response variable (or a predictor variable) and then fit a linear model to study the effect that the predictors have on the transformed response.
Linear models also assume that the error terms are normally distributed. A significant violation of this assumption increases the risk of committing a Type I or Type II error.
In addition, the benefits of a Box-Cox transformation include reduced skewness, a more linear relationship between the response variable Y and the predictor variables X, and a more nearly equal spread.
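As a rough illustration of why transforming the response can help a linear fit, the sketch below uses made-up data in which Y grows exponentially with X. It compares the correlation of X with raw Y against the correlation of X with log(Y), which is the Box-Cox transformation for a lambda of 0. The data and the helper name `pearson_r` are assumptions for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, used here as a rough measure of linearity."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: the response grows exponentially with the predictor.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(0.5 * xi) for xi in x]

r_raw = pearson_r(x, y)                         # curved relationship
r_log = pearson_r(x, [math.log(v) for v in y])  # Box-Cox with lambda = 0

print(f"r with raw Y:  {r_raw:.4f}")
print(f"r with log(Y): {r_log:.4f}")
```

The transformed response lines up with X almost perfectly, so a straight-line model describes it far better than it describes the raw response.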
The Box Cox Equation
The original form of the Box-Cox transformation is given by
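For reference, the one-parameter form is usually written as follows, where y is an observed positive value and λ is the transformation parameter. The notation y^{(λ)} for the transformed value is a common convention, assumed here:

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln y, & \lambda = 0
\end{cases}
```

The λ = 0 case uses the natural log because (y^λ − 1)/λ approaches ln y as λ approaches 0.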
In their 1964 paper, Box and Cox also proposed an extended, two-parameter form of the transformation.
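The two-parameter form is usually written as below. The second parameter λ₂ shifts the data before the power is applied, which is what lets the transformation handle values that are not strictly positive, as long as y > −λ₂. The symbols λ₁ and λ₂ follow common convention and are assumed here:

```latex
y^{(\lambda_1, \lambda_2)} =
\begin{cases}
\dfrac{(y + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \lambda_1 \neq 0 \\[6pt]
\ln (y + \lambda_2), & \lambda_1 = 0
\end{cases}
```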
When would you use this transformation during the DMAIC process?
Process capability studies are performed during the Measure phase of DMAIC. The first step in a process capability study is to check whether the data follow a normal distribution (this matters even more for parametric tests such as ANOVA).
The Box-Cox method helps address non-normally distributed data by transforming the data toward normality. However, there is no guarantee that the transformed data will be normal, because the method does not actually check for normality.
The Box-Cox method only looks for the lambda that gives the smallest standard deviation. Hence it is always advisable to check the transformed data for normality using a probability plot or Q-Q (quantile-quantile) plot.
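A Q-Q plot is normally judged by eye, but its straightness can also be summarized as a number. The sketch below, using only the Python standard library, computes the correlation between the sorted data and the matching standard-normal quantiles; values closer to 1 mean a straighter Q-Q plot. The data set, the plotting positions (i − 0.5)/n, and the function name `qq_correlation` are assumptions for illustration, and this is a quick check rather than a formal normality test.

```python
import math
from statistics import NormalDist

def qq_correlation(data):
    """Correlation between sorted data and normal quantiles (Q-Q plot straightness)."""
    n = len(data)
    ordered = sorted(data)
    # Theoretical standard-normal quantiles at plotting positions (i - 0.5) / n.
    theo = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = sum(theo) / n, sum(ordered) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(theo, ordered))
    sxx = sum((a - mx) ** 2 for a in theo)
    syy = sum((b - my) ** 2 for b in ordered)
    return sxy / math.sqrt(sxx * syy)

# Made-up, right-skewed measurements.
raw = [1.2, 1.5, 1.7, 2.1, 2.4, 3.0, 3.8, 5.1, 7.4, 12.9]
logged = [math.log(v) for v in raw]  # Box-Cox with lambda = 0

r_raw = qq_correlation(raw)
r_log = qq_correlation(logged)
print(f"Q-Q correlation, raw data:         {r_raw:.3f}")
print(f"Q-Q correlation, transformed data: {r_log:.3f}")
```

If the transformed data do not score noticeably closer to 1 than the raw data, the transformation has not helped much and another approach may be needed.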
How to use Box Cox to calculate Process capability for non-normal data
Calculating process capability directly from non-normal raw data may be of little value; in other words, it may give inaccurate results. The data should be transformed toward normality before calculating process capability. Various data transformation methods exist, such as the log, power, exponential, and reciprocal transformations.
Some data analysis may be required to choose the right transformation method. One of the foremost power transformation methods is the Box-Cox method.
The formula is y' = y ^ Lambda
where Lambda is the power that must be determined to transform the data. Lambda is usually assumed to lie between -5 and 5. The best Lambda is the one that maximizes the likelihood of the transformed data, which is where the standard deviation is smallest and the transformed data are closest to normal.
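One detail that is easy to miss: when the data are transformed, the specification limits have to be transformed with the same Lambda before capability is calculated. The sketch below shows the idea for Cpk. The data, the spec limits, and the Lambda of 0.5 are all made up for illustration (Lambda is not estimated here), and the helper name `cpk` is an assumption.

```python
import statistics

def cpk(values, lsl, usl):
    """Cpk = min((USL - mean) / 3s, (mean - LSL) / 3s) using the sample std dev."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return min((usl - mean) / (3 * sd), (mean - lsl) / (3 * sd))

# Made-up, right-skewed measurements and hypothetical spec limits.
data = [1.2, 1.5, 1.7, 2.1, 2.4, 3.0, 3.8, 5.1, 7.4, 12.9]
lsl, usl = 0.5, 20.0

lam = 0.5  # assumed for illustration; normally chosen by software

# Transform the data AND the spec limits with the same power.
# Note: for a negative lambda the order of the limits would flip.
t_data = [v ** lam for v in data]
t_lsl, t_usl = lsl ** lam, usl ** lam

cpk_raw = cpk(data, lsl, usl)
cpk_transformed = cpk(t_data, t_lsl, t_usl)
print(f"Cpk on raw data:         {cpk_raw:.2f}")
print(f"Cpk on transformed data: {cpk_transformed:.2f}")
```

The two Cpk values generally differ. The transformed one is the more trustworthy only if the transformed data actually pass a normality check.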
Most Common Box-Cox Transformations
Example: if Lambda is 2, then y ^ Lambda = y ^ 2
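The Lambda values quoted most often correspond to familiar transformations. As a quick sketch, the code below applies each one to a single value, y = 4, using the simple power form y ^ Lambda and the natural log for a Lambda of 0. The list of "common" values reflects general convention and is an assumption here, as is the helper name `power_transform`.

```python
import math

def power_transform(y, lam):
    """Simple power transform y ^ lam, with the natural log used when lam is 0."""
    return math.log(y) if lam == 0 else y ** lam

# Commonly quoted lambda values and the transformation each one represents.
common = [
    (-2, "1 / y^2"),
    (-1, "1 / y (reciprocal)"),
    (-0.5, "1 / sqrt(y)"),
    (0, "ln(y)"),
    (0.5, "sqrt(y)"),
    (1, "y (no change)"),
    (2, "y^2"),
]

y = 4.0
for lam, label in common:
    print(f"lambda = {lam:>4}: {label:<20} -> {power_transform(y, lam):.4f}")
```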
An Example of a Box Cox Transformation by Hand
Box Cox transformations in practice are typically done by leveraging software that can try many different variations of Box Cox transforms very quickly.
Doing it by hand in practice is time-consuming and error-prone. Imagine trying different values of lambda by hand until you have run them all or run out of patience!
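To give a feel for what the software is doing, here is a minimal sketch of a lambda search in plain Python. It assumes the standard one-parameter form, (y ^ lambda − 1) / lambda with ln(y) at lambda = 0, tries every lambda from -5 to 5 in steps of 0.1, and keeps the one with the highest normal-model log-likelihood. The data are made up, and real packages use more refined optimizers and also report confidence intervals.

```python
import math

def boxcox(y, lam):
    """One-parameter Box-Cox transform of a single positive value."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def log_likelihood(data, lam):
    """Profile log-likelihood of lambda under a normal model (constants dropped)."""
    n = len(data)
    t = [boxcox(y, lam) for y in data]
    mean = sum(t) / n
    var = sum((v - mean) ** 2 for v in t) / n  # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(y) for y in data)

# Made-up, right-skewed measurements (all strictly positive).
data = [1.2, 1.5, 1.7, 2.1, 2.4, 3.0, 3.8, 5.1, 7.4, 12.9]

# Candidate lambdas: -5.0, -4.9, ..., 4.9, 5.0
candidates = [k / 10 for k in range(-50, 51)]
best_lam = max(candidates, key=lambda lam: log_likelihood(data, lam))

print(f"Best lambda on the grid: {best_lam}")
```

That is 101 transformations and likelihood calculations for one small data set, which is why nobody does this by hand.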
“But what about on a Six Sigma exam?” I can hear you say. “I won’t have Minitab or RStudio available! What will I do?”
Not to worry.
In my experience the questions on the exam are rather simple. You’re usually just having to do or understand the following:
- Sometimes your data doesn’t appear to be normal, but if you transform it, you can achieve normality – which then opens up a bunch of other properties and tools for you (or at least easier tools ;’)).
- While Box-Cox is complex, questions on Six Sigma exams are usually very simple. Just substitute variables into the following equation:
- X(transform) = X ^ Lambda
Example: if Lambda is 2, then y' = y ^ 2
All you have to do is replace your original data with the “new equation” using a lambda of 2.
As the example chart here shows you, all you’d have to do is just take the original value and square it.
“Old measure” 2 now becomes “New measure” 4 because we are simply substituting into X(transform) = X ^ Lambda as follows: X(transform) = 2 ^ 2 = 4.
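The same substitution, written as a tiny script for anyone who wants to check their hand calculations. Only the 2 becoming 4 comes from the example above; the other values are made up.

```python
lam = 2

# "Old measure" values; only 2 -> 4 comes from the worked example.
old_measures = [1, 2, 3, 4, 5]

# X(transform) = X ^ Lambda
new_measures = [x ** lam for x in old_measures]

for old, new in zip(old_measures, new_measures):
    print(f"Old measure {old} -> New measure {new}")
```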
An Example of a Box Cox Transformation Using Minitab
A Box-Cox transformation can be performed in Minitab, with the Excel Analysis ToolPak, or in other statistical software. These tools automatically calculate an appropriate power transformation.
Example: Raw data
Step 1: Perform a normality test to see whether the data follow a normal distribution
From the above graph, the p-value is less than 0.005, so the data do not follow a normal distribution, and the histogram clearly shows that the data are skewed to one side.
Step 2: Transform the data using Box Cox Transformation
Step 3: Test for normality again
From the above graph, the p-value is greater than 0.05, so we cannot reject the hypothesis that the transformed data follow a normal distribution, and the histogram shows that the data are now roughly symmetric.
What Do You Need to Know for Your Six Sigma Exam?
The IASSC Six Sigma Green Belt BOK requires this as part of the Improve Phase.
The IASSC Six Sigma Black Belt BOK requires this as part of the Improve Phase.
The ASQ Six Sigma Black Belt BOK requires the following:
Process capability for non-normal data
Identify non-normal data and determine when it is appropriate to use Box-Cox or other transformation techniques. (Apply)
This first video has poor audio, but gives a good overview.
This second video shows a great practical example using RStudio. You’re unlikely to have to go into this level of detail on an exam. I include it because its plots help you visualize what a transformation can do to your data, so you can progress through your analysis and come to viable conclusions.