What is the Box Cox Transformation?

A Box Cox Transformation is a simple calculation that may help your data set follow a normal distribution. The Box Cox transformation was developed by two British statisticians, George Box and Sir David Cox.

When the assumption that the data are normally distributed is violated, or when the relationship between the dependent and independent variables in a linear model is not linear, a transformation may help the data set follow a normal distribution. Box Cox is one such transformation method.

The basic assumption of Box-Cox is that the data must be strictly positive (no zero or negative values) and continuous.

What Does Box Cox have to do with Multiple Regression Analysis?

The Box-Cox transformation is a basic tool in Multiple Regression Analysis. Any linear model assumes that the relationship between the response variable Y and the predictor variables X is linear. This is not always true. When the relationship between the dependent and independent variables is not linear and you still wish to fit a linear model to the data, consider a Box-Cox transformation. You transform the predictor variable or the response variable, then fit a linear model to the transformed data to study the effect that the predictor variable has on the transformed response.

A basic assumption of linear models is that the error terms are normally distributed. A significant violation of this assumption increases the risk of committing a type I or type II error.

In addition, the benefits of a Box-Cox transformation include less skewness, a more nearly linear relationship between the response variable Y and the predictor variables X, and more nearly equal spread.
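To make the idea concrete, here is a minimal sketch of transforming a response and then fitting a straight line. The data are made up for illustration, the `fit_line` helper is hypothetical, and the log transform is used because it is the Lambda = 0 member of the Box Cox family. Treat it as an illustration under those assumptions, not a full regression workflow.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: the response grows exponentially with the predictor,
# so a straight line fits the raw response poorly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [math.exp(2.0 + 0.5 * x) for x in xs]

# Transform the response with the natural log (Lambda = 0).
log_ys = [math.log(y) for y in ys]

# On the transformed scale the relationship is linear.
intercept, slope = fit_line(xs, log_ys)
print(f"fit on the log scale: intercept={intercept:.3f}, slope={slope:.3f}")
```

Remember that the fitted coefficients describe the transformed response, so predictions have to be converted back to the original scale before they are reported.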

The Box Cox Equation

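The original form of the Box-Cox transformation is given by

$$
y_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln y_i, & \lambda = 0
\end{cases}
$$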

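In their 1964 paper, Box and Cox also proposed an extended, two-parameter form of the transformation, which shifts the data by $\lambda_2$ so that $y_i + \lambda_2 > 0$:

$$
y_i^{(\lambda)} =
\begin{cases}
\dfrac{(y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \lambda_1 \neq 0 \\
\ln (y_i + \lambda_2), & \lambda_1 = 0
\end{cases}
$$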
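As a rough sketch, both forms can be written as one small Python function. It assumes the standard textbook definition, (y^Lambda - 1) / Lambda for a nonzero Lambda and ln(y) when Lambda is 0. The function name `box_cox` and the `shift` argument (standing in for the second parameter) are illustrative choices, not part of any particular library.

```python
import math

def box_cox(y, lam, shift=0.0):
    """Box Cox transform of a single value.

    shift=0 gives the one-parameter form. A nonzero shift gives the
    two-parameter form, which is applied to y + shift.
    """
    shifted = y + shift
    if shifted <= 0:
        raise ValueError("y + shift must be strictly positive")
    if lam == 0:
        # The limit of (y**lam - 1) / lam as lam approaches 0 is ln(y).
        return math.log(shifted)
    return (shifted ** lam - 1.0) / lam

# One-parameter form with Lambda = 0.5 (a shifted, scaled square root).
print(box_cox(4.0, 0.5))

# Two-parameter form: a shift of 1 makes a zero observation usable.
print(box_cox(0.0, 0.5, shift=1.0))
```

The shift is the reason the two-parameter form exists: it lets you handle data containing zeros or negative values, which the one-parameter form rejects.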

When would you use this transformation during the DMAIC process?

Process capability studies are performed during the Measure phase of DMAIC. The first step in a process capability study is to check whether the data follow a normal distribution (this also matters for parametric tests such as ANOVA).

The Box-Cox method helps to address non-normally distributed data by transforming them toward normality. However, there is no guarantee that the transformed data are normal, because the method does not actually check for normality.

Instead, the Box-Cox method looks for the Lambda that gives the transformed data the smallest standard deviation. Hence it is always advisable to check the transformed data for normality using a probability plot or Q-Q (Quantile-Quantile) plot.
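If you do not have plotting software at hand, the idea behind a Q-Q plot can be sketched with the Python standard library. The helper names, the plotting positions (i + 0.5) / n, and the sample data below are all assumptions made for illustration. The correlation between the sorted data and the theoretical normal quantiles is used as a rough summary of how straight the Q-Q plot would look.

```python
import math
from statistics import NormalDist

def qq_points(values):
    """Pair each sorted observation with a standard normal quantile."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

def qq_correlation(values):
    """Correlation of the Q-Q points; values near 1 suggest a straight plot."""
    points = qq_points(values)
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_o = sum(o for _, o in points) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in points)
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    var_o = sum((o - mean_o) ** 2 for _, o in points)
    return cov / math.sqrt(var_t * var_o)

# Made-up right-skewed measurements.
raw = [1.2, 1.5, 1.9, 2.4, 3.1, 4.2, 6.0, 9.5, 17.0, 40.0]
logged = [math.log(v) for v in raw]  # Lambda = 0 transformation

print(f"raw data:         r = {qq_correlation(raw):.3f}")
print(f"transformed data: r = {qq_correlation(logged):.3f}")
```

A higher correlation after the transformation only suggests the data look more normal. It is not a formal normality test.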

How to use Box Cox to calculate Process capability for non-normal data

Calculating process capability on non-normal raw data may offer no advantage; in other words, it may give inaccurate results. The data should be transformed toward normality before calculating the process capability. Various data transformation methods exist, such as the log, power, exponential, and reciprocal transformations.

Some data analysis may be required to choose the right transformation method. One of the foremost power transformation methods is the Box-Cox method.

The formula is y' = y^Lambda

Here Lambda is the power that must be determined to transform the data. Lambda is usually searched over the range -5 to 5. The chosen Lambda is the one that maximizes the likelihood of the transformed data under a normal model, which corresponds to the smallest standard deviation of the (rescaled) transformed data.
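One way to picture this is to transform both the data and the specification limits with the same Lambda and then compute Cpk on the transformed scale. The sketch below assumes the standard Cpk formula, min(USL - mean, mean - LSL) / (3 * standard deviation), and uses the (y^Lambda - 1) / Lambda form of Box Cox because it preserves the order of the limits for any Lambda. The measurements, limits, and function names are hypothetical.

```python
import math
import statistics

def box_cox(y, lam):
    """One-parameter Box Cox transform; y must be strictly positive."""
    if y <= 0:
        raise ValueError("y must be strictly positive")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def cpk(values, lsl, usl):
    """Cpk from the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * sd)

def cpk_after_box_cox(values, lsl, usl, lam):
    """Transform the data and both spec limits, then compute Cpk."""
    transformed = [box_cox(v, lam) for v in values]
    return cpk(transformed, box_cox(lsl, lam), box_cox(usl, lam))

# Hypothetical skewed measurements with hypothetical spec limits.
measurements = [1.2, 1.5, 1.9, 2.4, 3.1, 4.2, 6.0, 9.5, 17.0, 40.0]
lower_limit, upper_limit = 0.5, 60.0

print(f"Cpk on raw data:       {cpk(measurements, lower_limit, upper_limit):.2f}")
print(f"Cpk after log (lam=0): {cpk_after_box_cox(measurements, lower_limit, upper_limit, 0):.2f}")
```

Both limits must be positive for this to work, since the limits are transformed with the same function as the data.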
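The search for Lambda can be sketched as a simple grid search. The version below is an assumption-laden illustration: it scans -5 to 5 in steps of 0.1, uses the (y^Lambda - 1) / Lambda form with ln(y) at Lambda = 0, and scores each candidate with the standard Box Cox profile log-likelihood. Statistical packages may use a different search or report a rounded Lambda, and the sample data are made up.

```python
import math

def box_cox(y, lam):
    """One-parameter Box Cox transform; y must be strictly positive."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def log_likelihood(values, lam):
    """Profile log-likelihood of lam, up to an additive constant.

    Assumes the values are strictly positive and not all identical.
    """
    n = len(values)
    transformed = [box_cox(v, lam) for v in values]
    mean_t = sum(transformed) / n
    var_t = sum((t - mean_t) ** 2 for t in transformed) / n
    # The second term is the Jacobian of the transformation.
    return -0.5 * n * math.log(var_t) + (lam - 1.0) * sum(math.log(v) for v in values)

def best_lambda(values, low=-5.0, high=5.0, step=0.1):
    """Return the grid value of lam with the highest log-likelihood."""
    count = round((high - low) / step)
    candidates = [round(low + i * step, 10) for i in range(count + 1)]
    return max(candidates, key=lambda lam: log_likelihood(values, lam))

# Made-up right-skewed measurements.
raw = [1.2, 1.5, 1.9, 2.4, 3.1, 4.2, 6.0, 9.5, 17.0, 40.0]
print("best Lambda on the grid:", best_lambda(raw))
```

Maximizing this log-likelihood is equivalent to minimizing the standard deviation of the transformed data once it has been rescaled by the geometric mean, which is why the two descriptions of the method agree.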

Most Common Box-Cox Transformations

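  • Lambda = -2: y' = 1 / y^2
  • Lambda = -1: y' = 1 / y
  • Lambda = -0.5: y' = 1 / sqrt(y)
  • Lambda = 0: y' = ln(y)
  • Lambda = 0.5: y' = sqrt(y)
  • Lambda = 1: y' = y (no transformation)
  • Lambda = 2: y' = y^2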

Example: if the Lambda is 2 then y^Lambda = y^2

An Example of a Box Cox Transformation by Hand

Box Cox transformations in practice are typically done by leveraging software that can try many different variations of Box Cox transforms very quickly.

Doing it by hand is time-consuming and error prone. Imagine trying different values of lambda by hand until you run them all or run out of patience!

“But what about on a Six Sigma exam?” I can hear you say. “I won’t have Minitab or RStudio available! What will I do?”

Not to worry.

In my experience the questions on the exam are rather simple. You’re usually just having to do or understand the following:

  • Sometimes your data doesn’t appear to be normal, but if you transform it, you can achieve normality – which then opens up a bunch of other properties and tools for you (or at least easier tools ;’)).
  • While Box-Cox is complex, questions on Six Sigma exams are usually very simple. Just substitute variables into the following equation:
    • X(transform) = X ^ Lambda

Example: if the Lambda is 2 then y' = y^2

All you have to do is replace your original data with the “new equation” using a lambda of 2.

As the example chart here shows you, all you’d have to do is just take the original value and square it.

“Old measure” 2 now becomes “New measure” 4 because we are simply substituting into X(transform) = X ^ Lambda: X(transform) = 2 ^ 2 = 4.
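The same substitution takes only a few lines of Python. The list of old measures is made up for illustration (only the 2 becoming 4 comes from the example above), and `power_transform` is a hypothetical helper name.

```python
def power_transform(values, lam):
    """Apply the exam-style formula X(transform) = X ^ Lambda to each value."""
    return [v ** lam for v in values]

old_measures = [1, 2, 3, 4, 5]
new_measures = power_transform(old_measures, 2)

print(new_measures)  # [1, 4, 9, 16, 25]
```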

An Example of a Box Cox Transformation Using MiniTab

A Box Cox transformation can be run in Minitab, in Excel with the Analysis ToolPak, or in other statistical software. These tools automatically calculate an appropriate power transformation.

Example: Raw data

Step 1: Perform a normality test to see whether the data follow a normal distribution

In the above graph the p-value is less than 0.005, so the data do not follow a normal distribution, and the histogram clearly shows that the data are skewed to one side.

Step 2: Transform the data using Box Cox Transformation

Transformed data

Step 3: Test for normality again

In the above graph the p-value is greater than 0.05, so we cannot reject the hypothesis that the transformed data follow a normal distribution, and the histogram shows the data distributed roughly symmetrically.
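The same three steps can be mimicked without Minitab. The sketch below is only a stand-in: it uses sample skewness as a rough before-and-after check instead of the Anderson-Darling test that Minitab runs by default, the data are made up, and the log transform (Lambda = 0) is assumed rather than estimated.

```python
import math

def skewness(values):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Step 1: look at the raw data (made-up, right-skewed measurements).
raw = [1.2, 1.5, 1.9, 2.4, 3.1, 4.2, 6.0, 9.5, 17.0, 40.0]
print(f"skewness before: {skewness(raw):.2f}")

# Step 2: transform the data (here with Lambda = 0, the natural log).
transformed = [math.log(v) for v in raw]

# Step 3: check again; a value closer to 0 means a more symmetric shape.
print(f"skewness after:  {skewness(transformed):.2f}")
```

Skewness close to zero is necessary for normality but not sufficient, so a formal normality test or a probability plot is still the better final check.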

What Do You Need to Know for Your Six Sigma Exam?

Green Belt

The IASSC Six Sigma Green Belt BOK requires this as part of the Improve Phase.

Black Belt

The IASSC Six Sigma Black Belt BOK requires this as part of the Improve Phase.

The ASQ Six Sigma Black Belt BOK requires the following:

Process capability for non-normal data
Identify non-normal data and determine when it is appropriate to use Box-Cox or other transformation techniques. (Apply)

Helpful Videos

This first video has poor audio, but gives a good overview.

This second video shows a great practical example leveraging RStudio. You’re unlikely to have to go into this level of detail on an exam. I include it because its plots help you visualize what a transformation can do to help you progress through your data analysis and come to viable conclusions.


Comments (9)

Dear Ted,

Can you share the process of how the value of lambda is derived?
How do we decide for the below that Lambda is 2 or any other value?
Example: if the Lambda is 2 then yI = y2


Hi Anshika,

Good question. The greek character Lambda didn’t come out well in the example so I’ve changed it above.

It now reads if Lambda is 2, then Y^Lambda = Y^2.

Think of it this way. The most important thing to remember is that the heart of the Box Cox transformation is an exponent, lambda (λ).

Lambda can vary from -5 to 5.

What you’re trying to do is to transform your existing function into something that looks and acts like a more ‘normal’ distribution.

In practice you would try all kinds of values for lambda to see what works best. Obviously automated tools help.

The best version of lambda would be the one that results in transforming your data into the best approximation of a normal distribution curve.

Does that help?

Hello Ted,

Do you have any bibliographic support that I can use to explain a maximum range of Lambda from -5 to 5 in my work? I have performed Box-Cox transformations and I have seen that for larger lambdas (like 9) an approximation to normality can’t be properly performed.


I don’t have a reference at hand.

Here are 2 places you might examine (1 the linked article, 2 the article they mention by Draper and Cox):

Clearly not all data could be power-transformed to Normal. Draper and Cox (1969) studied this problem and conclude that even in cases that no power-transformation could bring the distribution to exactly normal, the usual estimates of λ will lead to a distribution that satisfies certain restrictions on the first 4 moments, thus will be usually symmetric. (source)

Are there any limitations to the usage of Box-Cox Transformation? In simple terms, does it hold valid for every scenario/analysis or are there any exceptions to the usage?


You generally use a transform for a specific purpose, usually to make dealing with the data / function easier. The effects of the transform are dependent on what you transform into what.

One of the most popular transforms is towards a ‘normal’ distribution. In that case, what would you imagine an X Y plot would look like?
