Regression analysis is a way of estimating the relationships between variables by examining the behavior of the system. There are many techniques for modeling and analyzing how a dependent variable (Y) responds to one or more independent variables (X). For example, transformations can be used to reduce the higher-order terms in the model.

Remember the equation for a line that you learned in high school? It is y = mx + b, where m is the slope of the line and b is the point where the line crosses the y axis. Given the slope (m) and the y intercept (b), you can plug in any value for x and get a result y. Very straightforward and very useful. That’s what we are trying to do in root cause analysis when we say “solve for y.”

Although statistical linear models are often described as a classic straight line, linear models are frequently represented by curvilinear graphs. Non-linear regression, by contrast, is used to explain the nonlinear relationship between a response variable and one or more predictor variables (usually a curved line).

Unfortunately (or perhaps entertainingly) real-life systems do not always boil down to a simple equation. Sometimes you just have a collection of points on your graph and you need to make sense of them. That’s where regression analysis comes into play; you are basically trying to derive an equation from the graph of your data.

“In the business world, the rear view mirror is always clearer than the windshield.”

Warren Buffett

## Linear Regression Analysis

The easiest kind of regression is linear regression. Imagine that all of your data lined up in a neat row. You could draw a straight line through every point and write down the simple equation y = mx + b that we talked about earlier. That way you would have a model that would faithfully predict what your system would do given any input x.

But what if your data only “kinda-sorta” looks like a line?

Multiple linear regression extends the methodology of simple linear regression to more than one x.

## Method of Least Squares

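The method of least squares creates the best-possible approximation of a line for a given data set. It picks the line that minimizes the sum of the squared vertical distances (the residuals) between the data points and the line.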
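As a minimal sketch of the idea, the closed-form least-squares formulas for a single x fit in a few lines of Python. The function name `fit_line` and the sample data are illustrative assumptions, not something taken from a particular tool.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


# Hypothetical data that only "kinda-sorta" looks like a line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")
```

For this made-up data the fitted line works out to y = 0.60x + 2.20, even though no single straight line touches every point.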

How well the created line fits the data can be measured by the Standard Error of the Estimate. The larger the Standard Error of the Estimate, the greater the dispersion of the charted points around the line.

The normal rules of Standard Deviation apply here, provided the residuals are roughly normally distributed; about 68% of the points should be within +/- 1 Standard Error of the line, and about 95.5% of the points within +/- 2 Standard Errors.
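Here is a hedged sketch of how the Standard Error of the Estimate could be computed for a simple linear regression. It assumes the usual definition, the square root of the sum of squared residuals divided by n - 2. The helper names and the data are made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def standard_error_of_estimate(xs, ys):
    """Typical vertical distance of the points from the fitted line."""
    m, b = fit_line(xs, ys)
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Divide by n - 2 because two values (m and b) were estimated from the data.
    return math.sqrt(sse / (len(xs) - 2))


# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)
se = standard_error_of_estimate(xs, ys)
# Share of points that fall within +/- 1 standard error of the line.
within_one = sum(abs(y - (m * x + b)) <= se for x, y in zip(xs, ys)) / len(xs)
print(f"standard error = {se:.3f}, share within 1 SE = {within_one:.0%}")
```

With only five points the share within one standard error will not land exactly on 68%; the rule of thumb is something to expect from larger data sets.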

For more examples of Least Squares, see linear regression

### Coefficient of Determination (R^2 aka R Squared)

The Coefficient of Determination provides the percentage of variation in Y that is explained by the regression line.
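A minimal sketch of that percentage, assuming the common definition R^2 = 1 - SSE/SST, where SSE is the variation left over around the line and SST is the total variation in Y. The function names and data are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def r_squared(xs, ys):
    """Fraction of the variation in y explained by the fitted line."""
    m, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in ys)                   # total
    return 1 - sse / sst


# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(f"R squared = {r2:.2f}")
```

For this sample the line explains 60% of the variation in Y and leaves 40% unexplained.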

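### Coefficient of Correlation (r)

In simple linear regression, just take the square root of the coefficient of determination, Sqrt(R Squared), and give it the same sign as the slope of the line: r is negative when the line slopes downward.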
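One detail worth seeing in code: a bare square root is never negative, so the sign of r has to come from the slope. The sketch below uses made-up, downward-sloping data and a hypothetical helper name to compare r computed directly with r recovered from R squared.

```python
import math


def correlation_and_slope(xs, ys):
    """Return (Pearson r, least-squares slope) for paired data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical data where y tends to fall as x rises.
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 5, 4, 2]
r, slope = correlation_and_slope(xs, ys)
r2 = r ** 2
# Square root of R squared, with the sign borrowed from the slope.
r_recovered = math.copysign(math.sqrt(r2), slope)
print(f"r = {r:.4f}, R squared = {r2:.2f}, recovered r = {r_recovered:.4f}")
```

Note that this shortcut is for a single x; with several Xs there is no single slope to borrow a sign from.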

See the correlation coefficient page for more.

### Measuring the validity of the model

Use the F statistic to find a p value for the model. The degrees of freedom for the regression equal the number of Xs in the equation (in simple linear regression this is 1, because there is only one x in y = mx + b). The degrees of freedom for the residual (error) equal the number of data points minus the number of Xs minus 1, which is n - 2 in simple linear regression.

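The smaller the p value, the stronger the evidence that the regression is real. In practice you judge this by choosing an acceptable level of alpha risk and checking whether the p value falls below it. The null hypothesis is that the line explains nothing (the slope is zero). For example, if your alpha risk level is 5% and the p value is 0.014, then the p value is smaller than alpha, so you reject the null hypothesis and conclude that the regression is statistically significant. If the p value were 0.14 instead, you would fail to reject the null hypothesis and could not claim that the line is a suitable model.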
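To make the test concrete, here is a sketch for the one-x case. It assumes F = (SSR / 1) / (SSE / (n - 2)), and it gets the p value from the fact that an F statistic with 1 numerator degree of freedom is the square of a t statistic. The t density is integrated numerically with Simpson's rule so that only the standard library is needed. All function names and the data are illustrative; in practice a statistics package would do this for you.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def f_statistic(xs, ys):
    """Return (F, residual degrees of freedom) for a one-x regression."""
    m, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ssr = sum((m * x + b - mean_y) ** 2 for x in xs)           # explained
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained
    df_res = len(xs) - 2
    return (ssr / 1) / (sse / df_res), df_res


def p_value_one_x(f, df_res, steps=10_000):
    """P(F > f) for 1 and df_res degrees of freedom (small samples only)."""
    t = math.sqrt(f)  # F with 1 numerator df equals t squared
    c = math.gamma((df_res + 1) / 2) / (
        math.sqrt(df_res * math.pi) * math.gamma(df_res / 2)
    )

    def pdf(u):
        return c * (1 + u * u / df_res) ** (-(df_res + 1) / 2)

    # Simpson's rule for the area under the t density from 0 to t.
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * (total * h / 3)


# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
alpha = 0.05
f, df_res = f_statistic(xs, ys)
p = p_value_one_x(f, df_res)
significant = p < alpha
print(f"F = {f:.2f}, p = {p:.3f}, significant at {alpha:.0%}: {significant}")
```

For this tiny sample F is 4.5 and the p value comes out a little above 0.12, which is larger than an alpha of 0.05, so you would fail to reject the null hypothesis: five points are not enough to show that this line is a real relationship.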

Residual Analysis: “Since a linear regression model is not always appropriate for the data, assess the appropriateness of the model by defining residuals and examining residual plots.”
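A small sketch of what a residual check can reveal. The data below are deliberately curved (y = x squared, a made-up example), and the residuals are just the observed y minus the y predicted by the fitted line.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def residuals(xs, ys):
    """Observed y minus the y predicted by the least-squares line."""
    m, b = fit_line(xs, ys)
    return [y - (m * x + b) for x, y in zip(xs, ys)]


# Hypothetical curved data: a straight line is the wrong model here.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
print([round(r, 2) for r in res])
```

The residuals come out as 2, -1, -2, -1, 2: positive at both ends and negative in the middle. A U-shaped pattern like that in a residual plot is a hint that a straight line is not an appropriate model, whereas residuals scattered randomly around zero would support it.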

## ASQ Six Sigma Black Belt Exam Regression Analysis Questions

Question: In regression analysis, which of the following techniques can be used to reduce the higher-order terms in the model?

A) Large samples.

B) Dummy variables.

C) Transformations.

D) Blocking.
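As the introduction notes, transformations are the technique that reduces higher-order terms. One common reading of that idea, offered here as an assumption rather than the only interpretation, is a log transform: if Y = a * X^k, then ln Y = ln a + k * ln X, which is a straight line in the transformed variables. The data and names below are made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Hypothetical data that follow Y = 3 * X^2 (a second-order relationship).
xs = [1, 2, 3, 4, 5]
ys = [3 * x ** 2 for x in xs]

# Transform both variables, then fit an ordinary straight line.
log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]
k, log_a = fit_line(log_xs, log_ys)
a = math.exp(log_a)
print(f"Y is roughly {a:.2f} * X^{k:.2f}")
```

The slope of the transformed line recovers the exponent and the intercept recovers the multiplier, so the squared term never has to appear in the regression itself. This only works when X and Y are positive, since the log of zero or a negative number is undefined.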

## Authors

• I originally created SixSigmaStudyGuide.com to help me prepare for my own Black Belt exams. Over time I've grown the site to help tens of thousands of Six Sigma belt candidates prepare for their Green Belt & Black Belt exams.

•  Andrew Pfeiffer says:

Ted, what exactly does “transformations can be used to reduce the hire-order terms in the model” mean? What are ‘higher-order terms’? Why do they need to be reduced? Reduced from what? To what?

• Ted Hessing says:

Hi Andrew,

Typo there – thanks for catching it! Hire should be Higher.

I was referring to a case where you might use a mathematical transform to bring a complicated model (e.g., Y = X^3 + 5X^2 + 4X + 1) to something more easily analyzed.

Does that help?

• Barbara Hyde says:

Ted, I'm confused about why you would reject the null if P is .14, which is greater than .05. Wouldn’t I accept the null? Can you help me understand why I would reject the null? Thank you.

Measuring the validity of the model
Use the F statistic to find a p value of the system. The degrees of freedom for the regression is equal to the number of Xs in the equation (in linear regression, this is 1 because there is only 1 x in the equation y=mx+b). The degrees of freedom for the

The smaller the p value, the better. But really you judge this by finding the acceptable level of alpha risk and seeing if that percent is greater than the p value. For example, if your alpha risk level is 5% and the p value is 0.14, then you have to reject the hypothesis – in this case you’d reject that the line that was created is a suitable model as it was not able to create significant results.

• Ramana PV says:

Thank you, Barbara. It seems a zero was missing in 0.14. We have updated the article.

• Barbara Hyde says:

Under “additional helpful resources,” there is a link that has bad info. The author does not seem to understand the difference between X (independent variables) and Y (the response, or dependent variable). She consistently misuses them all the way through. I suggest, if I am correct, that you remove the link.

Step by Step regression analysis

• Ramana PV says:

You are right, Barbara; the multiple regression section in particular was messed up. We removed the reference link. Thanks for your feedback.
