Canonical Correlation Analysis is used to identify and measure the correlation between two sets of variables. It is a technique similar to PCA and factor analysis. In particular, canonical correlation analysis seeks the linear combinations of the independent variables that are most strongly related to linear combinations of the dependent variables. The method was first developed by Hotelling in 1935, but it came into wide use only about 50 years later, with the help of computers and statistical software.

Canonical correlation requires that each set of variables be reduced to a single variable, and then measures the correlation between those two variables. The two variables are found by taking linear combinations of the variables in each set under certain pre-fixed conditions. The output variables obtained from the linear combinations are called canonical variables, and the correlation between them is called the canonical correlation.

## Why Canonical Correlation

Canonical correlation is a technique that seeks to identify and quantify the relationship between two sets of variables. It is a popular statistical technique, widely used in many areas of social science, psychological research, and marketing analytics. Unlike regression analysis, which typically handles a single dependent variable, it lets researchers examine the relationship between many dependent and many independent variables at once.

## Assumptions

Most of the assumptions of multivariate techniques also apply to canonical correlation:

- A linear relationship between the dependent and independent variables
- Independent variables should not be highly correlated with one another
- Uniform variability (homoscedasticity)
- Additionally, multivariate normality is needed to perform statistical significance tests

## Difference between Multiple Correlation and Canonical Correlation

In a simple correlation, we study the relationship between one dependent variable and one independent variable. In multiple correlation, we study the relationship between one dependent variable and multiple independent variables; in other words, the relationship between a variable Y and a set of variables (X1, X2, …, Xn). In canonical correlation, by contrast, we study the relationship between two sets of variables. It is similar to multiple regression, except that there is more than one dependent variable.

## Formula and Nomenclature

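Let $X$ and $Y$ be $p$- and $q$-dimensional random vectors, and assume that $p \le q$.

Let $\mu_X$ and $\mu_Y$ be the means of $X$ and $Y$, let $\Sigma_X$ and $\Sigma_Y$ be their covariance matrices, and let $\Sigma_{YX}$ be the cross-covariance matrix between $Y$ and $X$.

Let $U = a^T Y$ and $V = b^T X$ be linear combinations of $Y$ and $X$ respectively.

So,

- $\mathrm{Var}(U) = a^T \Sigma_Y a$
- $\mathrm{Var}(V) = b^T \Sigma_X b$
- $\mathrm{Cov}(U, V) = a^T \Sigma_{YX} b$

Therefore, the correlation between $U$ and $V$ is

$$\rho(U, V) = \frac{a^T \Sigma_{YX} b}{\sqrt{a^T \Sigma_Y a}\,\sqrt{b^T \Sigma_X b}}$$

The first canonical correlation is the largest value this correlation can take over all choices of $a$ and $b$.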
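As a rough numerical check of these formulas, the sketch below builds the variance and covariance of U and V from the covariance blocks and compares the resulting correlation with one computed directly from the combined variables. It assumes NumPy is available and takes V as the linear combination of X; the simulated data, the seed, and the weight vectors `a` and `b` are all hypothetical values chosen only for illustration.

```python
import numpy as np

# Illustrative simulated data (not from the article): p = 2, q = 3, so p <= q
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
Y = np.column_stack([
    X[:, 0] + rng.normal(size=n),
    X[:, 1] - X[:, 0] + rng.normal(size=n),
    rng.normal(size=n),
])

a = np.array([1.0, 0.5, -0.2])  # hypothetical weights applied to Y
b = np.array([0.7, -0.3])       # hypothetical weights applied to X

# Joint sample covariance of (Y, X), split into the three blocks
S = np.cov(np.hstack([Y, X]), rowvar=False)
q = Y.shape[1]
S_Y, S_YX, S_X = S[:q, :q], S[:q, q:], S[q:, q:]

var_U = a @ S_Y @ a    # Var(U)    = a^T Sigma_Y a
var_V = b @ S_X @ b    # Var(V)    = b^T Sigma_X b
cov_UV = a @ S_YX @ b  # Cov(U, V) = a^T Sigma_YX b
rho_formula = cov_UV / np.sqrt(var_U * var_V)

# The same correlation computed directly from U = Ya and V = Xb
U, V = Y @ a, X @ b
rho_direct = np.corrcoef(U, V)[0, 1]

print(rho_formula, rho_direct)
```

The two printed values should agree up to floating-point error. Canonical correlation analysis then searches for the `a` and `b` that make this correlation as large as possible, rather than using fixed weights as here.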

- Canonical Function: Analogous to a component in principal component analysis
- Canonical Correlation: Correlation between two canonical functions
- Structure coefficient / Canonical loading: Correlation between a variable and its canonical function
- Eigenvalue: Represents the percentage of variance in one canonical variate accounted for by the other; it is like canonical correlation's version of $R^2$

The canonical correlation coefficient is interpreted like the Pearson correlation coefficient. As a rule of thumb, it should be greater than 0.30, because a smaller value, when squared, represents less than 10% overlapping variance between a pair of canonical variates.
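The arithmetic behind the 0.30 guideline can be sketched in a few lines of Python. The function names and the default threshold argument are hypothetical, introduced here only to illustrate the rule of thumb.

```python
def overlapping_variance(r):
    """Share of variance a pair of canonical variates has in common (r squared)."""
    return r ** 2


def meets_rule_of_thumb(r, threshold=0.30):
    """True when the canonical correlation exceeds the 0.30 guideline."""
    return abs(r) > threshold


for r in (0.20, 0.30, 0.50):
    share = overlapping_variance(r)
    print(f"r = {r:.2f}: overlapping variance = {share:.1%}, "
          f"above guideline: {meets_rule_of_thumb(r)}")
```

A correlation of exactly 0.30 corresponds to 9% overlapping variance, which is why values at or below that level are usually treated as too weak to interpret.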

Canonical correlation is difficult to compute by hand, so the analysis is usually carried out with software such as SPSS, R, Matlab, or Python.
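To give a sense of what such software does internally, here is a minimal sketch in Python, assuming NumPy is available. It relies on the standard result that the squared canonical correlations are the eigenvalues of the matrix Σ_Y⁻¹ Σ_YX Σ_X⁻¹ Σ_XY, with sample covariances in place of the population ones. The function name `canonical_correlations` and the simulated data are hypothetical, and the sketch assumes both covariance matrices are invertible; for real analyses a tested routine from a statistics package is the safer choice.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Sample canonical correlations between X (n x p) and Y (n x q), largest first."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p, q = X.shape[1], Y.shape[1]

    # Joint sample covariance of (Y, X), split into blocks
    S = np.cov(np.hstack([Y, X]), rowvar=False)
    S_Y, S_YX, S_X = S[:q, :q], S[:q, q:], S[q:, q:]

    # M = S_Y^{-1} S_YX S_X^{-1} S_XY; its eigenvalues are the squared correlations
    M = np.linalg.solve(S_Y, S_YX) @ np.linalg.solve(S_X, S_YX.T)
    eigvals = np.sort(np.linalg.eigvals(M).real)[::-1][:min(p, q)]

    # Clip tiny numerical overshoots before taking the square root
    return np.sqrt(np.clip(eigvals, 0.0, 1.0))


# Illustrative simulated data: two X variables, three Y variables
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 2))
Y_demo = np.column_stack([
    X_demo[:, 0] + 0.5 * rng.normal(size=300),
    X_demo[:, 1] + 2.0 * rng.normal(size=300),
    rng.normal(size=300),
])
print(canonical_correlations(X_demo, Y_demo))
```

The function returns min(p, q) values, one per pair of canonical variates. With a single variable in each set, the result reduces to the absolute value of the ordinary Pearson correlation.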

Also see Multivariate Analysis

## Helpful Links

- https://en.wikipedia.org/wiki/Canonical_correlation
- https://stats.idre.ucla.edu/r/dae/canonical-correlation-analysis/
- https://online.stat.psu.edu/stat505/book/export/html/682

## ASQ Six Sigma Green Belt Correlation Analysis Questions:

**Question:** A correlation analysis is used to provide a numeric value for which of the following types of relationships between two variables?

A) Random

B) Linear

C) Curvilinear

D) Causation

**Answer:**