An Overview On Principal Component Analysis

Arun Mohan
6 min read · Mar 14, 2019

Principal component analysis (PCA) is a statistical tool for reducing a data set with many variables to a smaller number of dimensions while retaining as much of the variance as possible. In other words, PCA is a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components, which are orthogonal to each other. Before getting to PCA, we will go through some mathematical concepts and important terms related to it.

Covariance

Covariance can be treated as a measure of association between two variables. A large covariance can mean a strong relationship between the variables. However, we cannot compare covariances across data sets with different scales (like kilograms and inches). That means when we calculate the covariance of a set of heights and weights expressed in metres and kilograms respectively, we will get a different covariance from when we do it in other units. The formula for covariance is:

cov(X, Y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

The solution to this is to ‘normalize’ the covariance; the normalized quantity is called the correlation.

Correlation

It shows how strongly two variables are related to each other. Its value ranges from -1 to +1. A positive correlation indicates that when one variable increases, the other also increases, while a negative correlation indicates that as one variable increases, the other decreases. When the correlation is greater than 0.5 we can speak of a strong positive correlation; similarly, when it is less than -0.5 we can speak of a strong negative correlation.
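As a quick illustration, here is a minimal sketch using NumPy with made-up height/weight values (not from the article), showing that the covariance changes with the units while the correlation does not:

```python
import numpy as np

# Made-up example data: heights in metres, weights in kilograms
height_m = np.array([1.60, 1.70, 1.75, 1.80, 1.90])
weight_kg = np.array([55.0, 65.0, 70.0, 72.0, 85.0])

# Covariance in (metres, kilograms)
cov_m = np.cov(height_m, weight_kg)[0, 1]
# Same data with height in centimetres: the covariance is 100x larger,
# even though the relationship between the variables is unchanged
cov_cm = np.cov(height_m * 100, weight_kg)[0, 1]

# Correlation is scale-free, so both versions give the same value
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_m * 100, weight_kg)[0, 1]

print(cov_m, cov_cm)    # different magnitudes
print(corr_m, corr_cm)  # identical values, close to +1
```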

Eigenvalues and Eigenvectors:

Consider a non-zero vector x. It is an eigenvector of a square matrix A if Ax is a scalar multiple of x, that is, if Ax = λx for some scalar λ.

It should immediately be clear that, no matter what A and λ are, the vector x = 0 (that is, the vector whose elements are all zero) satisfies this equation. With such a trivial answer, we might ask the question again in another way:

For a given matrix A, what are the nonzero vectors x that satisfy the equation

Ax = λx

for some scalar λ?

To answer this question, we first perform some algebraic manipulations on the equation Ax = λx, rewriting it as (A − λI)x = 0, where I is the identity matrix. From this we can see that:

  • The equation Ax = λx has nonzero solutions for the vector x if and only if the matrix A − λI has zero determinant.
  • For a given matrix A there are only a few special values of the scalar λ for which A − λI will have zero determinant, and these special values are called the eigenvalues of the matrix A.

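As a small worked example (a sketch, not from the article), here is how NumPy finds the eigenvalues and eigenvectors of a 2 x 2 matrix and how we can verify the defining property Ax = λx:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigendecomposition: eigenvalues plus a matrix whose columns are the eigenvectors
eigen_values, eigen_vectors = np.linalg.eig(A)
print(eigen_values)   # the eigenvalues 3 and 1 (order may vary)
print(eigen_vectors)  # columns are the eigenvectors of A

# Check the defining property A x = lambda x for the first eigenpair
x = eigen_vectors[:, 0]
print(np.allclose(A @ x, eigen_values[0] * x))  # True
```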

Orthogonality:

When we say two variables are orthogonal, we mean they are uncorrelated with each other; that is, the correlation between any such pair of variables is 0.

Geometric Intuition of PCA

Here I will try to explain the geometric intuition of PCA with an example. Suppose we have two features, F1 (hair colour) and F2 (height of people). We know that the height of people spans a wide range, while hair colour varies over only a small range. Let us represent these two features in a graph as follows.

Here, if we want to represent these two features as a single feature, which one should we drop? The better option is to drop the axis with low variance. That is, we will drop axis F1 (hair colour) and preserve the direction with maximum variance.

Now standardize the data so that it is represented on a single scale. Our data may now look like this.

To solve this, we can draw a new axis along the direction of maximum spread; let it be F1'. An axis perpendicular to it is also drawn; let it be F2'. Here we want to find a direction F1' such that the variance of the points projected onto F1' is maximum. In other words, this is done by rotating F1 by an angle θ to get F1', and rotating F2 in the same direction by the same angle θ to get F2'. Now we can easily select F1' as our primary feature. This is the basic idea behind PCA: preserving the direction of maximum variance.
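Below is a rough sketch of this idea in NumPy (with synthetic 2-D data rather than the hair colour/height example): rotate a unit direction by an angle θ and keep the angle whose projected points have the largest variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 2-D data, standardized column-wise
X = rng.normal(size=(200, 2)) @ np.array([[1.0, 0.8],
                                          [0.0, 0.6]])
X = (X - X.mean(axis=0)) / X.std(axis=0)

def projected_variance(theta):
    """Variance of the points projected onto the unit direction at angle theta."""
    direction = np.array([np.cos(theta), np.sin(theta)])  # candidate F1'
    return np.var(X @ direction)

# Scan angles between 0 and pi and pick the direction of maximum spread
thetas = np.linspace(0.0, np.pi, 180)
best_theta = thetas[np.argmax([projected_variance(t) for t in thetas])]
print(best_theta, projected_variance(best_theta))  # F1': direction of maximum variance
```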

Steps in Principal component analysis:

  • Standardize the variables so that all of them are represented on a single scale.
  • Calculate the covariance matrix.
  • Calculate the eigenvalues and eigenvectors of the covariance matrix, as mentioned above.

Now, let us find the eigenvalues and eigenvectors of the covariance matrix. This is also called an eigendecomposition. The eigenvalues tell us the variance in the data set, and the eigenvectors tell us the corresponding directions of that variance.
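Here is a short sketch of these first three steps with NumPy (the small data matrix X is made up for illustration):

```python
import numpy as np

# Made-up data matrix: 5 data points, 2 features
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Step 1: standardize each variable
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix (m x m)
cov_matrix = np.cov(X_std, rowvar=False)

# Step 3: eigendecomposition of the covariance matrix
# (eigh is used because the covariance matrix is symmetric)
eigen_values, eigen_vectors = np.linalg.eigh(cov_matrix)
print(eigen_values)   # variance along each principal direction
print(eigen_vectors)  # columns: the corresponding directions
```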

  • Selecting the proper number of components and thus selecting features:

We will arrange the eigenvalues in decreasing order so that we get them in order of significance. The eigenvector with the highest eigenvalue is the first principal component, the one with the second highest is the second principal component, and so on. If we are doing dimensionality reduction for visualization, we can select the top 2 or 3 eigenvectors. If we are reducing to k dimensions, we have to choose the first k eigenvalues in such a way that we do not lose much information.

How to choose the value of k?

We will choose k based on a heuristic rule: we find the sum of the eigenvalues we select and divide it by the total sum of all eigenvalues. The resulting fraction tells us how much of the variance (that is, how much of the information in the whole data set) is conserved.
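As a sketch of this heuristic (the eigenvalues and the 95% threshold below are made up), we can keep the smallest k whose cumulative share of the total variance crosses a chosen threshold:

```python
import numpy as np

eigen_values = np.array([2.8, 1.1, 0.4, 0.2])           # example eigenvalues
sorted_vals = np.sort(eigen_values)[::-1]                # decreasing order
explained = np.cumsum(sorted_vals) / sorted_vals.sum()   # cumulative variance retained

# Smallest k that conserves at least 95% of the variance
k = int(np.searchsorted(explained, 0.95) + 1)
print(explained)  # approximately [0.62 0.87 0.96 1.  ]
print(k)          # 3 components conserve about 95% of the variance
```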

Now we will form a feature vector. Suppose we select 2 eigenvectors, eig1 and eig2. Then our feature vector = (eig1, eig2).

  • Forming Principal Components:

In order to get the principal components, we have to multiply our initial data matrix by the matrix formed from our k eigenvectors.

Let our initial data matrix be X, of dimension (n x m), where n is the number of data points and m is the number of features. Suppose we want to reduce the m features to k components. Let the matrix formed from our top k eigenvectors be E, of dimension (m x k). Then our transformed matrix T will be

T = X * E

where T will be of dimension (n x k)
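Putting it together, here is a minimal sketch of this projection in NumPy (random data standing in for a real standardized data set):

```python
import numpy as np

n, m, k = 150, 4, 2
rng = np.random.default_rng(42)
X = rng.normal(size=(n, m))             # stand-in for standardized data (n x m)

cov_matrix = np.cov(X, rowvar=False)
eigen_values, eigen_vectors = np.linalg.eigh(cov_matrix)

order = np.argsort(eigen_values)[::-1]  # indices sorted by decreasing eigenvalue
E = eigen_vectors[:, order[:k]]         # feature vector of top-k eigenvectors (m x k)

T = X @ E                               # principal components
print(T.shape)                          # (150, 2), i.e. n x k
```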

Note: I have implemented PCA from scratch on the Iris dataset using Python. You can go through this GitHub link.

References

  • Applied-Ai Course
