Principal Component Analysis, or PCA, is a statistical procedure that is essentially a coordinate transformation: an orthogonal transformation of possibly correlated variables into a set of linearly uncorrelated variables called principal components.

If the original data is plotted on X and Y axes, principal component analysis rotates these axes so that the new X-axis lies along the direction of maximum variation in the data. The new Y-axis is then determined by the choice of the X-axis, as PCA requires the two axes to be perpendicular.

If there are more than two dimensions, the first principal component axis lies in the direction of the most variation, and the remaining axes are defined in order of decreasing variation, each perpendicular to the ones before it. Consider the following graph, which is a principal component analysis example.

Over 2.5 quintillion bytes of data are created every single day, and it’s only going to grow from there. By 2020, it’s estimated that 1.7MB of data will be created every second for every person on earth.

To better understand what principal component analysis is, let’s first understand the concept of dimensionality reduction. Let’s assume you need to predict the stock price of a particular company for the month of March 2020.

Now you will have a large amount of past data available to you: stock fluctuations, external market fluctuations, buy order details, sell order details, company revenue, company gross profits, etc. Having so many variables to plot poses a significant problem. Are the variables related?

In such cases, statisticians tend to ask the question: can the results be achieved by considering only a few of the variables?

In other words, you are looking to reduce the dimension of your feature space so that there are fewer relationships between variables to consider. This is called dimensionality reduction. There are two types of dimensionality reduction:

- Feature Elimination
- Feature Extraction

### Feature Elimination

Feature elimination involves the complete elimination of variables that might not have an impact on the end result.

For example, in our stock exchange project, you might want to consider the top 3 or 5 variables that affect the stock market price and drop everything else.

While this simplifies the problem, it also reduces the accuracy of the project because of a reduction in variables.
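As a rough illustration of feature elimination, the sketch below ranks features by the absolute value of their correlation with the target and keeps the top two. The data is synthetic, and the correlation-based ranking as well as the names `X`, `y`, `k`, and `keep` are assumptions made for this example; the article does not prescribe a particular selection rule.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 6 features, 200 rows.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 6))

# The target depends strongly on columns 0 and 3 only.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=n)

# Absolute Pearson correlation of each feature with the target.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])

# Keep the k most correlated features and drop everything else.
k = 2
keep = np.sort(np.argsort(corr)[::-1][:k])
X_reduced = X[:, keep]

print("kept columns:", keep)
print("reduced shape:", X_reduced.shape)
```

The dropped columns are discarded entirely, which is exactly the information loss that feature extraction tries to avoid.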

### Feature Extraction

Feature extraction addresses the loss of information that feature elimination causes. For example, if you have fifteen independent variables, feature extraction defines fifteen new independent variables, where each new variable is a combination of all fifteen old independent variables. The new variables are constructed in a specific way and ordered by how much of the variation in the original variables they capture.

You might be wondering how this solves the problem of having a large number of variables since we still have fifteen variables. Once the new variables are defined, you can drop the unimportant ones and use the top 3-5 important variables for plotting. Because each new variable is formed by considering all fifteen old variables, you are still using all information.

Principal Component Analysis is a feature extraction technique. This explains WHY PCA works – it gives you a solution to incorporate all available variable data while reducing the number of variables and correlations.

## Principal Component Analysis Example

We will explain HOW principal component analysis works and is implemented but first consider the following principal component analysis example. Consider a 2D set of variables, consisting of height and weight.

Normally this dataset will be plotted along the X and Y axes, as shown in the image below (original data set). If you apply principal component analysis to tease out the variation, you define two new axes, pc1 and pc2. Each data point gets new coordinates along these axes, and each coordinate is a combination of both variables of the original data set.
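The height and weight example can be sketched in a few lines of NumPy. The numbers below are synthetic (generated, not measured), and the names `pc1`, `pc2`, and `scores` are chosen for this illustration. The sketch finds the two new axes from the covariance matrix and re-expresses each point in terms of them.

```python
import numpy as np

# Synthetic height (cm) and weight (kg) data; an assumption for illustration.
rng = np.random.default_rng(42)
height = rng.normal(170, 10, size=300)
weight = 0.9 * (height - 170) + 70 + rng.normal(0, 5, size=300)
data = np.column_stack([height, weight])

# Center the data, then take eigenvectors of the 2x2 covariance matrix.
centered = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))

# eigh returns eigenvalues in ascending order, so the last column is pc1.
pc1 = eigvecs[:, -1]
pc2 = eigvecs[:, 0]

# New coordinates of every point along pc1 and pc2.
scores = centered @ np.column_stack([pc1, pc2])

print("pc1 direction (height, weight):", pc1)
print("pc2 direction (height, weight):", pc2)
print("variance along pc1, pc2:", scores.var(axis=0, ddof=1))
```

Both entries of `pc1` are non-zero, which is the sense in which each new axis considers both original variables.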

Now that you’ve seen a principal component analysis example, let’s understand HOW it is plotted and how it works.

## Tutorial on Principal Component Analysis

To see how PCA is applied, let’s walk through a tutorial on principal component analysis. This will also further solidify what principal component analysis is. Here’s what we are going to achieve:

(i) We will calculate and define a matrix that will summarize how each of the variables relates to one another.

(ii) This matrix will then be broken down into two separate components – direction and magnitude.

Consider the following original data set. X and Y are the original axes, and there are two main directions that we define in this data; we’ll call them the red and green directions.

Now, let’s transform the original data to align with the directions. The transformed data, with the x-axis as the red direction and the y-axis as the green direction, would look like this:

If you notice, in image 1, we first define the red direction which involves the most variation. We then define the green direction perpendicular to the red direction. We then plot the new principal component analysis graph by using the red and green direction as the X-axis and Y-axis.

Now, this is a simplified, 2 dimensional tutorial on principal component analysis. Consider a 3-dimensional model. By defining the important directions, we can drop less important ones and project the data in a smaller, simplified space.

Taking the tutorial on principal component analysis a step further, let’s build an algorithm for executing PCA. Now that you have an understanding of what principal component analysis is, you should be able to grasp this algorithm.

### One

Organize the data into a table with, say, ‘n’ rows and ‘p+1’ columns. One column corresponds to the dependent variable (which we usually denote by Y), and the other p columns each correspond to an independent variable (together forming the matrix usually denoted by X).

### Two

If there is a Y variable as a part of your data, separate the data into X and Y as defined above.

### Three

Now take the matrix of independent variables X and for each column in your table, subtract the mean of that column from each entry. This brings the mean of each column to zero.
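A minimal sketch of this centering step, using a small made-up matrix `X` with five observations and two variables (the values are hypothetical):

```python
import numpy as np

# Hypothetical data: 5 observations (rows), 2 independent variables (columns).
X = np.array([
    [2.0, 1.0],
    [3.0, 5.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])

# Subtract each column's mean from every entry in that column.
col_means = X.mean(axis=0)
X_centered = X - col_means

print("column means before:", col_means)
print("column means after: ", X_centered.mean(axis=0))
```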

### Four

Now you have to decide whether the data should be standardized. Among the columns of X, are the features that have a higher variance more important (in terms of the prediction of Y) than the features that have lower variance, or is the importance of the features independent of their variance? If the importance does not depend on the variance of the features, divide each observation in a column by that column’s standard deviation. This standardizes the values in each column of X so that every column has mean zero and standard deviation 1. Call the centered (and possibly standardized) matrix Z.
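The optional standardization can be sketched as follows. The matrix `X` is hypothetical, the `standardize` flag is a name invented for this example, and using the sample standard deviation (`ddof=1`) is an assumption; the population version (`ddof=0`) differs only by a constant factor per column.

```python
import numpy as np

# Hypothetical data: 5 observations, 2 variables.
X = np.array([
    [2.0, 1.0],
    [3.0, 5.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])

standardize = True  # set to False if high-variance features should count for more

# Centering always happens; scaling only when standardization is wanted.
Z = X - X.mean(axis=0)
if standardize:
    Z = Z / X.std(axis=0, ddof=1)  # sample standard deviation (assumption)

print("column means:", Z.mean(axis=0))
print("column std:  ", Z.std(axis=0, ddof=1))
```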

### Five

Transpose the Z matrix and multiply the result by Z. This is denoted as ZᵀZ.
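As a small sketch of this product, assuming Z is the centered (not standardized) version of a hypothetical five-row data matrix. Dividing ZᵀZ by n − 1 gives the sample covariance matrix, which the code checks against `np.cov`.

```python
import numpy as np

# Hypothetical data: 5 observations, 2 variables.
X = np.array([
    [2.0, 1.0],
    [3.0, 5.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])
n = X.shape[0]

Z = X - X.mean(axis=0)   # centered data
ZtZ = Z.T @ Z            # p x p matrix, here 2 x 2

# ZtZ / (n - 1) is the sample covariance matrix of the columns of X.
cov = np.cov(X, rowvar=False)

print(ZtZ)
print(np.allclose(ZtZ / (n - 1), cov))
```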

### Six

We now need to calculate the eigenvectors and their corresponding eigenvalues of ZᵀZ. Most computing packages allow you to do this easily. The eigendecomposition of ZᵀZ decomposes ZᵀZ into PDP⁻¹. Here, P is the matrix of eigenvectors and D is a diagonal matrix with the eigenvalues on the diagonal and zeros everywhere else. Each eigenvalue on the diagonal of D corresponds to a column of P: the first element of D is λ₁ and the corresponding eigenvector is the first column of P, and the same holds for every other element of D and its eigenvector in P. Multiplying the three matrices back together, PDP⁻¹, recovers ZᵀZ.
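One way to sketch the eigendecomposition is with NumPy's `np.linalg.eigh`, which is intended for symmetric matrices such as ZᵀZ. The 2 × 2 matrix below is a hypothetical example, not data from the article.

```python
import numpy as np

# Hypothetical symmetric matrix standing in for ZtZ.
ZtZ = np.array([
    [10.0, 9.0],
    [9.0, 16.0],
])

# eigh returns the eigenvalues (ascending) and the eigenvectors as columns of P.
eigvals, P = np.linalg.eigh(ZtZ)
D = np.diag(eigvals)

# Multiplying the factors back together recovers ZtZ.
reconstructed = P @ D @ np.linalg.inv(P)

print("eigenvalues:", eigvals)
print("reconstruction matches:", np.allclose(reconstructed, ZtZ))
```

Because ZᵀZ is symmetric, the eigenvectors returned here are orthonormal, so P⁻¹ equals Pᵀ and the explicit inverse could be replaced by a transpose.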

### Seven

Now sort the eigenvalues (λ₁, λ₂, …, λp) in decreasing order, from largest to smallest, and reorder the corresponding columns of P in the same way. For example, if λ₃ is the largest eigenvalue, column 3 of P should be placed first. Some computing packages perform this automatically. Call the sorted matrix of eigenvectors P*. P* has the same columns as P, but in a different order.
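The sorting step might look like the sketch below. Note that `np.linalg.eigh` returns eigenvalues in ascending order, so with that routine the reordering is needed. The input matrix is again a hypothetical stand-in for ZᵀZ.

```python
import numpy as np

# Hypothetical symmetric matrix standing in for ZtZ.
ZtZ = np.array([
    [10.0, 9.0],
    [9.0, 16.0],
])
eigvals, P = np.linalg.eigh(ZtZ)  # ascending eigenvalues

# Indices that sort the eigenvalues from largest to smallest.
order = np.argsort(eigvals)[::-1]

eigvals_sorted = eigvals[order]
P_star = P[:, order]  # same columns as P, reordered to match

print("sorted eigenvalues:", eigvals_sorted)
```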

### Eight

Now calculate Z*=ZP*. The new resulting matrix, Z*, is a centered/standardized version of X with every observation as a combination of the original variables, where the weights are determined by the eigenvector.

This is the result where 5 data points are transformed using PCA. The right graph is Z*, the transformed data.

2 important aspects of these graphs are:

**First** – The data is the same in both graphs; the only difference is that the right graph shows the original data after the principal component analysis transformation.

**Second** – The principal components are perpendicular to one another in both graphs.
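Step eight, the projection Z* = ZP*, can be sketched end to end on a hypothetical five-point data set (the values are made up and are not the points from the graphs). The check at the end shows that the columns of Z* are uncorrelated: Z*ᵀZ* is diagonal, with the sorted eigenvalues on the diagonal.

```python
import numpy as np

# Hypothetical data: 5 observations, 2 variables.
X = np.array([
    [2.0, 1.0],
    [3.0, 5.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])

Z = X - X.mean(axis=0)                 # centered data
eigvals, P = np.linalg.eigh(Z.T @ Z)   # eigendecomposition of ZtZ
order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
P_star = P[:, order]

# Each row of Z_star is an observation expressed in principal component axes.
Z_star = Z @ P_star

print(Z_star)
print("off-diagonal of Z*^T Z*:", (Z_star.T @ Z_star)[0, 1])
```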

### Nine

Lastly, we determine which features are important and we want to keep them. You can achieve this in three ways:

(i) Arbitrarily select the number of dimensions you want to keep.

(ii) Pick a threshold for the proportion of variance explained, and keep adding features (principal components) until you reach that threshold.

(iii) Similar to point (ii), calculate the proportion of variance explained for each feature, then sort the features by the proportion of variance explained and plot the cumulative proportion of variance explained.
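The threshold approach in (ii) and the cumulative proportion in (iii) can be sketched as follows. The eigenvalues and the 85% threshold are hypothetical values chosen for illustration; the proportion of variance explained by a component is its eigenvalue divided by the sum of all eigenvalues.

```python
import numpy as np

# Hypothetical eigenvalues, already sorted from largest to smallest.
eigvals_sorted = np.array([4.0, 2.5, 1.5, 1.0, 0.6, 0.4])

# Proportion of variance explained by each component, and its running total.
explained = eigvals_sorted / eigvals_sorted.sum()
cumulative = np.cumsum(explained)

# Keep the smallest number of components whose cumulative share
# reaches the chosen threshold.
threshold = 0.85
n_components = int(np.argmax(cumulative >= threshold)) + 1

print("explained: ", np.round(explained, 2))
print("cumulative:", np.round(cumulative, 2))
print("components kept:", n_components)
```

Plotting `cumulative` against the component index gives the cumulative variance curve described in (iii).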

## Why Principal Component Analysis Works

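Considering our example above, the matrix ZᵀZ contains an estimate of how each variable in Z relates to every other variable in Z. Up to a constant factor, it is the covariance matrix of Z (or the correlation matrix, if Z was standardized), so it summarizes how the variables vary together.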

Eigenvectors denote directions in the scatterplot of data, and eigenvalues denote magnitude. Both of these are important and are derived during PCA.

We move ahead with the assumption that variation in a direction correlates with the behavior of the dependent variable. A lot of variation typically indicates signal, and little variation typically indicates noise. Thus, under this assumption, more variation in a direction is a sign of something we should detect.

To put this simply:

(i) Principal component analysis measures how the variables relate to one another through the covariance matrix ZᵀZ.

(ii) It gives us the direction in which our data is dispersed through the eigenvectors.

(iii) We understand the relative importance of these different directions through the eigenvalues.
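The link between the eigenvalues and the variation in each direction can be written out explicitly. The derivation below is a sketch under two assumptions that go beyond the text: Z is centered with n rows, and variance is normalized by n − 1. Here S denotes the sample covariance matrix and vᵢ a unit-length eigenvector of ZᵀZ with eigenvalue λᵢ.

$$
S = \frac{1}{n-1} Z^{\mathsf{T}} Z,
\qquad
Z^{\mathsf{T}} Z \, v_i = \lambda_i v_i,
\qquad
\operatorname{Var}(Z v_i) = \frac{1}{n-1} \, v_i^{\mathsf{T}} Z^{\mathsf{T}} Z \, v_i = \frac{\lambda_i}{n-1}
$$

So the variance of the data projected onto an eigenvector is proportional to that eigenvector's eigenvalue, which is why sorting by eigenvalue ranks the directions by how much variation they carry.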

## Use Case Scenario of PCA

Consider the following example, which has not 2 or 3 variables but 17 dimensions. The image below depicts the average consumption of 17 different types of food per person, in grams, for each country in the UK.

The problem we face is, there are too many variables and random correlations between them that we cannot accurately define. Applying PCA, we get the following first principal component:

Northern Ireland is clearly the most different among the four. If you go back to the table, you will see a pattern that explains this: the Northern Irish, compared to the other three, consume smaller amounts of fresh fruit, cheese, fish and alcoholic drinks, and eat more grams of fresh potatoes.

Here is the PCA graph:

This example shows how a complex table of data can be simplified to extract useful information, using PCA.

## Libraries to Execute Principal Component Analysis

### 1. Weka

Machine learning library for Java that contains modules to compute principal components.

### 2. Matplotlib

Python plotting library. Older releases provided a PCA class in the .mlab module, which has since been removed from the library.

### 3. ALGLIB

Numerical library for C++ and C# that includes principal component analysis.

### 4. NMath

A proprietary numerical library that provides principal component analysis modules for .NET Framework.

### 5. Princomp

Function for principal component analysis in R, provided by the built-in stats package.

## Conclusion

Principal component analysis is useful for dimensionality reduction when there is a large number of variables. It is used in facial recognition, computer vision, and image compression. It also finds application in pattern-finding in high-dimensional data, in fields like finance, psychology, data mining, bioinformatics, etc.

With the availability of predefined libraries that execute principal component analysis, it is quite easy to incorporate PCA into a program. You can either use the libraries or hard-code the algorithm (as the mathematical procedure remains the same).

By retaining trends and patterns while simultaneously simplifying the complexity of high-dimensional data, principal component analysis allows the easy analysis of complex data. In today’s world, the volume of data, and hence its complexity, is inherently high.

If you consider a field like Biology and a subset like a gene study, the number of variables within just that one subset of study is massive. This not only poses a challenge in terms of creating a program, but can also be computationally heavy, and slow.

The principal component analysis is a solution that simplifies the building of data algorithms, and hence improves the speed of the outcome.
