Linear Discriminant Analysis (LDA) is an extremely popular dimensionality reduction technique. Dimensionality reduction has become critical in machine learning as high-dimensional datasets have become increasingly common.

Linear Discriminant Analysis was developed as early as 1936 by Ronald A. Fisher. The original linear discriminant applied only to two-class problems. It was only in 1948 that C. R. Rao generalized it to multi-class problems.

**What Is Linear Discriminant Analysis (LDA)?**

LDA is used as a dimensionality reduction technique and is commonly applied as a pre-processing step in machine learning and pattern classification projects.

It projects a high-dimensional dataset onto a lower-dimensional space. The goal is to do this while maintaining good separation between classes and reducing computational resources and costs.

The original technique was known as the Linear Discriminant or Fisher's Discriminant Analysis, and it handled only two classes. The multi-class version, as generalized by C. R. Rao, was called Multiple Discriminant Analysis.

**What Is Dimensionality Reduction?**

To understand LDA better, let's begin by understanding what dimensionality reduction is.

Multi-dimensional data is data that has multiple features which have a correlation with one another. Dimensionality reduction simply means plotting multi-dimensional data in just 2 or 3 dimensions.

An alternative to dimensionality reduction is plotting the data using scatter plots, boxplots, histograms, and so on. We can then use these graphs to identify the pattern in the raw data.

However, with charts, it is difficult for a layperson to make sense of the data that has been presented. Moreover, if there are many features in the data, thousands of charts will need to be analyzed to identify patterns.

Dimensionality reduction algorithms solve this problem by plotting the data in 2 or 3 dimensions. This allows us to present the data explicitly, in a way that can be understood by a layperson.

**Linear Discriminant Analysis For Dummies**

LDA works in a simple step-by-step fashion. These are the three key steps.

(i) Calculate the separability between different classes. This is also known as the between-class variance and is defined as the distance between the means of the different classes.

(ii) Calculate the within-class variance. This is the distance between the mean and the samples of each class.

(iii) Construct the lower-dimensional space that maximizes the between-class variance (Step 1) and minimizes the within-class variance (Step 2). If P is the projection onto the lower-dimensional space, the objective is to find the P that maximizes the ratio of between-class variance to within-class variance after projection. This ratio is known as Fisher's criterion.
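The three steps above can be sketched in NumPy. This is a minimal illustration under the usual scatter-matrix formulation, not the scikit-learn implementation; the function name `fisher_lda` and the use of a pseudo-inverse are my own choices.

```python
import numpy as np

def fisher_lda(X, y, n_components=2):
    """Project X onto the directions that maximize between-class
    over within-class scatter (Fisher's criterion)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]

    S_W = np.zeros((n_features, n_features))  # within-class scatter
    S_B = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += len(X_c) * (diff @ diff.T)

    # Solve the generalized eigenvalue problem S_W^-1 S_B w = lambda w
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real  # the projection matrix P
    return X @ W
```

The eigenvectors with the largest eigenvalues span the subspace in which the class means are far apart relative to the spread within each class.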

**Representation Of Linear Discriminant Models**

The representation of Linear Discriminant models consists of the statistical properties of the dataset. These are calculated separately for each class. For instance, for a single input variable, it is the mean and variance of the variable for every class.

If there are multiple variables, the same statistical properties are calculated over the multivariate Gaussian. This includes the means and the covariance matrix. All these properties are directly estimated from the data. They directly go into the Linear Discriminant Analysis equation.

The statistical properties are estimated on the basis of certain assumptions. These assumptions help simplify the process of estimation. One such assumption is that each data point has the same variance.

Another assumption is that the data is Gaussian. This means that each variable, when plotted, is shaped like a bell curve. Using these assumptions, the mean and variance of each variable are estimated.
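As a small illustration of these per-class statistics (with made-up numbers), here is how the per-class means and a shared, pooled variance can be computed for a single input variable; pooling the variance reflects the equal-variance assumption:

```python
import numpy as np

# Toy single-variable data for two classes (values are illustrative only)
x = np.array([1.0, 1.2, 0.8, 4.0, 4.3, 3.7])
y = np.array([0, 0, 0, 1, 1, 1])

# Mean of the variable for each class
means = {c: x[y == c].mean() for c in np.unique(y)}

# Equal-variance assumption: pool squared deviations across classes
n, k = len(x), len(np.unique(y))
pooled_var = sum(((x[y == c] - means[c]) ** 2).sum() for c in means) / (n - k)

print(means)       # per-class means
print(pooled_var)  # shared variance estimate
```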

**How To Make Predictions**

Linear Discriminant Analysis estimates the probability that a new set of inputs belongs to each class. The output class is the one with the highest probability. That is how LDA makes its predictions.

LDA uses Bayes’ Theorem to estimate the probabilities. If the output class is (k) and the input is (x), here is how Bayes’ theorem works to estimate the probability that the data belongs to each class.

P(Y=k|X=x) = (πk * fk(x)) / Σl (πl * fl(x))

In the above equation:

πk – the prior probability. This is the base probability of class k as observed in the training data

fk(x) – the estimated probability that x belongs to class k. fk(x) uses a Gaussian distribution function.
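Putting the pieces together, here is a minimal sketch of the prediction rule for a single input variable, assuming Gaussian class densities with one shared variance (the LDA assumption); the function name and the numbers are illustrative only:

```python
import numpy as np

def lda_posteriors(x, means, priors, var):
    """P(Y=k | X=x) via Bayes' theorem, with Gaussian class
    densities that share a single variance."""
    dens = np.array([np.exp(-(x - m) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
                     for m in means])
    unnorm = priors * dens          # prior times class density
    return unnorm / unnorm.sum()    # normalize to get posteriors

# Hypothetical two-class setup: means 1.0 and 4.0, equal priors, variance 0.5
post = lda_posteriors(2.0, means=np.array([1.0, 4.0]),
                      priors=np.array([0.5, 0.5]), var=0.5)
print(post.argmax())  # the predicted class: the highest posterior
```

Here x = 2.0 lies closer to the first class mean, so the first class gets the higher posterior and is the prediction.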

**LDA vs Other Dimensionality Reduction Techniques**

Two dimensionality-reduction techniques that are commonly used for the same purposes as Linear Discriminant Analysis are Logistic Regression and PCA (Principal Components Analysis). However, LDA has certain features that make it the technique of choice in many cases. Here is how it compares against the other techniques.

**1. Linear Discriminant Analysis vs PCA**

(i) PCA is an unsupervised algorithm. It ignores class labels altogether and aims to find the principal components that maximize variance in a given set of data. Linear Discriminant Analysis, on the other hand, is a supervised algorithm that finds the linear discriminants that will represent those axes which maximize separation between different classes.

(ii) Linear Discriminant Analysis often outperforms PCA in multi-class classification tasks when the class labels are known. In some cases, however, PCA performs better; this is usually when the sample size for each class is relatively small. A well-known example is the comparison of classification accuracies in image recognition.

(iii) Often, the two techniques are used together for dimensionality reduction: PCA is applied first, followed by LDA.
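This combination can be sketched with scikit-learn's Pipeline; the component counts below are arbitrary choices for the Iris data, used only for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# PCA first reduces dimensions without using labels;
# LDA then maximizes class separation in the reduced space
pipe = make_pipeline(PCA(n_components=3),
                     LinearDiscriminantAnalysis(n_components=2))
X_reduced = pipe.fit_transform(X, y)
print(X_reduced.shape)  # (150, 2)
```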

**Linear Discriminant Analysis vs Logistic Regression**

**(i) Two-Class vs Multi-Class Problems**

Logistic regression is both simple and powerful. However, it is traditionally used only in binary classification problems. While it can be extrapolated and used in multi-class classification problems, this is rarely done. When it’s a question of multi-class classification problems, linear discriminant analysis is usually the go-to choice. In fact, even with binary classification problems, both logistic regression and linear discriminant analysis are applied at times.

**(ii) Instability With Well-Separated Classes**

Logistic regression can become unstable when the classes are well-separated. This is where Linear Discriminant Analysis comes in.

**(iii) Instability With Few Examples**

If there are only a few examples from which the parameters need to be estimated, logistic regression tends to become unstable. Here too, Linear Discriminant Analysis is the superior option, as it tends to remain stable even with few examples.

**Linear Discriminant Analysis via Scikit Learn**

Of course, you can implement Linear Discriminant Analysis step by step yourself. However, the more convenient and more commonly used way is the LinearDiscriminantAnalysis class in the scikit-learn machine learning library. Here is an example of the code used to achieve this.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Example data (assumed here for a runnable demo): the Iris dataset,
# relabeled 1-3 to match the plotting loop below
iris = load_iris()
X, y = iris.data, iris.target + 1
label_dict = {i + 1: name for i, name in enumerate(iris.target_names)}

# LDA
sklearn_lda = LDA(n_components=2)
X_lda_sklearn = sklearn_lda.fit_transform(X, y)

def plot_scikit_lda(X, title):
    ax = plt.subplot(111)
    for label, marker, color in zip(
            range(1, 4), ('^', 's', 'o'), ('blue', 'red', 'green')):
        plt.scatter(x=X[:, 0][y == label],
                    y=X[:, 1][y == label] * -1,  # flip the figure
                    marker=marker,
                    color=color,
                    alpha=0.5,
                    label=label_dict[label])

    plt.xlabel('LD1')
    plt.ylabel('LD2')

    leg = plt.legend(loc='upper right', fancybox=True)
    leg.get_frame().set_alpha(0.5)
    plt.title(title)

    # hide axis ticks
    plt.tick_params(axis='both', which='both', bottom=False, top=False,
                    labelbottom=True, left=False, right=False, labelleft=True)

    # remove axis spines
    for spine in ('top', 'right', 'bottom', 'left'):
        ax.spines[spine].set_visible(False)

    plt.grid()
    plt.tight_layout()
    plt.show()

plot_scikit_lda(X_lda_sklearn, title='Default LDA via scikit-learn')
```

**Extensions & Variations**

Due to its simplicity and ease of use, Linear Discriminant Analysis has seen many extensions and variations, all designed to improve its efficacy. Here are some common extensions.

**(i) Flexible Discriminant Analysis (FDA)**

Regular Linear Discriminant Analysis uses only linear combinations of inputs. The Flexible Discriminant Analysis allows for non-linear combinations of inputs like splines.

**(ii) Quadratic Discriminant Analysis (QDA)**

In Quadratic Discriminant Analysis, each class uses its own estimate of variance when there is a single input variable. In case of multiple input variables, each class uses its own estimate of covariance.
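In scikit-learn, QDA is available alongside LDA; here is a quick side-by-side fit on the Iris data (reporting training accuracy only, for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis().fit(X, y)      # one pooled covariance
qda = QuadraticDiscriminantAnalysis().fit(X, y)   # a covariance per class

print(lda.score(X, y), qda.score(X, y))
```

Because each class gets its own covariance, QDA's decision boundaries are quadratic rather than linear.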

**(iii) Regularized Discriminant Analysis (RDA)**

This method moderates the influence of different variables on the Linear Discriminant Analysis. It does so by regularizing the estimate of variance/covariance.
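Scikit-learn exposes a related form of regularization through the `shrinkage` parameter of `LinearDiscriminantAnalysis` (available with the `lsqr` and `eigen` solvers). This is not RDA exactly, but it illustrates the idea of regularizing the covariance estimate:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Shrinkage pulls the covariance estimate toward a diagonal matrix;
# 'auto' chooses the amount via the Ledoit-Wolf lemma
rda_like = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto').fit(X, y)
print(rda_like.score(X, y))
```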

## Conclusion

LDA has become very popular because it is simple and easy to understand. While other dimensionality reduction techniques such as PCA and logistic regression are also widely used, there are several specific use cases in which LDA is more appropriate. A thorough knowledge of Linear Discriminant Analysis is a must for all data science and machine learning enthusiasts.

If you are also inspired by the opportunities provided by the data science landscape, enroll in our data science master course and elevate your career as a data scientist.