
# Discriminant Analysis: A Complete Guide


Discriminant analysis is a vital statistical tool that is used by researchers worldwide. Machine learning, pattern recognition, and statistics are some of the spheres where this practice is widely employed. So, what is discriminant analysis and what makes it so useful?

Discriminant analysis, just as the name suggests, is a way to discriminate or classify the outcomes.

It takes continuous independent variables and combines them into one or more predictive equations. These equations are then used to assign each observation to a category of the dependent variable.

Three people in three different countries are credited with giving birth to discriminant analysis. These people are Fisher in the UK, Mahalanobis in India, and Hotelling in the US.

If you are classifying the data into two groups, then it is known as Discriminant Function Analysis or DFA. If there are more than two groups, then it is called Multiple Discriminant Analysis (MDA) or Canonical Variates Analysis (CVA).

This technique is utilised when you already know the output categories and want to come up with a method to successfully classify the dataset.

What if you aren't aware of the categories beforehand? In those cases, you would need to perform clustering instead.

An example of discriminant analysis is using the performance indicators of a machine to predict whether it is in a good or a bad condition.

## Benefits of Discriminant Analysis


Discriminant analysis is a valuable tool in statistics. It has gained widespread popularity in areas from marketing to finance. Here are some of the reasons for this.

You can use it to find out which independent variables have the most impact on the dependent variable. It helps you understand how each variable contributes towards the categorisation.

It can help in predicting market trends and the impact of a new product on the market.

Discriminant analysis is similar to logistic regression, but it tends to be more stable, especially when there are more than two classes and the classes are well separated.

It can also give reliable results with small samples, provided its assumptions hold. Logistic regression is generally less dependable in that situation.

Logistic regression tends to outperform linear discriminant analysis when the underlying assumptions, such as normally distributed predictors and equal variances across groups, do not hold.

Even when the equal-variance assumption fails, quadratic discriminant analysis can still give good results.

The discussion so far has been about the case when all the samples are available in advance. But this is not always the case, especially in several recent applications.

Samples may come as a steady stream. In these instances, it becomes computationally inefficient to run the whole algorithm repeatedly.

Incremental LDA addresses this problem. It updates the discriminant features using only the new samples, without reprocessing the earlier ones.
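As a rough illustration of updating from new samples only, the sketch below keeps the per-class means current as samples arrive one at a time. It covers just one ingredient of an incremental LDA (a full version would also update the scatter matrices), and the class name `RunningClassMeans` and the sample stream are made up for this example.

```python
from collections import defaultdict


class RunningClassMeans:
    """Keeps per-class feature means up to date, one sample at a time."""

    def __init__(self, n_features):
        self.counts = defaultdict(int)
        self.means = defaultdict(lambda: [0.0] * n_features)

    def update(self, x, label):
        # Incremental mean: m_new = m_old + (x - m_old) / n
        self.counts[label] += 1
        n = self.counts[label]
        mean = self.means[label]
        for i, value in enumerate(x):
            mean[i] += (value - mean[i]) / n


# Hypothetical stream of (features, label) pairs.
stream = [([1.0, 2.0], "good"), ([3.0, 4.0], "good"), ([10.0, 0.0], "bad")]

tracker = RunningClassMeans(n_features=2)
for features, label in stream:
    tracker.update(features, label)  # earlier samples are never revisited

print(dict(tracker.means))
```

Each update costs time proportional to the number of features, regardless of how many samples have been seen so far.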

## What are the Assumptions of Discriminant Analysis?

As mentioned above, discriminant analysis provides excellent results when its underlying assumptions are satisfied. The easier these assumptions are to meet in practice, the more useful the technique becomes.

Let us find out what these assumptions are and whether they can be satisfied or not:


(i) The independent variables have a normal distribution. Most of the variables that are used in real-life applications either have a normal distribution or lend themselves to normal approximation.

(ii) The variances and covariances of the predictors are assumed to be the same in every category. Even though this assumption is crucial for linear discriminant analysis, quadratic discriminant analysis is more flexible and is well-suited to cases where it fails. You can also check for outliers and transform the variables to stabilise the variance. Logarithmic transformations can be helpful here.

(iii) The predictor variables are assumed not to be highly correlated with one another. Strong correlation between them can reduce the power of the analysis. This problem, however, usually has a simple solution. You can remove or replace the offending variables.

(iv) In addition to independence between the variables, the samples themselves are considered to be independent. When you sample a large population, this is a fair assumption.
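Some of these assumptions can be screened with a few lines of code. The sketch below uses NumPy and a small made-up dataset to compare within-group variances before and after a logarithmic transformation, and to check the correlation between two predictors. It is only a quick screen under these assumptions, not a formal test.

```python
import numpy as np

# Made-up data: rows are samples, columns are two predictors.
X = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 15.0],
              [6.0, 100.0], [7.0, 150.0], [8.0, 220.0]])
y = np.array([0, 0, 0, 1, 1, 1])


def group_variances(X, y):
    """Sample variance of each predictor within each group."""
    return {int(g): X[y == g].var(axis=0, ddof=1) for g in np.unique(y)}


def variance_ratio(variances, column):
    """Largest group variance divided by the smallest, for one predictor."""
    values = [v[column] for v in variances.values()]
    return max(values) / min(values)


# Assumption (ii): the second predictor has very unequal spread across groups.
ratio_raw = variance_ratio(group_variances(X, y), column=1)

# A log transform of that predictor brings the group variances closer together.
X_log = X.copy()
X_log[:, 1] = np.log(X[:, 1])
ratio_log = variance_ratio(group_variances(X_log, y), column=1)

# Assumption (iii): overall correlation between the two predictors.
corr = np.corrcoef(X, rowvar=False)[0, 1]

print(f"variance ratio raw: {ratio_raw:.1f}, after log: {ratio_log:.1f}")
print(f"correlation between predictors: {corr:.2f}")
```

A variance ratio close to 1 suggests the equal-variance assumption is reasonable for that predictor, while a correlation close to 1 or -1 suggests that one of the two predictors is redundant.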

## How to Perform Discriminant Analysis?

Every discriminant analysis example consists of the following five steps:

### 1. Formulate the Problem

You start by answering the question, “What is the objective of discriminant analysis?” After that, identify the independent variables and the categories of outcome that aid this objective.

You can select the independent or predictor variables based on the information available from previous research in the area. You should also use your knowledge of the problem here.

You also need to divide your sample into two groups: analysis and validation. The analysis sample will be used for estimating the discriminant function, whereas the validation sample will be used for checking the results. The roles of the two samples can then be swapped for cross-validation.

Whichever way you split the data, ensure that both the analysis and validation samples are representative of the population.
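One way to keep both samples representative is to split each outcome category separately, so that the group proportions stay similar. The following is a minimal sketch using only the standard library. The function name `stratified_split`, the 30% validation fraction, and the labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, validation_fraction=0.3, seed=0):
    """Return (analysis_idx, validation_idx) with similar group proportions."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, label in enumerate(labels):
        by_group[label].append(idx)

    analysis, validation = [], []
    for indices in by_group.values():
        rng.shuffle(indices)  # random assignment within each group
        n_val = round(len(indices) * validation_fraction)
        validation.extend(indices[:n_val])
        analysis.extend(indices[n_val:])
    return sorted(analysis), sorted(validation)


# Hypothetical outcome labels for 20 machines.
labels = ["good"] * 10 + ["bad"] * 10
analysis_idx, validation_idx = stratified_split(labels)
print(len(analysis_idx), len(validation_idx))
```

Swapping the two index lists and repeating the analysis gives the exchanged-sample check described above.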

### 2. Find the Discriminant Function

The discriminant function is written as:

D = b0 + b1X1 + b2X2 + … + bkXk

Here, ‘D’ is the discriminant score, ‘b’ represents the coefficients or weights for the predictor variables ‘X’.

You already know ‘X’. You need to estimate the values of ‘b’.

There are two ways to do this – direct and stepwise. In the direct method, you include all the variables and estimate the coefficients for all of them. In the other method, the variables are included one by one, based on their ability to discriminate.

The number of discriminant functions depends on the number of groups and predictor variables. If there are Ng groups and k predictors, you can derive at most min(Ng − 1, k) discriminant functions.
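For two groups, the direct method can be sketched with NumPy as follows. The data are made up, and the weights are estimated with Fisher's classical rule (the inverse pooled covariance times the difference in group means), which is one common choice rather than the only one. The intercept is set so that D = 0 falls midway between the group means, which assumes equal group sizes and equal misclassification costs.

```python
import numpy as np

# Made-up analysis sample: two predictors (X1, X2) and two groups (0, 1).
X = np.array([[2.0, 3.0], [3.0, 3.5], [2.5, 4.0], [3.5, 4.5],
              [6.0, 6.5], [7.0, 6.0], [6.5, 7.5], [7.5, 7.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])


def fit_two_group_discriminant(X, y):
    """Direct method: estimate b0 and b1..bk using all predictors at once."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance (relies on the equal-covariance assumption).
    S0 = np.cov(X0, rowvar=False)
    S1 = np.cov(X1, rowvar=False)
    pooled = ((len(X0) - 1) * S0 + (len(X1) - 1) * S1) / (len(X) - 2)
    b = np.linalg.solve(pooled, m1 - m0)
    # Intercept chosen so that D = 0 midway between the two group means.
    b0 = -b @ (m0 + m1) / 2
    return b0, b


b0, b = fit_two_group_discriminant(X, y)
scores = b0 + X @ b  # D = b0 + b1*X1 + b2*X2 for every sample

# Number of discriminant functions available: min(Ng - 1, k).
n_functions = min(len(np.unique(y)) - 1, X.shape[1])

print("b0 =", b0, "b =", b)
print("scores =", scores)
```

With this toy sample, negative scores fall in group 0 and positive scores in group 1. Different software packages scale the coefficients differently, so only the relative sizes and signs are comparable across tools.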

### 3. Determine the Significance of the Discriminant Function

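The function derived above should be statistically significant. One way to judge this is the eigenvalue of the function: a larger eigenvalue implies better discrimination. Wilks' lambda is another commonly used statistic, where values closer to zero indicate better separation between the groups.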
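The eigenvalue mentioned here can be computed from the within-group and between-group scatter matrices. The sketch below does this with NumPy on a made-up dataset and also derives Wilks' lambda, which is the product of 1 / (1 + eigenvalue) over all functions. Converting these statistics into a p-value is left out.

```python
import numpy as np

# Made-up sample: two predictors and two groups.
X = np.array([[2.0, 3.0], [3.0, 3.5], [2.5, 4.0], [3.5, 4.5],
              [6.0, 6.5], [7.0, 6.0], [6.5, 7.5], [7.5, 7.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

grand_mean = X.mean(axis=0)
k = X.shape[1]
W = np.zeros((k, k))  # within-group scatter matrix
B = np.zeros((k, k))  # between-group scatter matrix
for g in np.unique(y):
    Xg = X[y == g]
    mg = Xg.mean(axis=0)
    W += (Xg - mg).T @ (Xg - mg)
    d = (mg - grand_mean).reshape(-1, 1)
    B += len(Xg) * (d @ d.T)

# Eigenvalues of W^-1 B, largest first; each is the ratio of between-group
# to within-group variation along one discriminant function.
eigenvalues = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)[::-1]

wilks_lambda = np.prod(1.0 / (1.0 + eigenvalues))
canonical_corr = np.sqrt(eigenvalues[0] / (1.0 + eigenvalues[0]))

print("largest eigenvalue:", eigenvalues[0])
print("Wilks' lambda:", wilks_lambda)
print("canonical correlation:", canonical_corr)
```

With two groups only one eigenvalue is non-zero, which matches the single discriminant function available in that case.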

### 4. Interpret the Results

You can analyse the influence of each predictor from its coefficients.

A predictor with high absolute standardised coefficient value plays a more influential role in the discriminating ability of the function. You can also study the canonical loadings.
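A common convention, assumed here, is to standardise each coefficient by multiplying the raw weight by the pooled within-group standard deviation of its predictor. The sketch below applies that convention to a made-up two-group sample and ranks the predictors by absolute standardised coefficient.

```python
import numpy as np

# Made-up sample: two predictors and two groups.
X = np.array([[2.0, 3.0], [3.0, 3.5], [2.5, 4.0], [3.5, 4.5],
              [6.0, 6.5], [7.0, 6.0], [6.5, 7.5], [7.5, 7.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

X0, X1 = X[y == 0], X[y == 1]
pooled = ((len(X0) - 1) * np.cov(X0, rowvar=False)
          + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)

# Raw (unstandardised) weights from Fisher's two-group rule.
b = np.linalg.solve(pooled, X1.mean(axis=0) - X0.mean(axis=0))

# Standardised coefficient = raw weight * pooled within-group std deviation.
standardised = b * np.sqrt(np.diag(pooled))

# Predictor indices ordered from most to least influential.
ranking = np.argsort(-np.abs(standardised))

print("standardised coefficients:", standardised)
print("ranking (most influential first):", ranking)
```

Standardising matters most when predictors are measured in different units, because raw weights are then not directly comparable.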

### 5. Assess the Validity

The data gets categorised based on the discriminant score and a decision rule. Once the validation sample has been classified, calculate the percentage of correct classifications. This cross-validates the results.
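The decision rule and the percentage of correct classifications can be sketched in plain Python. The coefficient values, the cutoff of zero, and the validation sample below are all hypothetical and chosen only for illustration.

```python
def classify(x, b0, b, cutoff=0.0):
    """Decision rule: assign group 1 if the discriminant score exceeds the cutoff."""
    score = b0 + sum(weight * value for weight, value in zip(b, x))
    return 1 if score > cutoff else 0


# Hypothetical coefficients, as if estimated from an analysis sample.
b0, b = -59.0, [8.0, 4.0]

# Hypothetical validation sample: (predictor values, actual group).
validation = [([2.0, 4.0], 0), ([3.0, 3.0], 0), ([5.5, 5.0], 0),
              ([6.0, 7.0], 1), ([7.0, 7.5], 1)]

predicted = [classify(x, b0, b) for x, _ in validation]
hits = sum(p == actual for p, (_, actual) in zip(predicted, validation))
hit_ratio = 100.0 * hits / len(validation)

print(f"correctly classified: {hit_ratio:.0f}%")
```

The hit ratio is best compared with what chance alone would give, which is about 50% for two groups of equal size.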

## Applications of Discriminant Analysis

Discriminant analysis is applied in a wide variety of fields. Here are a few examples to give you an insight into its usefulness.

### 1. Severity of Diseases

Doctors collect data about various health indicators of the patients. This data can be used to classify the severity of the disease. The results from the multiple laboratory and clinical tests will be the predictor variables.

A similar approach can also be used to classify the type of illness that the patient suffers from. This can make the diagnosis faster and less prone to error.

### 2. Biological Object Classification

One of the most well-known examples of multiple discriminant analysis is in classifying irises based on their petal length, sepal length, and other factors. Discriminant analysis has been used successfully by ecologists to classify species, taxonomic groups, etc.

### 3. Bankruptcy Prediction

The information about a firm’s financial health can be used to predict whether it will go bankrupt or if it will thrive. This technique is commonly employed by banks to make decisions about loans for corporations.

Since loans given to corporations are often for large amounts, discriminant analysis can help the bank make better-informed decisions.

The firms can then themselves use this technique to predict if their current business strategy will lead them into bankruptcy.

### 4. Loan Application Checking

Banks use a similar approach for individuals as well. The financial history and current situation of a loan applicant are used to determine whether the loan should be approved or not.

It helps the bank weed out those applicants who have a poor credit history and can become a source of bad credit.

### 5. Pattern Recognition

The different aspects of an image can be used to classify the objects in it. Discriminant analysis has also found a place in face recognition algorithms. The pixel values in the image are combined to reduce the number of features needed for representing the face.


### 6. Marketing Products

Data about customers can be used to identify the type of customer who would purchase a product. This can aid the marketing agency in creating targeted advertisements for the product.

## Future of Discriminant Analysis

Data classification and prediction continue to be relevant fields. As seen in the previous section, the range of applications is immense.

With developments and improvements in the techniques in discriminant analysis, it has been adapted into a form that can provide solutions to modern-day problems.

Incremental LDA is one example of how discriminant analysis has been adapted to current challenges, such as data that arrives as a stream.


One of the discriminant analysis examples was about its use in marketing.

As mentioned above, you need a thorough understanding of the field to choose the correct predictor variables. An understanding of digital marketing techniques, coupled with the knowledge of discriminant analysis will make you a coveted employee for any company.
