## What is LDA?

Linear discriminant analysis (LDA) reduces dimensions, just like PCA.

Example

Assume we use gene expression to decide whether a drug works on different people.

We can use one predictor (gene) or two predictors (genes).

With three or more predictors, the chart needs three or more dimensions.

## How does it reduce dimensions?

We can't simply drop one of the axes (e.g. gene y), because that would lose information. Instead, LDA creates a new axis and projects the data onto it. LDA chooses the new axis based on two criteria:

1. Maximize the distance between the two category means, $\mu_1$ and $\mu_2$
2. Minimize the variation ("scatter", $s^2$) within each category
3. Consider the two above simultaneously by maximizing the ratio $\frac{(\mu_1 - \mu_2)^2}{s_1^2 + s_2^2}$

where $(\mu_1 - \mu_2)^2$ is also called $d^2$ (d for distance).
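To make the ratio $(\mu_1 - \mu_2)^2 / (s_1^2 + s_2^2)$ concrete, the sketch below scores candidate axes in a two-gene plot: it projects both categories onto each axis, computes the ratio, and keeps the best angle by brute force. The data points, the name `fisher_score`, and the choice of the sum of squared deviations as "scatter" are assumptions for illustration; real LDA implementations solve for the axis directly rather than scanning angles.

```python
import math


def fisher_score(group_a, group_b, angle):
    """Project 2-D points onto the axis at `angle` (radians) and
    return d^2 / (s_a^2 + s_b^2) for the projected values."""
    ux, uy = math.cos(angle), math.sin(angle)

    def project(points):
        return [x * ux + y * uy for x, y in points]

    def mean(values):
        return sum(values) / len(values)

    def scatter(values):
        # Assumption: scatter is the sum of squared deviations from the mean.
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    proj_a, proj_b = project(group_a), project(group_b)
    d_squared = (mean(proj_a) - mean(proj_b)) ** 2
    return d_squared / (scatter(proj_a) + scatter(proj_b))


# Made-up (gene x, gene y) measurements for two categories.
responders = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (2.0, 2.5)]
non_responders = [(3.0, 1.0), (4.0, 2.0), (5.0, 3.0), (4.0, 1.5)]

# Brute-force search: try every whole-degree axis from 0 to 179.
candidate_angles = [math.radians(deg) for deg in range(180)]
best_angle = max(
    candidate_angles,
    key=lambda a: fisher_score(responders, non_responders, a),
)

score_gene_x_only = fisher_score(responders, non_responders, 0.0)
score_gene_y_only = fisher_score(responders, non_responders, math.pi / 2)
best_score = fisher_score(responders, non_responders, best_angle)
print(score_gene_x_only, score_gene_y_only, best_score)
```

On this toy data the best axis scores higher than either original gene axis, which illustrates why projecting onto a new axis beats simply dropping gene x or gene y.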

### Why we need both criteria

If we only maximize the distance between the means, many points from the two categories can still overlap on the new axis, which makes them hard to separate.

### Three predictors (genes)

The process of creating the new axis is the same, but the chart will be three-dimensional.

### Three categories

The process follows the same rules, but the implementation is a bit different.

1. Maximize the distance between categories:
   1. Find the central point of all the data
   2. Find the central point of each category
   3. Get $d_1$, $d_2$, $d_3$, the distances between each category's central point and the main central point
   4. Maximize $d^2$. We now have three distances, so we add them up and maximize $d_1^2 + d_2^2 + d_3^2$
2. Minimize the scatter for each category ($s_1^2$, $s_2^2$, $s_3^2$)
3. Consider the two above simultaneously by maximizing $\frac{d_1^2 + d_2^2 + d_3^2}{s_1^2 + s_2^2 + s_3^2}$
4. Instead of creating one axis, LDA now creates two axes to separate the data. This is because the three category central points define a plane, not a line.
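The three-category steps can be sketched as a small scoring function. It takes values that have already been projected onto one candidate axis and returns the summed squared distances to the main central point divided by the summed scatter. The name `multiclass_score` and the sample numbers are hypothetical, and the sum is unweighted as in the steps above (an assumption; implementations may weight each category by its size).

```python
def multiclass_score(projected_groups):
    """Return (d_1^2 + ... + d_k^2) / (s_1^2 + ... + s_k^2).

    `projected_groups` holds one list of 1-D projected values per category.
    """
    # Step 1: central point of all the data.
    all_values = [v for group in projected_groups for v in group]
    overall_center = sum(all_values) / len(all_values)

    sum_d_squared = 0.0
    sum_scatter = 0.0
    for group in projected_groups:
        # Step 2: central point of this category.
        center = sum(group) / len(group)
        # Steps 3-4: squared distance to the main central point, added up.
        sum_d_squared += (center - overall_center) ** 2
        # Scatter: sum of squared deviations within the category.
        sum_scatter += sum((v - center) ** 2 for v in group)

    return sum_d_squared / sum_scatter


# Made-up projections onto two candidate axes.
well_separated = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [21.0, 22.0, 23.0]]
overlapping = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]

separated_score = multiclass_score(well_separated)
overlapping_score = multiclass_score(overlapping)
print(separated_score, overlapping_score)
```

Both candidate axes have the same scatter within each category, so the difference in score comes entirely from how far the category central points sit from the main central point.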

An example application

Say we have 3 categories and 10,000 genes. If we plot the raw data without reducing dimensions, it would require 10,000 axes. LDA, however, can reduce this to only 2 axes.

## LDA vs PCA

• PCA doesn't focus on separating categories. It is mainly used to find the directions of maximum variation
• Both reduce dimensions by creating new axes
• PCA creates its 1st axis along the direction of most variation in the data, its 2nd axis along the next most, and so on
• LDA creates its 1st axis to capture the most variation between categories, its 2nd axis the next most, and so on

• We can also dig in to see which predictors (genes) are most impactful
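The contrast between the two methods can be seen on a toy two-gene dataset in which the data varies most along gene x but the categories differ along gene y. This is a minimal sketch, not a full implementation: the data and function names are made up, the PCA axis uses the closed-form principal direction of a 2x2 scatter matrix, and the LDA axis uses the standard two-category closed form (within-category scatter matrix inverse times the difference of means), which the notes above do not derive.

```python
import math


def mean_point(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def scatter_matrix(points, center):
    """Return (sxx, sxy, syy) of the 2x2 scatter matrix around `center`."""
    cx, cy = center
    sxx = sum((x - cx) ** 2 for x, _ in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    return sxx, sxy, syy


def pca_first_axis(points):
    """Unit vector along the direction of most variation (ignores categories)."""
    sxx, sxy, syy = scatter_matrix(points, mean_point(points))
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def lda_axis(group_a, group_b):
    """Unit vector proportional to Sw^-1 (mean_a - mean_b) for two categories."""
    mean_a, mean_b = mean_point(group_a), mean_point(group_b)
    axx, axy, ayy = scatter_matrix(group_a, mean_a)
    bxx, bxy, byy = scatter_matrix(group_b, mean_b)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy  # within-category scatter
    det = sxx * syy - sxy * sxy
    dx, dy = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
    # Inverse of a 2x2 matrix written out by hand.
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    length = math.hypot(wx, wy)
    return wx / length, wy / length


# Made-up data: wide spread along gene x, categories split along gene y.
category_a = [(0.0, 1.0), (2.0, 1.2), (4.0, 0.8), (6.0, 1.0)]
category_b = [(0.0, -1.0), (2.0, -0.8), (4.0, -1.2), (6.0, -1.0)]

pca_axis = pca_first_axis(category_a + category_b)
lda_direction = lda_axis(category_a, category_b)
print("PCA axis:", pca_axis)
print("LDA axis:", lda_direction)
```

Here the PCA axis points almost entirely along gene x, while the LDA axis points almost entirely along gene y. The components of the LDA axis also hint at which predictor matters most for separating the categories.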

## Summary

LDA is a method to reduce dimensions. It’s useful for separating categories.