Link: Bayes classifier
What is LDA?
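LDA (Linear Discriminant Analysis) reduces dimensions, just like PCA, but it does so in a way that keeps known categories as separable as possible.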
Example
Assume we use one gene to decide whether a drug works for different people.
With one predictor (gene) the data sit on a number line; with two predictors they sit on a 2D chart.
With three or more predictors, the chart needs three or more dimensions.
How does it reduce dimension?
We can’t simply drop one of the axes (e.g. gene y) because that would throw away the information it carries. Instead, LDA creates a new axis and projects the data onto it. LDA chooses the new axis based on two criteria:
- Maximize the distance between the two category means, $\mu_1$ and $\mu_2$
- Minimize the variation (“scatter”, $s^2$) within each category
- Consider the two above simultaneously: maximize $\frac{(\mu_1 - \mu_2)^2}{s_1^2 + s_2^2}$
where $(\mu_1 - \mu_2)^2$ is also called $d^2$ ($d$ for distance).
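As a rough illustration of the two criteria, the sketch below scores a candidate axis by projecting two categories onto it and dividing the squared distance between the projected means by the summed scatter. The data points, the function name `fisher_score`, and the choice to measure scatter as a sum of squared deviations are all assumptions made for this example.

```python
import math

def fisher_score(class_a, class_b, axis):
    """Score an axis as (mean_a - mean_b)^2 / (scatter_a + scatter_b) after projection."""
    norm = math.hypot(axis[0], axis[1])
    ux, uy = axis[0] / norm, axis[1] / norm

    def project(points):
        # Position of each 2D point along the candidate axis
        return [x * ux + y * uy for x, y in points]

    def mean(values):
        return sum(values) / len(values)

    def scatter(values):
        # Assumed definition: sum of squared deviations from the category mean
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    pa, pb = project(class_a), project(class_b)
    d_squared = (mean(pa) - mean(pb)) ** 2
    return d_squared / (scatter(pa) + scatter(pb))

# Hypothetical (gene x, gene y) measurements for two categories
works = [(1.0, 2.0), (2.0, 3.5), (3.0, 4.0)]
fails = [(2.0, 1.0), (3.0, 1.5), (4.0, 3.0)]

drop_gene_y = fisher_score(works, fails, (1.0, 0.0))   # keep only gene x
new_axis = fisher_score(works, fails, (1.0, -1.0))     # a diagonal candidate axis

print(f"drop gene y: {drop_gene_y:.2f}, new axis: {new_axis:.2f}")
```

On this made-up data the diagonal axis scores far higher than simply keeping gene x, because it pulls the means apart while keeping each category tight.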
Why we need both criteria
If we only maximize $d$, many dots from the two categories can still overlap on the new axis, which makes them hard to separate.
Three predictors (genes)
The process for creating the new axis is the same, but the data start out in 3 dimensions.
Three categories
The process follows the same rules, but the implementation is a bit different.
- Maximize the distance between categories
  - Find the main central point of all the data
  - Find the central point of each category
  - Get $d_1$, $d_2$, $d_3$, the distances between each category's central point and the main central point
  - Maximize $d$. Now that we have three distances, we add them up (squared) and maximize the total: $d_1^2 + d_2^2 + d_3^2$
- Minimize the scatter for each category: $s_1^2 + s_2^2 + s_3^2$
- Consider the two above simultaneously: maximize $\frac{d_1^2 + d_2^2 + d_3^2}{s_1^2 + s_2^2 + s_3^2}$
Instead of creating one axis, LDA now creates two axes to separate the data. This is because the central points of three categories define a plane, not a line.
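The three-category steps (main central point, per-category central points, distances, scatter) can be sketched as a score for one candidate axis. This is only an illustration: the data, the name `multiclass_score`, and the use of summed squared deviations as scatter are assumptions, and distances are measured along the candidate axis after projection.

```python
import math

def multiclass_score(categories, axis):
    """Sum of squared center distances divided by summed scatter, along one axis."""
    norm = math.hypot(axis[0], axis[1])
    ux, uy = axis[0] / norm, axis[1] / norm
    projected = [[x * ux + y * uy for x, y in pts] for pts in categories]

    # Main central point of all the data (on the candidate axis)
    all_values = [v for values in projected for v in values]
    main_center = sum(all_values) / len(all_values)

    d_squared_sum = 0.0
    scatter_sum = 0.0
    for values in projected:
        center = sum(values) / len(values)                      # category central point
        d_squared_sum += (center - main_center) ** 2            # d_i squared
        scatter_sum += sum((v - center) ** 2 for v in values)   # s_i squared
    return d_squared_sum / scatter_sum

# Hypothetical categories whose central points form a triangle
cat1 = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)]
cat2 = [(5.0, 0.0), (6.0, 0.5), (5.5, 1.0)]
cat3 = [(2.5, 5.0), (3.5, 5.5), (3.0, 6.0)]

score_x = multiclass_score([cat1, cat2, cat3], (1.0, 0.0))
score_y = multiclass_score([cat1, cat2, cat3], (0.0, 1.0))
print(f"x axis: {score_x:.2f}, y axis: {score_y:.2f}")
```

In this toy layout neither axis is enough on its own: along x the third category sits between the other two, and along y the first two categories overlap, which is consistent with needing two axes for three categories.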
An example of usage in application
Say we have 3 categories and 10,000 genes. If we plot the raw data without reducing dimensions, it would require 10,000 axes. However, LDA can reduce the axes to only 2.
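A minimal sketch of this scenario, assuming NumPy and scikit-learn are installed. The data are random numbers generated for the example (20 samples per category, with a handful of genes shifted so the categories differ); only the shape of the result is the point here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_per_category, n_genes = 20, 10_000

# Synthetic expression data: 60 samples x 10,000 genes (made up for illustration)
X = rng.normal(size=(3 * n_per_category, n_genes))
y = np.repeat([0, 1, 2], n_per_category)

# Shift a few genes so the categories are actually different
X[y == 1, :5] += 3.0
X[y == 2, 5:10] += 3.0

# With 3 categories, LDA can produce at most 2 discriminant axes
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)

print(Z.shape)  # (60, 2)
```

Each sample is now described by 2 coordinates instead of 10,000, so the three categories can be drawn on an ordinary 2D chart.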
LDA vs PCA
- PCA doesn’t focus on separating categories. It is mainly used to find the directions of maximum variation in the data.
- Both reduce dimensions by creating new axes.
- PCA creates its 1st axis along the direction of most variation in the data, its 2nd axis along the next most, and so on
- LDA creates its 1st axis to account for the most variation between categories, and so on
- We can also dig in to see which predictors (genes) are most impactful
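To make the contrast concrete, here is a small 2D sketch on made-up data where the direction of most variation (gene x) is not the direction that separates the categories (gene y). The helper names are hypothetical. The PCA axis uses the closed-form principal-axis angle for 2D data, and the LDA axis uses the two-category solution: inverse within-category scatter applied to the difference of the means.

```python
import math

def mean_point(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def scatter_matrix(points, center):
    """Return (sxx, sxy, syy): sums of squared and cross deviations from center."""
    cx, cy = center
    sxx = sum((x - cx) ** 2 for x, _ in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    return sxx, sxy, syy

def pca_first_axis(points):
    # Direction of maximum variation, ignoring category labels
    sxx, sxy, syy = scatter_matrix(points, mean_point(points))
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(angle), math.sin(angle))

def lda_axis(class_a, class_b):
    # Direction w = Sw^-1 (mu_a - mu_b), with Sw the pooled within-category scatter
    mu_a, mu_b = mean_point(class_a), mean_point(class_b)
    axx, axy, ayy = scatter_matrix(class_a, mu_a)
    bxx, bxy, byy = scatter_matrix(class_b, mu_b)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)

# Hypothetical data: wide spread along gene x, categories separated along gene y
class_a = [(-3.0, 1.0), (-1.0, 1.2), (1.0, 0.8), (3.0, 1.0)]
class_b = [(-3.0, -1.0), (-1.0, -0.8), (1.0, -1.2), (3.0, -1.0)]

pca_direction = pca_first_axis(class_a + class_b)
lda_direction = lda_axis(class_a, class_b)
print("PCA axis (gene x weight, gene y weight):", pca_direction)
print("LDA axis (gene x weight, gene y weight):", lda_direction)
```

The PCA axis points almost entirely along gene x, while the LDA axis points almost entirely along gene y. Assuming the genes are on comparable scales, the size of each weight gives a rough sense of which predictor drives the axis.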
Summary
LDA is a method to reduce dimensions. It’s useful for separating categories.