## Decision tree §

### What is a Decision Tree? §

A decision tree is a tree-like flow chart that determines an outcome through a series of decision checks.

• Nodes: The points (boxes) where an attribute is tested and the data splits
• Edges: The outcomes of a split, leading to the next node
• Root: The first node, where the first split happens
• Leaves: The terminal/final nodes, which predict the outcome
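Assuming scikit-learn is available, a minimal sketch that fits a shallow tree on the iris data set and prints its structure, so the root, internal nodes, edges, and leaves are all visible (the feature names are just iris's four measurements):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A depth-2 tree: one root split, with internal nodes and leaves below it.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the root split, the edges (<= / > branches),
# and the leaves with their predicted class.
print(export_text(tree, feature_names=["sepal length", "sepal width",
                                       "petal length", "petal width"]))
```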

### The sequence of decisions §

The sequence of decision checks matters, because it determines which attribute splits first.

### Entropy and Information Gain §

The mathematical methods for choosing the best split.

Entropy: a measure of the impurity of a set of labels,

$$H(S) = -\sum_{i} p_i \log_2 p_i$$

Information Gain: the reduction in entropy achieved by splitting on attribute $A$,

$$IG(S, A) = H(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} H(S_v)$$
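A small self-contained sketch of both quantities in Python (the yes/no labels are made up for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the child splits."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# A 50/50 parent has entropy 1 bit; a perfectly pure split gains all of it.
parent = ["yes", "yes", "no", "no"]
print(entropy(parent))                                           # 1.0
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
```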

## Random Forest §

### What is Random Forest? §

Random forest is an ensemble method that improves the predictive performance of decision trees.

#### How it works §

##### Random Sampling of Feature §

Each tree randomly picks a subset of the data (features and records). In implementation, n random records and m features are taken from a data set of k records.
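A minimal numpy sketch of this sampling step; the sizes k, n, and m here are illustrative, with m set to the square root of the feature count as is common:

```python
import numpy as np

rng = np.random.default_rng(42)

k = 100              # records in the full data set
p = 10               # total number of features
n = k                # bootstrap sample size (rows drawn with replacement)
m = int(np.sqrt(p))  # features per tree (sqrt of p is a common default)

X = rng.normal(size=(k, p))

# One tree's sample: n random rows with replacement, m random feature columns.
row_idx = rng.integers(0, k, size=n)
col_idx = rng.choice(p, size=m, replace=False)
X_tree = X[np.ix_(row_idx, col_idx)]
print(X_tree.shape)  # (100, 3)
```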

##### Individual Decision §

Each tree makes its splits and generates an output based on its own sample.

##### Combining §
• Classification: Majority Voting
• Regression: Averaging
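Both combining rules can be sketched in a few lines of Python; the five tree outputs below are made up for illustration:

```python
from collections import Counter

# Hypothetical predictions from five trees for one sample.
class_votes = ["yes", "no", "yes", "yes", "no"]
reg_outputs = [3.0, 2.5, 3.5, 3.0, 2.0]

# Classification: majority vote across the trees.
majority = Counter(class_votes).most_common(1)[0][0]
print(majority)  # yes

# Regression: average of the trees' outputs.
average = sum(reg_outputs) / len(reg_outputs)
print(average)  # 2.8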
##### m value §

The m value, used in classification, controls the randomness of feature selection: each tree only considers m of the features (usually set to the square root of the total number of features).
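Assuming scikit-learn is available, m corresponds to the `max_features` parameter of `RandomForestClassifier`; `"sqrt"` gives the square-root default mentioned above (the data set here is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=16, random_state=0)

# max_features="sqrt" is the m value: each split considers
# sqrt(16) = 4 randomly chosen features instead of all 16.
clf = RandomForestClassifier(n_estimators=50, max_features="sqrt", random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```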

ChatGPT example

Different friends give yes/no picnic advice, each based on different available information (features).

Question

The ChatGPT example is confusing and I don't think it makes sense at all. The problem is that the features should have different weights when making the decision.

For example, if the weather is going to be bad, then the picnic is a complete no-go and we don't even need to look at other factors. In that case, any tree that doesn't have the weather feature and voted "Yes" is not valid.

Random Forest Algorithms - Comprehensive Guide With Examples

### Why do we need random forest? §

#### The problem of decision trees §

A single decision tree often has poor predictive accuracy, because:

##### High variance §

Different splits can lead to very different tree structures, so small changes in the training data can produce very different predictions.

Bagging: a machine learning procedure that reduces variance by training many trees on bootstrap samples of the data and averaging their predictions.
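A small numpy simulation of why averaging helps: the mean of B independent, equally noisy predictions has roughly 1/B of the variance of a single one (the Gaussian noise model here is an idealized assumption, not how real trees behave):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate B noisy, independent "tree" predictions of the same target value.
B = 100
target = 5.0
trees = target + rng.normal(scale=1.0, size=(10_000, B))

single = trees[:, 0]         # one tree's predictions
bagged = trees.mean(axis=1)  # average over B trees

print(single.var())  # ≈ 1.0
print(bagged.var())  # ≈ 1/B = 0.01 for independent trees
```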

##### Bagged trees: highly correlated §

If one feature in the data set is very strong, most bagged trees will choose it as the top split. This leads to similar tree structures and therefore highly correlated trees, which we want to avoid because averaging correlated predictions does not reduce variance much.

#### Decorrelates the trees §

Random forest decorrelates the trees by restricting each split to a random subset of features, so the trees become less dependent on each other.