**Link**: Supervised learning

## Logistic regression

### What is logistic regression?

- Logistic regression is a method for classification
- It helps with binary **classification**: the result is either 0 or 1
- It’s a specific type of **generalized linear model (GLM)**

### Why use logistic regression and not linear regression?

- Because a normal linear regression model performs poorly on binary groups:
  - The result is either 0 or 1, while linear regression is a continuous line that can go below 0 or above 1 (beyond the limits)
  - It fits the data poorly
- Logistic regression is a transformed form of linear regression
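
To make the "beyond the limits" point concrete, here is a minimal sketch that fits an ordinary least-squares line to binary labels. The data set and the helper names `fit_line` and `predict_linear` are made up for illustration; they are not part of the original notes.

```python
# Toy data (invented for illustration): one feature, binary labels.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = fit_line(xs, ys)

def predict_linear(x):
    """Prediction of the fitted straight line at x."""
    return b0 + b1 * x

print(predict_linear(0))  # negative: below the valid range
print(predict_linear(8))  # greater than 1: above the valid range
```

The fitted line keeps rising or falling without bound, so its output cannot be read as a probability outside a narrow range of x.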

### Sigmoid Function/Logistic Function

#### What is the sigmoid function (logistic function)?

A function to transform any value to be between 0 and 1

$\phi(z)=\frac{1}{1+e^{-z}}$
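
As a rough sketch, the sigmoid function can be written in plain Python as follows. The function name `sigmoid` and the branch for negative inputs (used only to avoid overflow in `math.exp`) are choices made for this example.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form e^z / (1 + e^z)
    # so that exp() never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-5, 0, 5):
    print(z, sigmoid(z))
```

Large negative inputs give values close to 0, large positive inputs give values close to 1, and an input of 0 gives exactly 0.5.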

#### How to use it in evaluation?

We can set a cutoff point at 0.5:

- Values below 0.5 are assigned to class 0
- Values above 0.5 are assigned to class 1

##### How to interpret a value of exactly 0.5?

There’s a 50/50 chance that the result is either 0 or 1.
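
The cutoff rule can be sketched as a small helper. The name `classify` is hypothetical, and sending a probability of exactly 0.5 to class 1 is an arbitrary tie-breaking assumption, since the notes only say that case is a 50/50 chance.

```python
def classify(p, cutoff=0.5):
    """Turn a predicted probability into a class label.

    Assumption: a probability exactly equal to the cutoff goes to class 1.
    """
    return 1 if p >= cutoff else 0

for p in (0.3, 0.5, 0.8):
    print(p, classify(p))
```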

#### Explain the math: where does it come from?

Linear regression model:

$y=b_{0}+b_{1}x$

Transformed to Logistic regression model:

$p=\frac{1}{1+e^{-(b_{0}+b_{1}x)}}$
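
One common way to see the connection, sketched here as a short derivation: instead of modeling the outcome directly, model the log-odds of $p$ as a linear function of $x$.

$\ln\left(\frac{p}{1-p}\right)=b_{0}+b_{1}x$

Exponentiating both sides gives the odds:

$\frac{p}{1-p}=e^{b_{0}+b_{1}x}$

Solving for $p$:

$p=\frac{e^{b_{0}+b_{1}x}}{1+e^{b_{0}+b_{1}x}}=\frac{1}{1+e^{-(b_{0}+b_{1}x)}}$

This is the sigmoid function applied to the linear expression $b_{0}+b_{1}x$.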

## Evaluate the model

Using a confusion matrix

### Simple example of confusion matrix

A simple example to predict disease:

| n=165 | Predicted: N | Predicted: Y | Total |
|---|---|---|---|
| Actual: N | 50 (TN) | 10 (FP = Type-I) | 60 |
| Actual: Y | 5 (FN = Type-II) | 100 (TP) | 105 |
| Total | 55 | 110 | 165 |
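
As an illustration of how the four cells are counted, the sketch below tallies (actual, predicted) pairs. The label lists are synthetic, rebuilt from the counts in the example rather than taken from real data, and `confusion_counts` is a hypothetical helper name.

```python
from collections import Counter

# Synthetic labels rebuilt from the example's counts (0 = N, 1 = Y).
# Order of groups: 50 TN, 10 FP, 5 FN, 100 TP.
actual    = [0] * 50 + [0] * 10 + [1] * 5 + [1] * 100
predicted = [0] * 50 + [1] * 10 + [0] * 5 + [1] * 100

def confusion_counts(actual, predicted):
    """Count each (actual, predicted) combination."""
    counts = Counter(zip(actual, predicted))
    return {
        "TN": counts[(0, 0)],
        "FP": counts[(0, 1)],
        "FN": counts[(1, 0)],
        "TP": counts[(1, 1)],
    }

cm = confusion_counts(actual, predicted)
print(cm)
```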

### Terminology

- True Positives (TP)
- True Negatives (TN)
- False Positives (FP): **Type-I error**
- False Negatives (FN): **Type-II error**

### Evaluate with confusion matrix

#### Accuracy rate

$Accuracy=\frac{TP+TN}{total}$

In the example:

$Accuracy=\frac{TP+TN}{total}=\frac{100+50}{165}\approx 0.91$

#### Misclassification rate (Error rate)

$Error=\frac{FP+FN}{total}$

In the example:

$Error=\frac{FP+FN}{total}=\frac{10+5}{165}\approx 0.09$
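
The accuracy and error-rate arithmetic for the example can be checked with a few lines of Python. The variable names are chosen for this sketch; the counts come from the disease example.

```python
# Counts from the disease example.
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn  # 165

accuracy = (tp + tn) / total
error_rate = (fp + fn) / total

print(round(accuracy, 2))    # 0.91
print(round(error_rate, 2))  # 0.09
```

Because every prediction is either correct or incorrect, the two rates always sum to 1.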