## The main idea of linear regression

1. Use least squares to fit a line to the data.
2. Calculate R-squared ($R^2$).
3. Calculate a p-value for $R^2$.

## Terminology

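### Var(mean): variation around the mean

$SS(\text{mean})$ is the sum of squared differences between each data point and the mean:

$$SS(\text{mean}) = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

$$Var(\text{mean}) = \frac{SS(\text{mean})}{n}$$

In this way, $Var(\text{mean})$ can be viewed as the average SS per data point.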
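As a small illustration, here is a sketch of the variation around the mean in plain Python. It assumes the variation is the sum of squares divided by the number of points `n` (many libraries divide by `n - 1` instead). The function names and the data values are made up for this example.

```python
def ss_mean(ys):
    """Sum of squared differences between each value and the mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def var_mean(ys):
    """Average squared difference from the mean: SS(mean) / n."""
    return ss_mean(ys) / len(ys)


ys = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values, mean is 3.0
print(ss_mean(ys))   # 10.0
print(var_mean(ys))  # 2.0
```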

## How does it work

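1. Measure $SS(\text{mean})$, the sum of squared distances from the data to the mean of $y$.

2. Measure $SS(\text{fit})$, the sum of squared distances from the data to the least-squares line.

3. Plug both in and we get

$$R^2 = \frac{SS(\text{mean}) - SS(\text{fit})}{SS(\text{mean})}$$

Note that when $SS(\text{fit})$ is 0, then $R^2 = 1$.

4. Use a p-value to determine whether $R^2$ is statistically significant. When there are only two data points, the line passes through both of them, so $SS(\text{fit}) = 0$ and $R^2 = 1$ no matter what the data are. So we need a p-value to identify cases like this. The p-value for $R^2$ comes from $F$, the F-value.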
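The steps above can be sketched in plain Python. This is a minimal illustration that assumes $R^2 = (SS(\text{mean}) - SS(\text{fit})) / SS(\text{mean})$ and a single predictor. The helper names `fit_line` and `r_squared` and the data values are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys):
    """Fraction of the variation around the mean that the line explains."""
    y_bar = sum(ys) / len(ys)
    slope, intercept = fit_line(xs, ys)
    ss_mean = sum((y - y_bar) ** 2 for y in ys)
    ss_fit = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return (ss_mean - ss_fit) / ss_mean


# Made-up data: the line explains about 60% of the variation.
r2_five_points = r_squared([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(r2_five_points)

# Any two points (with different x) lie exactly on a line, so SS(fit) is 0.
r2_two_points = r_squared([1.0, 3.0], [2.0, 7.0])
print(r2_two_points)
```

The two-point case is why a high $R^2$ alone is not convincing: it reaches 1 with no evidence of a real relationship.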

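### F-value

$$F = \frac{\left(SS(\text{mean}) - SS(\text{fit})\right) / (p_{\text{fit}} - p_{\text{mean}})}{SS(\text{fit}) / (n - p_{\text{fit}})}$$

Here $p_{\text{fit}}$ is the number of parameters in the fitted line (2: slope and intercept), $p_{\text{mean}}$ is the number of parameters in the mean line (1), and $n$ is the number of data points.

The larger the F-value, the more of the variation in $y$ is explained by $x$, relative to the variation left unexplained.

$p_{\text{fit}} - p_{\text{mean}}$ and $n - p_{\text{fit}}$ are also known as the degrees of freedom.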
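A short sketch of the F-value calculation follows. It assumes the usual form: the variation explained per extra parameter, divided by the variation left unexplained per remaining degree of freedom. The parameter names `p_fit` and `p_mean` and the input numbers are assumptions made for illustration.

```python
def f_value(ss_mean, ss_fit, n, p_fit=2, p_mean=1):
    """F-value comparing a fitted line (p_fit parameters) to the mean (p_mean)."""
    explained = (ss_mean - ss_fit) / (p_fit - p_mean)
    unexplained = ss_fit / (n - p_fit)  # requires n > p_fit and ss_fit > 0
    return explained / unexplained


# Made-up sums of squares for 5 data points; the result is about 4.5.
f_example = f_value(ss_mean=6.0, ss_fit=2.4, n=5)
print(f_example)
```

With only two data points, `n - p_fit` is 0 and the F-value is undefined, which matches the two-point case where $R^2 = 1$ carries no information.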

### Steps to convert the F-value to a p-value

1. Generate a set of random data.

2. Calculate the mean and $SS(\text{mean})$, then fit a line and calculate $SS(\text{fit})$.

3. Plug these into the formula for $F$, and add the resulting value to a histogram.

4. Repeat many times with new random datasets.

5. Do the same for the original dataset to get its F-value. The p-value is the number of random F-values at least as extreme as the original one, divided by the total number of F-values.
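The procedure above can be sketched as a small simulation. This is an illustration under stated assumptions, not a definitive implementation: the random datasets use uniform `x` values and standard normal `y` values, the default of 2000 trials is arbitrary, and the function names and data are made up. It assumes at least three data points and a fit that is not perfect.

```python
import random


def f_from_data(xs, ys):
    """F-value for a least-squares line versus the mean (2 vs. 1 parameters)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_mean = sum((y - y_bar) ** 2 for y in ys)
    ss_fit = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return (ss_mean - ss_fit) / (ss_fit / (n - 2))


def simulated_p_value(xs, ys, trials=2000, seed=0):
    """Fraction of random datasets whose F-value is at least the observed one."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(xs)
    f_observed = f_from_data(xs, ys)
    extreme = 0
    for _ in range(trials):
        rand_xs = [rng.random() for _ in range(n)]
        rand_ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if f_from_data(rand_xs, rand_ys) >= f_observed:
            extreme += 1
    return extreme / trials


# Made-up data with a strong linear trend: expect a very small p-value.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2]
p_strong = simulated_p_value(xs, ys)
print(p_strong)

# Made-up data with almost no trend: expect a large p-value.
p_weak = simulated_p_value([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                           [2.0, 5.0, 1.0, 4.0, 2.0, 5.0])
print(p_weak)
```

In practice the histogram of random F-values is usually replaced by the matching F-distribution, which avoids rerunning the simulation for every dataset.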

### F-distributions

In the example above, the red distribution comes from a smaller sample than the blue one, while all other parameters are the same. The blue curve tapers off faster, so less of its area lies beyond any given F-value. When we calculate the probability of values at least that extreme, the p-value from the blue curve is therefore smaller. In other words, for the same F-value, a larger sample size leads to a smaller p-value.
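The effect of sample size can be checked with a rough simulation. The sketch below builds F-values from random data for a small and a large sample, then compares how often each exceeds the same observed F-value. The sample sizes (5 and 30), the observed value of 4.0, the trial count, and the function names are all hypothetical choices for illustration.

```python
import random


def null_f_values(n, trials, rng):
    """F-values from fitting a line to `trials` random datasets of size n."""
    values = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar = sum(xs) / n
        y_bar = sum(ys) / n
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        ss_mean = sum((y - y_bar) ** 2 for y in ys)
        explained = sxy ** 2 / sxx       # SS(mean) - SS(fit) for a least-squares line
        ss_fit = ss_mean - explained
        values.append(explained / (ss_fit / (n - 2)))
    return values


def tail_fraction(values, f_observed):
    """Share of simulated F-values at least as large as the observed one."""
    return sum(v >= f_observed for v in values) / len(values)


rng = random.Random(0)   # fixed seed so the result is repeatable
f_observed = 4.0         # hypothetical observed F-value
p_small_sample = tail_fraction(null_f_values(5, 5000, rng), f_observed)
p_large_sample = tail_fraction(null_f_values(30, 5000, rng), f_observed)
print(p_small_sample, p_large_sample)
```

The larger sample should give the smaller tail fraction, which is the same pattern the two curves show.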