Link: Supervised learning

The main idea of linear regression

  1. Use least squares to fit a line to the data (see the sketch after this list)
  2. Calculate R-squared (R²)
  3. Calculate a p-value for R²
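
As a quick illustration of step 1, here is a minimal sketch of a least-squares line fit with NumPy; the data arrays `x` and `y` are made-up example values, not from these notes.

```python
import numpy as np

# Made-up example data (not from the original notes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Least-squares fit of a straight line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
print(f"fitted line: y = {slope:.3f} * x + {intercept:.3f}")
```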

Terminology

SS(mean): the sum of squares around the mean

Var(mean): the variation around the mean

In this way, Var() can be viewed as the average sum of squares: Var(mean) = SS(mean) / n, where n is the sample size.

SS(fit): the sum of squares around the least-squares fit

Var(fit): the variation around the fitted line
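
A small sketch of SS(mean) and Var(mean), using the same made-up data as above (an assumption for illustration, not data from the notes):

```python
import numpy as np

# Made-up example data (not from the original notes)
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
n = len(y)

# SS(mean): sum of squared differences between the data and their mean
ss_mean = np.sum((y - y.mean()) ** 2)

# Var(mean): the average of those squared differences, SS(mean) / n
var_mean = ss_mean / n

print(f"SS(mean) = {ss_mean:.3f}, Var(mean) = {var_mean:.3f}")
```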

How does it work

  1. Measure the variation of the data around the mean: SS(mean) and Var(mean)

  2. Measure the variation of the data around the least-squares fit: SS(fit) and Var(fit)

  3. Plug in and we get R² = (Var(mean) - Var(fit)) / Var(mean) = (SS(mean) - SS(fit)) / SS(mean) (see the sketch after this list)

    Note that when SS(fit) is 0, then R² = 1

  4. Use a p-value to determine whether R² is statistically significant. When there are only two data points for the line, SS(fit) = 0, so R² = 1 no matter what the data look like. That is why we need a p-value to identify cases like this. The p-value for R² comes from F, the F-value.
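
A minimal sketch of steps 1-3, assuming the same made-up data as above (the variable names are illustrative, not from the notes):

```python
import numpy as np

# Made-up example data (not from the original notes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Step 1: variation around the mean
ss_mean = np.sum((y - y.mean()) ** 2)

# Step 2: variation around the least-squares fit
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept
ss_fit = np.sum((y - y_hat) ** 2)

# Step 3: R² = (SS(mean) - SS(fit)) / SS(mean)
r_squared = (ss_mean - ss_fit) / ss_mean
print(f"R² = {r_squared:.3f}")
```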

F-value

F = [(SS(mean) - SS(fit)) / (p_fit - p_mean)] / [SS(fit) / (n - p_fit)]

where p_mean is the number of parameters in the mean line (1, just the mean) and p_fit is the number of parameters in the least-squares fit (2, the intercept and the slope). The terms (p_fit - p_mean) and (n - p_fit) are also known as the degrees of freedom.

The larger the F-value is, the more of the variation in y is explained by the fit (i.e., by x).
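
A sketch of computing the F-value from the quantities above, continuing the made-up example (for a straight-line fit, p_fit = 2 and p_mean = 1):

```python
import numpy as np

# Made-up example data (not from the original notes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
n = len(y)

p_mean = 1  # parameters in the mean line (just the mean)
p_fit = 2   # parameters in the fitted line (intercept and slope)

slope, intercept = np.polyfit(x, y, deg=1)
ss_mean = np.sum((y - y.mean()) ** 2)
ss_fit = np.sum((y - (slope * x + intercept)) ** 2)

# F = [(SS(mean) - SS(fit)) / (p_fit - p_mean)] / [SS(fit) / (n - p_fit)]
f_value = ((ss_mean - ss_fit) / (p_fit - p_mean)) / (ss_fit / (n - p_fit))
print(f"F = {f_value:.3f}")
```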

Steps to convert an F-value to a p-value (see the simulation sketch after this list):

  1. Generate a set of random data (with the same sample size as the original)

  2. Calculate the mean and SS(mean), and the least-squares fit and SS(fit)

  3. Plug into the F-value formula to get an F-value, and add it to a histogram

  4. Repeat many times with new random datasets

  5. Do the same for the original dataset and get its F-value. p-value = the number of F-values at least as extreme as the original one, divided by the total number of F-values
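
A rough simulation sketch of these steps, assuming the made-up data from above and normally distributed random datasets (the choice of random-number distribution is an assumption, not specified in the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def f_value(x, y, p_mean=1, p_fit=2):
    """F-value for a straight-line fit compared with the mean."""
    n = len(y)
    slope, intercept = np.polyfit(x, y, deg=1)
    ss_mean = np.sum((y - y.mean()) ** 2)
    ss_fit = np.sum((y - (slope * x + intercept)) ** 2)
    return ((ss_mean - ss_fit) / (p_fit - p_mean)) / (ss_fit / (n - p_fit))

# Made-up original data (not from the original notes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
f_original = f_value(x, y)

# Steps 1-4: F-values for many random datasets of the same size
f_random = np.array([
    f_value(x, rng.normal(size=len(y)))  # random y, same x and sample size
    for _ in range(10_000)
])

# Step 5: p-value = fraction of random F-values at least as extreme as the original
p_value = np.mean(f_random >= f_original)
print(f"F = {f_original:.3f}, simulated p-value = {p_value:.4f}")
```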

F-distributions

In the example above, the red F-distribution comes from a smaller sample size than the blue one, while the other parameters are the same. The blue curve falls off more steeply, so when we calculate the probability of getting an F-value at least as extreme as ours, the p-value is smaller. Therefore, a larger sample size leads to a smaller p-value (for the same amount of variation explained).
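
A sketch of this effect using the analytic F-distribution from SciPy instead of simulation; the specific F-value and sample sizes are made-up for illustration:

```python
from scipy.stats import f

f_observed = 9.0   # made-up F-value for illustration
p_fit, p_mean = 2, 1
dfn = p_fit - p_mean              # numerator degrees of freedom

# Same F-value, two different sample sizes
for n in (5, 50):
    dfd = n - p_fit               # denominator degrees of freedom
    p_value = f.sf(f_observed, dfn, dfd)  # P(F >= f_observed)
    print(f"n = {n:2d}: p-value = {p_value:.4f}")
```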