Check out the full code in the GitHub repo.

## Do multiple regression in R §

### Plot the basic correlation §

```r
library(dplyr)

mouse.data <- data.frame(
  size   = c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3),
  weight = c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
  tail   = c(0.7, 1.3, 0.7, 2.0, 3.6, 3.0, 2.9, 3.9, 4.0))

plot(mouse.data)
```

#### Interpret the graph §

Identify the correlations by eyeballing the pairs plot:

- Both weight and tail are correlated with size.
- weight and tail are also correlated with each other, so we might not need both.
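The eyeball impression can be checked numerically with `cor()`. A small sketch (the data frame is re-created so the snippet runs stand-alone):

```r
# Same mouse data as above
mouse.data <- data.frame(
  size   = c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3),
  weight = c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
  tail   = c(0.7, 1.3, 0.7, 2.0, 3.6, 3.0, 2.9, 3.9, 4.0))

# Pairwise correlation matrix; every pair correlates strongly,
# which is why one of weight/tail may turn out to be redundant
round(cor(mouse.data), 2)
```

All three pairwise correlations come out strongly positive, matching what the scatterplot matrix suggests.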

### Fit a plane to the data §

Use the `+` sign in `lm()` to include multiple predictors.

```r
multiple.regression <- lm(size ~ weight + tail, data = mouse.data)
```

### Use summary() to interpret the results §

Note: since we are using multiple regression, the adjusted R-squared is the more meaningful measure of fit, because it penalizes the model for each extra parameter.

```
> summary(multiple.regression)

Call:
lm(formula = size ~ weight + tail, data = mouse.data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.99928 -0.38648 -0.06967  0.34454  1.07932

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.7070     0.6510   1.086   0.3192
weight       -0.3293     0.3933  -0.837   0.4345
tail          1.6470     0.5363   3.071   0.0219 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8017 on 6 degrees of freedom
Multiple R-squared:  0.8496,	Adjusted R-squared:  0.7995
F-statistic: 16.95 on 2 and 6 DF,  p-value: 0.003399
```
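Both R-squared values can also be pulled out of the summary object programmatically. A sketch, re-fitting the same model so it runs stand-alone:

```r
mouse.data <- data.frame(
  size   = c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3),
  weight = c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
  tail   = c(0.7, 1.3, 0.7, 2.0, 3.6, 3.0, 2.9, 3.9, 4.0))
multiple.regression <- lm(size ~ weight + tail, data = mouse.data)

s <- summary(multiple.regression)
s$r.squared      # multiple R-squared (0.8496 above)
s$adj.r.squared  # adjusted R-squared (0.7995 above)
```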

#### How to compare multiple vs simple linear regression §

We can do so by reading the coefficient table: each predictor's row compares the multiple regression against the simple regression that omits that predictor.

e.g. the weight row compares the multiple regression size = n + a\*weight + b\*tail against the simple regression size = n + b\*tail. In this case the p-value (0.4345) is > 0.05, so weight + tail is not significantly better than tail alone at predicting size, meaning we can drop weight from the model.
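The same comparison can be made explicitly with `anova()`, which tests whether the full model significantly reduces the residual error relative to the tail-only model. A sketch (when only one predictor is added, the F-test p-value matches the 0.4345 in the weight row of the summary table):

```r
mouse.data <- data.frame(
  size   = c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3),
  weight = c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
  tail   = c(0.7, 1.3, 0.7, 2.0, 3.6, 3.0, 2.9, 3.9, 4.0))

simple.regression   <- lm(size ~ tail, data = mouse.data)           # reduced model
multiple.regression <- lm(size ~ weight + tail, data = mouse.data)  # full model

# F-test: does adding weight significantly improve the fit?
anova(simple.regression, multiple.regression)
```

A p-value above 0.05 here says the same thing as the weight row: tail alone predicts size about as well as weight and tail together.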