**Links**: Multiple regression R; Linear regression in R

Check the full code in the GitHub repo.

## Do multiple regression in R

### Plot the basic correlation

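```
# Toy data: size, weight, and tail measurements for 9 mice
mouse.data <- data.frame(
  size   = c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3),
  weight = c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
  tail   = c(0.7, 1.3, 0.7, 2.0, 3.6, 3.0, 2.9, 3.9, 4.0)
)

# plot() on a data frame draws a scatterplot matrix of every pair of columns
plot(mouse.data)
```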

#### Interpret the graph

Identify the correlations by eyeballing:

- Both **weight** and **tail** are correlated with **size**.
- **weight** and **tail** are also correlated with each other, so we might not need both in the model.
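To put numbers on the eyeballed correlations, base R's `cor()` returns the matrix of pairwise Pearson correlations. This is a minimal sketch that redefines `mouse.data` so it runs on its own; the name `cor.matrix` is just illustrative.

```r
# Same data as above, redefined so this block is self-contained
mouse.data <- data.frame(
  size   = c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3),
  weight = c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
  tail   = c(0.7, 1.3, 0.7, 2.0, 3.6, 3.0, 2.9, 3.9, 4.0)
)

# Pairwise Pearson correlations between every pair of columns
cor.matrix <- cor(mouse.data)
print(round(cor.matrix, 2))
```

A high value in the `weight`/`tail` cell is what hints that one of the two predictors may be redundant.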

### Fit a plane to the data

Use the `+` sign in the `lm()` formula to include multiple predictor variables.

`multiple.regression <- lm(size ~ weight + tail, data=mouse.data)`
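Fitting the plane `size = b0 + b1*weight + b2*tail` is ordinary least squares. As a sketch (not how `lm()` itself computes it, since `lm()` uses a QR decomposition), the same coefficients can be recovered from the normal equations using `model.matrix()` and `solve()`. `X`, `y`, and `beta` are illustrative names.

```r
mouse.data <- data.frame(
  size   = c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3),
  weight = c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
  tail   = c(0.7, 1.3, 0.7, 2.0, 3.6, 3.0, 2.9, 3.9, 4.0)
)

multiple.regression <- lm(size ~ weight + tail, data = mouse.data)

# Design matrix: a column of 1s for the intercept, then weight and tail
X <- model.matrix(~ weight + tail, data = mouse.data)
y <- mouse.data$size

# Least-squares coefficients from the normal equations: (X'X) beta = X'y
beta <- solve(crossprod(X), crossprod(X, y))

# Side by side with lm(); the two columns should agree
print(cbind(normal.equations = beta[, 1], lm = coef(multiple.regression)))
```

The normal equations are shown only because they make the "fit a plane" idea explicit; for real work `lm()` is numerically safer.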

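### Use `summary()` to interpret the results

Run `summary(multiple.regression)`. Note: since we are using multiple regression, the **adjusted R-squared** is more meaningful than the multiple R-squared, because it penalizes the model for each extra predictor.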

```
Call:
lm(formula = size ~ weight + tail, data = mouse.data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.99928 -0.38648 -0.06967  0.34454  1.07932

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.7070     0.6510   1.086   0.3192
weight       -0.3293     0.3933  -0.837   0.4345
tail          1.6470     0.5363   3.071   0.0219 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8017 on 6 degrees of freedom
Multiple R-squared:  0.8496,  Adjusted R-squared:  0.7995
F-statistic: 16.95 on 2 and 6 DF,  p-value: 0.003399
```
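The last two lines of the summary can be reproduced from the multiple R-squared with the textbook formulas, where `n` is the number of mice and `p` the number of predictors excluding the intercept. A minimal sketch; `adj.r2` and `f.stat` are hypothetical names.

```r
mouse.data <- data.frame(
  size   = c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3),
  weight = c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
  tail   = c(0.7, 1.3, 0.7, 2.0, 3.6, 3.0, 2.9, 3.9, 4.0)
)
multiple.regression <- lm(size ~ weight + tail, data = mouse.data)
s <- summary(multiple.regression)

n  <- nrow(mouse.data)  # number of observations (mice)
p  <- 2                 # predictors: weight and tail (intercept not counted)
r2 <- s$r.squared

# Adjusted R-squared shrinks R-squared according to the number of predictors
adj.r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained variance per predictor / residual variance per residual df
f.stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

print(c(adj.r2 = adj.r2, reported = s$adj.r.squared))
print(c(f.stat = f.stat, reported = unname(s$fstatistic["value"])))
```

With n = 9 and p = 2, 1 - (1 - 0.8496) * 8/6 ≈ 0.7995, matching the output above.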

#### How to compare multiple regression with simple linear regression

We can do so by reading the **Coefficients** table:

Each predictor's row compares the multiple regression with a simpler regression that leaves out that one predictor and keeps the others.

e.g. The **weight** row compares the multiple regression `size = n + a*weight + b*tail` with the simple regression `size = n + b*tail` (with `n` and `b` re-estimated). Its p-value (0.4345) is > 0.05, so **weight + tail is not significantly better** than **tail** alone at predicting **size**, meaning we can drop weight from the model.
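The same conclusion can be checked with an explicit model comparison. As a sketch, `anova()` on the two nested fits runs an F-test of `size ~ tail` against `size ~ weight + tail`; when the models differ by a single variable, F equals the square of that variable's t value, so the p-value should match the **weight** row (0.4345). `simple.regression`, `p.anova`, and `p.coef` are illustrative names.

```r
mouse.data <- data.frame(
  size   = c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3),
  weight = c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
  tail   = c(0.7, 1.3, 0.7, 2.0, 3.6, 3.0, 2.9, 3.9, 4.0)
)

multiple.regression <- lm(size ~ weight + tail, data = mouse.data)
simple.regression   <- lm(size ~ tail, data = mouse.data)  # weight dropped

# F-test between the nested models: does adding weight to tail help?
comparison <- anova(simple.regression, multiple.regression)
p.anova <- comparison[["Pr(>F)"]][2]

# p-value on the weight row of the multiple regression's coefficient table
p.coef <- summary(multiple.regression)$coefficients["weight", "Pr(>|t|)"]

# One variable dropped, so F = t^2 and the two p-values should coincide
print(c(anova = p.anova, coefficient.table = p.coef))
```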