2.2 Multiple Linear Regression

2.2.1 Problem Formulation


Let’s now generalise the above to the case of many variables \(X_1, \dots, X_p\).

We wish to model the dependent variable as a function of \(p\) independent variables. \[ Y = f(X_1,\dots,X_p) \qquad (+\varepsilon) \]

Restricting ourselves to the class of linear models, we have \[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p. \]

Above we studied the case where \(p=1\) with \(\beta_1=a\) and \(\beta_0=b\).


The above equation defines:

  • \(p=1\) — a line
  • \(p=2\) — a plane
  • \(p\ge 3\) — a hyperplane

Most people find it difficult to imagine objects in high dimensions, but we are lucky to have this thing called maths.

[Figure: 3D scatter plot (chunk scatterplot3dexample)]

2.2.2 Fitting a Linear Model in R


lm() accepts a formula of the form Y~X1+X2+...+Xp.

It finds the least squares fit, i.e., solves \[ \min_{\beta_0, \beta_1,\dots, \beta_p\in\mathbb{R}} \sum_{i=1}^n \left( \beta_0 + \beta_1 x_{i,1}+\dots+\beta_p x_{i,p} - y_i \right) ^2 \]

## (Intercept)          X1          X2 
## 172.5586670   0.1828011   2.1976461
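The data behind the coefficients above is not reproduced here. As a minimal sketch on synthetic data (the variables, sample size and true coefficients below are assumptions made for illustration only), a call to lm() with two predictors might look as follows. The result is also compared with a direct solution of the same least squares problem via the normal equations \((\mathbf{X}^T\mathbf{X})\boldsymbol\beta=\mathbf{X}^T\mathbf{y}\), where \(\mathbf{X}\) has a leading column of ones for the intercept.

```r
# Synthetic data (hypothetical; not the data set used in the text).
set.seed(123)
n  <- 100
X1 <- runif(n, 0, 10)
X2 <- runif(n, 0, 5)
Y  <- 3 + 0.5*X1 + 2*X2 + rnorm(n, sd=0.5)

# Least squares fit via lm() with a formula of the form Y~X1+X2.
f <- lm(Y~X1+X2)
beta_lm <- coef(f)  # named vector: (Intercept), X1, X2

# The same minimisation problem solved directly through the
# normal equations; the column of ones corresponds to beta_0.
X <- cbind(1, X1, X2)
beta_ne <- solve(crossprod(X), crossprod(X, Y))

# Both approaches should agree up to numerical error.
print(beta_lm)
print(all.equal(unname(beta_lm), as.numeric(beta_ne)))
```

Note that lm() itself relies on a QR decomposition rather than on the normal equations; the direct solve is shown only to make the link with the minimisation problem explicit.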

By the way, the above 3D scatter plot was generated by calling:

(s3d is an R list; one of its elements, named plane3d, is a function object, which is perfectly legal in R.)
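The original plotting call is not reproduced here. The following is a sketch only, assuming that the figure was drawn with the scatterplot3d package and using synthetic data (the data, the toy list named shape, and the pch and angle settings are all assumptions). It first illustrates the "list with a function element" idea in base R and then applies the same pattern to draw a fitted plane.

```r
# Synthetic data (hypothetical; not the data behind the figure in the text).
set.seed(123)
X1 <- runif(50, 0, 10)
X2 <- runif(50, 0, 5)
Y  <- 3 + 0.5*X1 + 2*X2 + rnorm(50, sd=0.5)
f  <- lm(Y~X1+X2)

# A list may store functions among its elements; they are called via `$`.
shape <- list(name="square", area=function(a) a^2)
print(shape$area(3))

# The same pattern with scatterplot3d; run only if the package is installed.
if (requireNamespace("scatterplot3d", quietly=TRUE)) {
    # scatterplot3d() draws the points and returns a list of helper functions.
    s3d <- scatterplot3d::scatterplot3d(X1, X2, Y, pch=16, angle=35)
    # plane3d is one such function; given an lm object, it adds the fitted plane.
    s3d$plane3d(f)
}
```

The helper functions are returned in the list because they need to remember the 3D-to-2D projection used by that particular plot.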