C.2 Algebraic Operations

C.2.1 Matrix Transpose

The matrix transpose is denoted with \(\mathbf{A}^T\):

##      [,1] [,2] [,3]
## [1,]    1    4    2
## [2,]    5    3    6

Hence, \(\mathbf{B}=\mathbf{A}^T\) is a matrix such that \(b_{i,j}=a_{j,i}\).

In other words, in the transposed matrix, rows become columns and columns become rows. For example:

\[ \mathbf{A}= \left[ \begin{array}{ccc} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ \end{array} \right] \qquad \mathbf{A}^T= \left[ \begin{array}{cc} a_{1,1} & a_{2,1} \\ a_{1,2} & a_{2,2} \\ a_{1,3} & a_{2,3} \\ \end{array} \right] \]
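In R, the transpose is computed with t(). The code that produced the output above is not shown; a minimal sketch, assuming \(\mathbf{A}\) is a \(3\times 2\) matrix consistent with that output:

```r
# Assumed definition: a 3x2 matrix whose transpose matches the output above
A <- matrix(c(1, 5,
              4, 3,
              2, 6), byrow=TRUE, ncol=2)
t(A)  # a 2x3 matrix: rows become columns and columns become rows
```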

C.2.2 Matrix-Scalar Operations

Operations such as \(s\mathbf{A}\) (multiplication of a matrix by a scalar), \(-\mathbf{A}\), \(s+\mathbf{A}\), etc. are applied to each element of the input matrix:

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
##      [,1] [,2] [,3]
## [1,]   -1   -2   -3
## [2,]   -4   -5   -6
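The matrix definition was not shown; assuming \(\mathbf{A}\) holds the values \(1,\dots,6\) arranged row by row (consistent with the output above), scalar operations can be sketched as:

```r
(A <- matrix(1:6, byrow=TRUE, nrow=2))  # assumed definition matching the output
-A     # unary minus: negates each element
2*A    # multiplies each element by 2
1+A    # adds 1 to each element
```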

In R, the same rule holds for other operations as well (even though, mathematically, e.g., \(\mathbf{A}^2\) or \(\mathbf{A}\ge 0\) might denote something different):

##      [,1] [,2] [,3]
## [1,]    1    4    9
## [2,]   16   25   36
##       [,1]  [,2] [,3]
## [1,] FALSE FALSE TRUE
## [2,]  TRUE  TRUE TRUE
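A sketch that reproduces the two outputs above; the threshold 3 in the comparison is our guess based on the logical pattern shown:

```r
A <- matrix(1:6, byrow=TRUE, nrow=2)  # assumed definition
A^2     # squares each element (this is not matrix multiplication)
A >= 3  # elementwise comparison yielding a logical matrix
```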

C.2.3 Matrix-Matrix Operations

If \(\mathbf{A},\mathbf{B}\in\mathbb{R}^{n\times p}\) are two matrices of identical sizes, then \(\mathbf{A}+\mathbf{B}\) and \(\mathbf{A}-\mathbf{B}\) are understood elementwise, i.e., they result in \(\mathbf{C}\in\mathbb{R}^{n\times p}\) such that \(c_{i,j}=a_{i,j}\pm b_{i,j}\).

##      [,1] [,2] [,3]
## [1,]    0    0    0
## [2,]    0    0    0

In R (though not in mathematical notation), all other arithmetic, logical, and comparison operators are applied in an elementwise fashion as well.

##      [,1] [,2] [,3]
## [1,]    1    4    9
## [2,]   16   25   36
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE  TRUE
## [2,]  TRUE  TRUE FALSE
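The second operand in the outputs above was not shown; a sketch with a hypothetical \(\mathbf{B}\) of our own choosing illustrates the elementwise behaviour:

```r
A <- matrix(1:6, byrow=TRUE, nrow=2)  # assumed definition
B <- A                                # a hypothetical second matrix
A - B   # elementwise subtraction: all zeros here, since B equals A
A * B   # elementwise product (NOT matrix multiplication)
```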

C.2.4 Matrix Multiplication (*)

Mathematically, \(\mathbf{A}\mathbf{B}\) denotes matrix multiplication. It is a very different operation from elementwise multiplication.

##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4

This is not the same as the elementwise product A*I.
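In R, matrix multiplication is performed with the %*% operator. A sketch consistent with the outputs above, assuming \(\mathbf{A}\) is the \(2\times 2\) matrix shown and \(\mathbf{I}\) is the identity matrix:

```r
A <- matrix(c(1, 2,
              3, 4), byrow=TRUE, nrow=2)  # assumed definition
I <- diag(2)  # the 2x2 identity matrix
A %*% I       # matrix multiplication: yields A itself
A * I         # elementwise product: zeroes the off-diagonal entries instead
```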

Matrix multiplication can only be performed on two matrices of compatible sizes: the number of columns in the left operand must match the number of rows in the right operand.

Given \(\mathbf{A}\in\mathbb{R}^{n\times p}\) and \(\mathbf{B}\in\mathbb{R}^{p\times m}\), their product is the matrix \(\mathbf{C}=\mathbf{A}\mathbf{B}\in\mathbb{R}^{n\times m}\) such that \(c_{i,j}\) is the dot product of the \(i\)-th row in \(\mathbf{A}\) and the \(j\)-th column in \(\mathbf{B}\): \[ c_{i,j} = \mathbf{a}_{i,\cdot} \cdot \mathbf{b}_{\cdot,j} = \sum_{k=1}^p a_{i,k} b_{k, j} \] for \(i=1,\dots,n\) and \(j=1,\dots,m\).

As an exercise, we recommend multiplying a few simple matrices of sizes \(2\times 2\), \(2\times 3\), \(3\times 2\) etc. using pen and paper and checking the results in R.
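For instance, here is one such check with example matrices of our own choosing (a \(2\times 3\) matrix times a \(3\times 2\) one gives a \(2\times 2\) result):

```r
A <- matrix(1:6, byrow=TRUE, nrow=2)  # 2x3
B <- matrix(1:6, byrow=TRUE, nrow=3)  # 3x2
A %*% B  # 2x2; e.g., its (1,1) entry is 1*1 + 2*3 + 3*5 = 22
```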

Also remember that, mathematically, squaring a matrix is done in terms of matrix multiplication, i.e., \(\mathbf{A}^2 = \mathbf{A}\mathbf{A}\). It can only be performed on square matrices, i.e., ones with the same number of rows and columns. This is, again, different from R's elementwise A^2.

Note that \(\mathbf{A}^T \mathbf{A}\) gives the matrix that consists of the dot products of all the pairs of columns in \(\mathbf{A}\).

##      [,1] [,2]
## [1,]   10   14
## [2,]   14   20
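The output above is consistent with the \(2\times 2\) matrix \(\mathbf{A}\) from before (an assumption on our part); crossprod() is a slightly more efficient way to compute the same thing:

```r
A <- matrix(c(1, 2,
              3, 4), byrow=TRUE, nrow=2)  # assumed definition
t(A) %*% A    # dot products of all pairs of columns of A
crossprod(A)  # equivalent, without explicitly forming the transpose
```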

In one of the chapters on Regression, we note that the Pearson linear correlation coefficient can be beautifully expressed this way.

C.2.5 Matrix-Vector Operations

Mathematically, there is no generally agreed upon convention defining arithmetic operations between matrices and vectors.

(*) The only exception is matrix-vector multiplication in the case where the argument is a column or a row vector, i.e., in fact, a matrix. Hence, given \(\mathbf{A}\in\mathbb{R}^{n\times p}\), we may write \(\mathbf{A}\mathbf{x}\) only if \(\mathbf{x}\in\mathbb{R}^{p\times 1}\) is a column vector. Similarly, \(\mathbf{y}\mathbf{A}\) only makes sense whenever \(\mathbf{y}\in\mathbb{R}^{1\times n}\) is a row vector.
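A sketch of this in R, with a matrix and a column vector of our own choosing:

```r
A <- matrix(1:6, byrow=TRUE, nrow=2)  # a hypothetical 2x3 matrix
x <- matrix(c(1, 0, -1), ncol=1)      # a 3x1 column vector
A %*% x                               # a 2x1 column vector
```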

Note that we consistently discriminate between different bold math fonts and letter cases: \(\mathbf{X}\) is a matrix, \(\mathbf{x}\) is a row or column vector (still a matrix, but a sequence-like one), and \(\boldsymbol{x}\) is an ordinary vector (one-dimensional sequence).

However, in R, we might sometimes wish to vectorise an arithmetic operation between a matrix and a vector in a row- or column-wise fashion. For example, if \(\mathbf{A}\in\mathbb{R}^{n\times p}\) is a matrix and \(\mathbf{m}\in\mathbb{R}^{1\times p}\) is a row vector, we might want to subtract \(m_i\) from each element in the \(i\)-th column. Here, the apply() function comes in handy:

  • apply(A, 1, f) applies a given function \(f\) to each row of \(\mathbf{A}\).
  • apply(A, 2, f) applies a given function \(f\) to each column of \(\mathbf{A}\).

Usually, \(f\) either returns a single value (when we wish to aggregate all the elements in a row/column) or returns the same number of values as it takes (when we wish to transform a row/column).

Example: to create a centred version of a given matrix, we need to subtract from each element the arithmetic mean of its column.

##      [,1] [,2] [,3]
## [1,]    1    2    5
## [2,]    2    4    8
## [1] 1.5 3.0 6.5
##      [,1] [,2] [,3]
## [1,] -0.5   -1 -1.5
## [2,]  0.5    1  1.5
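The outputs above can be reproduced as follows, assuming \(\mathbf{A}\) is the \(2\times 3\) matrix shown:

```r
(A <- matrix(c(1, 2, 5,
               2, 4, 8), byrow=TRUE, nrow=2))  # assumed definition
apply(A, 2, mean)                     # column means
apply(A, 2, function(x) x - mean(x))  # subtract each column's mean: a centred matrix
```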

The above is equivalent to:

##      [,1] [,2] [,3]
## [1,] -0.5   -1 -1.5
## [2,]  0.5    1  1.5
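The equivalent code is not shown here; one way to obtain the same result without apply() is sweep(), which subtracts a vector of column statistics from each row (scale() with scale=FALSE also works):

```r
A <- matrix(c(1, 2, 5,
              2, 4, 8), byrow=TRUE, nrow=2)  # assumed definition
sweep(A, 2, colMeans(A))  # subtract the column means; default operation is "-"
# or: scale(A, center=TRUE, scale=FALSE)
```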