## C.1 Creating Matrices

### C.1.1 matrix()

A matrix can be created, amongst other ways, with a call to the matrix() function.

(A <- matrix(c(1, 2, 3, 4, 5, 6), byrow=TRUE, nrow=2))
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
class(A) # R >= 4.0.0; earlier versions print just "matrix"
## [1] "matrix" "array"

Given a numeric vector of length 6, we have asked R to convert it to a numeric matrix with 2 rows (the nrow argument), filled row by row (byrow=TRUE). The number of columns was deduced automatically; alternatively, we could have passed ncol=3 instead of nrow=2.
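As a quick check, the following sketch builds the same matrix in three ways: by giving only nrow, only ncol, or both. The names x, A1, A2 and A3 are arbitrary and serve only this illustration.

```r
x <- c(1, 2, 3, 4, 5, 6)
A1 <- matrix(x, byrow=TRUE, nrow=2)          # ncol deduced as 6/2 = 3
A2 <- matrix(x, byrow=TRUE, ncol=3)          # nrow deduced as 6/3 = 2
A3 <- matrix(x, byrow=TRUE, nrow=2, ncol=3)  # both dimensions given explicitly
identical(A1, A2)  # TRUE
identical(A1, A3)  # TRUE
```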

Using mathematical notation, above we have defined $\mathbf{A}\in\mathbb{R}^{2\times 3}$:

$$\mathbf{A}= \left[ \begin{array}{ccc} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \end{array} \right] = \left[ \begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \end{array} \right]$$

We can fetch the size of the matrix by calling:

dim(A) # number of rows, number of columns
## [1] 2 3
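dim() returns both dimensions at once. Base R also provides nrow() and ncol() for fetching them individually, as in this minimal sketch:

```r
A <- matrix(c(1, 2, 3, 4, 5, 6), byrow=TRUE, nrow=2)
nrow(A)    # same as dim(A)[1], i.e., 2
ncol(A)    # same as dim(A)[2], i.e., 3
length(A)  # total number of elements, nrow(A) * ncol(A), i.e., 6
```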

We can also “promote” a “flat” vector to a column vector, i.e., a matrix with one column, by calling:

as.matrix(1:3)
##      [,1]
## [1,]    1
## [2,]    2
## [3,]    3
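A column vector created this way is a 3-by-1 matrix. If a row vector (a matrix with a single row) is needed instead, one option is to call matrix() with nrow=1, as sketched below.

```r
v <- 1:3
dim(as.matrix(v))       # 3 rows, 1 column: a column vector
## [1] 3 1
dim(matrix(v, nrow=1))  # 1 row, 3 columns: a row vector
## [1] 1 3
```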

### C.1.2 Stacking Vectors

Another way to create a matrix is to stack several vectors of equal length on top of each other (rbind()) or side by side (cbind()):

rbind(1:3, 4:6, 7:9) # row bind
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
cbind(1:3, 4:6, 7:9) # column bind
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
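Stacking consecutive chunks of a vector is closely related to filling a matrix with matrix(): row-binding corresponds to byrow=TRUE, whereas column-binding corresponds to the default, column-wise filling. The sketch below checks this for the two examples above.

```r
# rbind() of consecutive chunks == filling the matrix row by row
identical(rbind(1:3, 4:6, 7:9), matrix(1:9, nrow=3, byrow=TRUE))  # TRUE
# cbind() of consecutive chunks == filling column by column (the default)
identical(cbind(1:3, 4:6, 7:9), matrix(1:9, nrow=3))              # TRUE
```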

These functions also allow for adding new rows/columns to existing matrices:

rbind(A, c(-1, -2, -3))
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]   -1   -2   -3
cbind(A, c(-1, -2))
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3   -1
## [2,]    4    5    6   -2

### C.1.3 Beyond Numeric Matrices

Note that logical matrices are possible as well. For instance, knowing that comparisons such as < and == are performed elementwise on matrices too, we can obtain:

A >= 3
##       [,1]  [,2] [,3]
## [1,] FALSE FALSE TRUE
## [2,]  TRUE  TRUE TRUE
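Because TRUE and FALSE are treated as 1 and 0 in arithmetic, a logical matrix like the one above can be aggregated directly, e.g., to count the elements satisfying a condition or to compute their proportion. A small sketch:

```r
A <- matrix(c(1, 2, 3, 4, 5, 6), byrow=TRUE, nrow=2)
sum(A >= 3)   # number of elements greater than or equal to 3
## [1] 4
mean(A >= 3)  # proportion of such elements, i.e., 4/6
## [1] 0.6666667
```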

Moreover, we can define character matrices, although they are used much more rarely:

matrix(LETTERS[1:12], ncol=6)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "A"  "C"  "E"  "G"  "I"  "K"
## [2,] "B"  "D"  "F"  "H"  "J"  "L"
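A matrix holds elements of a single type only. Hence, if we combine, say, numbers and strings, all the elements are coerced to the more general type, here: character. A minimal sketch:

```r
M <- cbind(1:2, c("a", "b"))  # integers and strings mixed together
typeof(M)                     # the integers were converted to strings
## [1] "character"
```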

### C.1.4 Naming Rows and Columns

Just as vectors can be equipped with a names attribute:

c(a=1, b=2, c=3)
## a b c
## 1 2 3

matrices can be assigned row and column labels via the dimnames attribute, in the form of a list of two character vectors:

dimnames(A) <- list(
    c("a", "b"),     # row labels
    c("x", "y", "z") # column labels
)
A
##   x y z
## a 1 2 3
## b 4 5 6
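Row and column labels can also be given directly at creation time, through matrix()'s dimnames argument, and read back individually with rownames() and colnames(). A minimal sketch reproducing the labelled A above:

```r
A <- matrix(c(1, 2, 3, 4, 5, 6), byrow=TRUE, nrow=2,
    dimnames=list(c("a", "b"), c("x", "y", "z")))
rownames(A)
## [1] "a" "b"
colnames(A)
## [1] "x" "y" "z"
```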

### C.1.5 Other Methods

The read.table() function (and its special case, read.csv()) can be used to read a matrix from a text file. We will cover it in the next chapter, because technically it returns a data frame object, which we can convert to a matrix with a call to as.matrix().
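As a preview, here is a minimal sketch that reads CSV data from a string instead of a file (via the text argument, which read.csv() passes on to read.table()) and converts the resulting data frame to a matrix. The inline data are made up for illustration only.

```r
df <- read.csv(text="x,y,z\n1,2,3\n4,5,6")  # a data frame with 2 rows, 3 columns
class(df)
## [1] "data.frame"
M <- as.matrix(df)  # a 2-by-3 matrix; the column names are kept
dim(M)
## [1] 2 3
```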

outer() applies a given (vectorised) function to each pair of elements from two vectors, forming a two-dimensional “grid”. More precisely, outer(x, y, f, ...) returns a matrix $\mathbf{Z}$ with length(x) rows and length(y) columns such that $z_{i,j}=f(x_i, y_j, ...)$, where ... are optional further arguments to f. The function f must be vectorised, because outer() calls it only once, on two vectors that together list all the pairs.

outer(c(1, 10, 100), 1:5, "*") # apply the multiplication operator
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]   10   20   30   40   50
## [3,]  100  200  300  400  500
outer(c("A", "B"), 1:8, paste, sep="-") # concatenate strings
##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
## [1,] "A-1" "A-2" "A-3" "A-4" "A-5" "A-6" "A-7" "A-8"
## [2,] "B-1" "B-2" "B-3" "B-4" "B-5" "B-6" "B-7" "B-8"
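To see what outer() computes, here is a sketch of a hypothetical helper, my_outer() (not part of base R), that follows the definition of $z_{i,j}$ directly: it lists all the pairs $(x_i, y_j)$ by replicating the inputs with rep(), calls the vectorised f once on all of them, and arranges the results column by column into a length(x) by length(y) matrix. Unlike the real outer(), it ignores names and multidimensional inputs.

```r
# hypothetical helper (illustration only): z[i, j] = f(x[i], y[j], ...)
my_outer <- function(x, y, f, ...) {
    f <- match.fun(f)              # f may be a function or its name, e.g., "*"
    xx <- rep(x, times=length(y))  # x[1], ..., x[m], x[1], ..., x[m], ...
    yy <- rep(y, each=length(x))   # y[1] repeated m times, then y[2], ...
    # a single vectorised call computes all m*n values; matrix() fills them
    # column by column, so the value for (x[i], y[j]) lands at row i, column j
    matrix(f(xx, yy, ...), nrow=length(x), ncol=length(y))
}
my_outer(c(1, 10, 100), 1:5, "*")
my_outer(c("A", "B"), 1:8, paste, sep="-")
```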

simplify2array() is a generalisation of the unlist() function. Given a list of vectors, each of length one, it returns an “unlisted” vector. However, if the list consists of vectors of equal lengths greater than one, they are combined into a matrix, each vector becoming a separate column. If the lengths differ, the list is returned unchanged.

simplify2array(list(1, 11, 21))
## [1]  1 11 21
simplify2array(list(1:3, 11:13, 21:23))
##      [,1] [,2] [,3]
## [1,]    1   11   21
## [2,]    2   12   22
## [3,]    3   13   23
simplify2array(list(1, 11:12, 21:23)) # no can do
## [[1]]
## [1] 1
##
## [[2]]
## [1] 11 12
##
## [[3]]
## [1] 21 22 23
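The column-wise arrangement can be verified by comparing the result with cbind(). Moreover, if the list is named, the names become column labels. The sketch below checks both; the labels a, b and c are arbitrary.

```r
identical(
    simplify2array(list(1:3, 11:13, 21:23)),
    cbind(1:3, 11:13, 21:23)
)  # TRUE: each vector became a column
colnames(simplify2array(list(a=1:3, b=11:13, c=21:23)))
## [1] "a" "b" "c"
```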

sapply(X, FUN) is a nice application of the above: it is essentially simplify2array(lapply(X, FUN)), i.e., it applies FUN to each element of X and then simplifies the resulting list.

sapply(split(iris$Sepal.Length, iris$Species), mean)
##     setosa versicolor  virginica
##      5.006      5.936      6.588
sapply(split(iris$Sepal.Length, iris$Species), summary)
##         setosa versicolor virginica
## Min.     4.300      4.900     4.900
## 1st Qu.  4.800      5.600     6.225
## Median   5.000      5.900     6.500
## Mean     5.006      5.936     6.588
## 3rd Qu.  5.200      6.300     6.900
## Max.     5.800      7.000     7.900
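The equivalence can be checked directly, e.g., using range(), which returns a vector of length two (the minimum and the maximum); each group then contributes one column. A sketch:

```r
groups <- split(iris$Sepal.Length, iris$Species)  # a named list of 3 vectors
r1 <- sapply(groups, range)
r2 <- simplify2array(lapply(groups, range))
identical(r1, r2)  # TRUE
dim(r1)            # 2 rows (min, max), 3 columns (one per species)
## [1] 2 3
```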

Of course, custom functions can also be applied:

min_mean_max <- function(x) {
    # returns a named vector with three elements
    # (note that the last expression in a function's body
    #  is its return value)
    c(min=min(x), mean=mean(x), max=max(x))
}
sapply(split(iris$Sepal.Length, iris$Species), min_mean_max)
##      setosa versicolor virginica
## min   4.300      4.900     4.900
## mean  5.006      5.936     6.588
## max   5.800      7.000     7.900
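A custom function does not need a name; it can also be passed inline as an anonymous function. The following sketch computes the lower and upper quartiles in each group; the row labels come from the names that quantile() assigns to its result.

```r
groups <- split(iris$Sepal.Length, iris$Species)
q <- sapply(groups, function(x) quantile(x, c(0.25, 0.75)))
rownames(q)
## [1] "25%" "75%"
```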

Lastly, table(x, y) creates a contingency matrix that counts the occurrences of each unique pair of corresponding elements from two vectors of equal lengths. Called on a single vector, table(x) counts the occurrences of each unique value.

library("titanic") # data on the passengers of the RMS Titanic
table(titanic_train$Survived)
##
##   0   1
## 549 342
table(titanic_train$Sex)
##
## female   male
##    314    577
table(titanic_train$Survived, titanic_train$Sex)
##
##     female male
##   0     81  468
##   1    233  109
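For a self-contained illustration that does not depend on the titanic package, here is a sketch with two short made-up vectors; each cell counts how many times the corresponding pair (x[i], y[i]) occurs.

```r
x <- c("a", "b", "a", "a")
y <- c("u", "u", "v", "u")
tab <- table(x, y)  # the pairs are (a,u), (b,u), (a,v), (a,u)
tab["a", "u"]       # the pair (a,u) occurs twice
tab["b", "v"]       # the pair (b,v) never occurs
```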

### C.1.6 Internal Representation (*)

Note that by setting byrow=TRUE in a call to the matrix() function above, we read the elements of the input vector in row-wise (row-major) fashion. The default is column-major order, which some of us might find a little unintuitive.

A <- matrix(c(1, 2, 3, 4, 5, 6), ncol=3, byrow=TRUE)
B <- matrix(c(1, 2, 3, 4, 5, 6), ncol=3) # byrow=FALSE

It turns out that column-major is exactly the order in which a matrix is stored internally. Under the hood, it is an ordinary (here: numeric) vector equipped with a dim attribute:

mode(B)    # == mode(A)
## [1] "numeric"
length(B)  # == length(A)
## [1] 6
as.numeric(A)
## [1] 1 4 2 5 3 6
as.numeric(B)
## [1] 1 2 3 4 5 6
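Since the elements are stored column by column, the element in row $i$ and column $j$ of a matrix with $n$ rows sits at position $i + (j-1)n$ of the underlying vector, regardless of how byrow was set at creation time. The sketch below verifies this formula for every element of A and B; it uses the M[i, j] syntax, which picks the element in row i and column j.

```r
A <- matrix(c(1, 2, 3, 4, 5, 6), ncol=3, byrow=TRUE)
B <- matrix(c(1, 2, 3, 4, 5, 6), ncol=3)
for (M in list(A, B)) {  # byrow affects how the input is read, not the storage
    v <- as.numeric(M)   # the underlying data vector (column-major)
    n <- nrow(M)
    for (i in seq_len(nrow(M)))
        for (j in seq_len(ncol(M)))
            stopifnot(M[i, j] == v[i + (j - 1) * n])  # no error: formula holds
}
```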

Also note that we can reinterpret the same underlying data vector as a matrix of a different shape by modifying its dim attribute:

dim(A) <- c(3, 2) # 3 rows, 2 columns
A
##      [,1] [,2]
## [1,]    1    5
## [2,]    4    3
## [3,]    2    6
dim(B) <- c(3, 2) # 3 rows, 2 columns
B
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
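Because modifying dim only reinterprets the data, the new dimensions must multiply to the vector's length; otherwise R signals an error. Setting dim to NULL drops the matrix structure altogether. A sketch:

```r
B <- matrix(c(1, 2, 3, 4, 5, 6), ncol=3)
res <- tryCatch({
    dim(B) <- c(4, 2)  # 4 * 2 != 6: not allowed
    "no error"
}, error=function(e) "error")
res
## [1] "error"
dim(B) <- NULL  # back to a flat numeric vector
B
## [1] 1 2 3 4 5 6
is.matrix(B)
## [1] FALSE
```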