## B.5 Vector Subsetting

### B.5.1 Subsetting with Positive Indices

To extract subsets (parts) of vectors, we use square brackets:

(x <- seq(10, 100, 10))
##  [1]  10  20  30  40  50  60  70  80  90 100
x[1]         # the first element
## [1] 10
x[length(x)] # the last element
## [1] 100

More than one element at a time can also be extracted:

x[1:3] # the first three
## [1] 10 20 30
x[c(1, length(x))] # the first and the last
## [1]  10 100

Index vectors are often produced by other functions. For example, order() returns the indices of the smallest, 2nd smallest, 3rd smallest, …, and the largest element in a given vector. We will use this function when implementing our first classifier.

y <- c(50, 30, 10, 20, 40)
(o <- order(y))
## [1] 3 4 2 5 1

Hence, we see that the smallest element in y is at index 3 and the largest at index 1:

y[o[1]]
## [1] 10
y[o[length(y)]]
## [1] 50

Therefore, to get a sorted version of y, we call:

y[o] # see also sort(y)
## [1] 10 20 30 40 50

We can also obtain the 3 largest elements by calling:

y[order(y, decreasing=TRUE)[1:3]]
## [1] 50 40 30
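Because order() returns positions rather than values, the same index vector can rearrange a second vector in parallel. The following is only a small sketch of that idea; the vectors scores and students and their contents are made up for illustration.

```r
# Hypothetical data: exam scores and the names of the students who got them
scores <- c(71, 94, 58, 85)
students <- c("Ann", "Bob", "Cat", "Dan")

idx <- order(scores, decreasing=TRUE)  # positions from the highest to the lowest score
students[idx]      # names arranged by score
## [1] "Bob" "Dan" "Ann" "Cat"
scores[idx]        # the scores in the same order
## [1] 94 85 71 58
students[idx[1:2]] # the two best-scoring students
## [1] "Bob" "Dan"
```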

### B.5.2 Subsetting with Negative Indices

Subsetting with a vector of negative indices excludes the elements at the given positions:

x[-1] # all but the first
## [1]  20  30  40  50  60  70  80  90 100
x[-(1:3)]
## [1]  40  50  60  70  80  90 100
x[-c(1:3, 5, 8)]
## [1]  40  60  70  90 100
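As a further sketch, negative indices can be combined with length() to drop elements from the end. The vector v below is an illustrative copy, introduced so that x is left untouched. The last call shows that positive and negative indices cannot be mixed in one subscript; the error is caught with tryCatch() so that the example runs through.

```r
v <- seq(10, 100, 10)  # illustrative vector, separate from x
v[-length(v)]          # all but the last element
## [1] 10 20 30 40 50 60 70 80 90
v[-c(1, length(v))]    # all but the first and the last
## [1] 20 30 40 50 60 70 80 90

# Mixing positive and negative indices raises an error:
res <- tryCatch(v[c(-1, 2)], error=function(e) "error")
res
## [1] "error"
```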

### B.5.3 Subsetting with Logical Vectors

We may also subset a vector $$\boldsymbol{x}$$ of length $$n$$ with a logical vector $$\boldsymbol{l}$$ also of length $$n$$. The $$i$$-th element, $$x_i$$, will be extracted if and only if the corresponding $$l_i$$ is true.

x[c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)]
## [1]  10  50  70  80 100

This works well with comparison operators, which return logical vectors:

x>50
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
x[x>50] # select elements in x that are greater than 50
## [1]  60  70  80  90 100
x[x<30 | x>70]
## [1]  10  20  80  90 100
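A few more variations, given as a sketch on an illustrative vector v (not part of the examples above): conditions can be combined with & and negated with !, and a logical index shorter than the vector is recycled, which is easy to trigger by accident.

```r
v <- seq(10, 100, 10)  # illustrative vector, separate from x
v[v >= 30 & v <= 60]   # both conditions must hold
## [1] 30 40 50 60
v[!(v > 50)]           # negation: elements not greater than 50
## [1] 10 20 30 40 50
v[c(TRUE, FALSE)]      # the index is recycled: every other element
## [1] 10 30 50 70 90
sum(v > 50)            # TRUE counts as 1, so this is the number of matches
## [1] 5
```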

### B.5.4 Replacing Elements

The three indexing schemes above (positive, negative, and logical indices) can also be used to replace specific elements with new values.

x[-1] <- 10000
x
##  [1]    10 10000 10000 10000 10000 10000 10000 10000 10000 10000
x[-(1:7)] <- c(1, 2, 3)
x
##  [1]    10 10000 10000 10000 10000 10000 10000     1     2     3
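Replacement works the same way with positive and logical indices. A brief sketch follows, again on an illustrative vector v so that x keeps the values shown above.

```r
v <- seq(10, 100, 10)    # illustrative vector, separate from x
v[c(1, 2)] <- c(-1, -2)  # positive indices: one new value per position
v[v > 50] <- 0           # logical index: a single value is reused for every match
v
##  [1] -1 -2 30 40 50  0  0  0  0  0
```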

### B.5.5 Other Functions

head() and tail() return, respectively, the first and the last few (6 by default) elements of a vector.

head(x) # head(x, 6)
## [1]    10 10000 10000 10000 10000 10000
tail(x, 3)
## [1] 1 2 3
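Both functions also accept a negative count, which then means "all but that many". This is sketched below on an illustrative vector v.

```r
v <- seq(10, 100, 10)  # illustrative vector, separate from x
head(v, -2)            # all but the last two elements
## [1] 10 20 30 40 50 60 70 80
tail(v, -2)            # all but the first two elements
## [1]  30  40  50  60  70  80  90 100
```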

Sometimes the which() function can come in handy. For a given logical vector, it returns all the indices where TRUE elements are stored.

which(c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
## [1] 1 3 4 7
print(y) # recall
## [1] 50 30 10 20 40
which(y>30)
## [1] 1 5

Note that y[y>30] gives the same result as y[which(y>30)] (provided that y contains no missing values) but is faster, because it involves fewer operations.
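The two forms differ once missing values are involved: an NA in a logical index produces an NA in the result, whereas which() reports only the positions of TRUE. A small sketch, with a made-up vector w:

```r
w <- c(50, NA, 10, 40)  # hypothetical vector with one missing value
w > 30
## [1]  TRUE    NA FALSE  TRUE
w[w > 30]               # the NA in the index is carried over to the result
## [1] 50 NA 40
w[which(w > 30)]        # which() skips the NA
## [1] 50 40
```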

which.min() and which.max() return the index of the smallest and the largest element, respectively:

which.min(y) # where is the minimum?
## [1] 3
which.max(y)
## [1] 1
y[which.min(y)] # min(y)
## [1] 10
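When the extreme value occurs more than once, which.min() and which.max() return only the first matching index. If all positions are needed, one possible approach is to combine which() with a comparison, as in this sketch (the vector u is made up for illustration):

```r
u <- c(3, 1, 2, 1)  # hypothetical vector with a tied minimum
which.min(u)        # only the first occurrence
## [1] 2
which(u == min(u))  # all occurrences
## [1] 2 4
```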

is.na() indicates which elements are missing values (NAs):

z <- c(1, 2, NA, 4, NA, 6)
is.na(z)
## [1] FALSE FALSE  TRUE FALSE  TRUE FALSE

Therefore, to get rid of them, we can write (compare na.omit(), see also is.finite()):

(z <- z[!is.na(z)])
## [1] 1 2 4 6
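The alternatives mentioned above behave slightly differently, as the sketch below suggests. The vector a is made up for illustration. is.na() is also TRUE for NaN, is.finite() additionally rejects infinite values, and na.omit() attaches bookkeeping attributes to its result, which as.vector() strips.

```r
a <- c(1, NA, Inf, NaN, 5)  # hypothetical vector with special values
is.na(a)                    # TRUE for both NA and NaN
## [1] FALSE  TRUE FALSE  TRUE FALSE
is.finite(a)                # FALSE for NA, NaN, and Inf
## [1]  TRUE FALSE FALSE FALSE  TRUE
a[is.finite(a)]             # keep ordinary numbers only
## [1] 1 5
as.vector(na.omit(a))       # drops NA and NaN but keeps Inf
## [1]   1 Inf   5
```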