B.5 Vector Subsetting

B.5.1 Subsetting with Positive Indices

To extract subsets (parts) of a vector, we use square brackets:

##  [1]  10  20  30  40  50  60  70  80  90 100
## [1] 10
## [1] 100
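The commands behind this output are not shown in the text. A minimal sketch that reproduces it, assuming `x` holds the multiples of 10 from 10 to 100 (a definition inferred from the printed values, not stated in the source), could be:

```r
# Assumed definition of x, inferred from the printed output
x <- seq(10, 100, 10)
print(x)
##  [1]  10  20  30  40  50  60  70  80  90 100
x[1]          # first element
## [1] 10
x[length(x)]  # last element
## [1] 100
```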

More than one element at a time can also be extracted:

## [1] 10 20 30
## [1]  10 100
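One way to obtain these two results is to index with a vector of positions. The sketch below again assumes the same `x` as inferred from the earlier output:

```r
x <- seq(10, 100, 10)  # assumed definition, inferred from the output

x[1:3]                 # the first three elements
## [1] 10 20 30
x[c(1, length(x))]     # the first and the last element
## [1]  10 100
```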

Index vectors are often produced by other functions. For example, order() returns the indices of the smallest, 2nd smallest, 3rd smallest, …, largest element of a given vector. We will use this function when implementing our first classifier.

## [1] 3 4 2 5 1

Hence, we see that the smallest element in y is at index 3 and the largest at index 1:

## [1] 10
## [1] 50
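The following sketch is consistent with the output above. The definition of `y` is an assumption, reconstructed from the printed permutation and values; the helper name `o` is introduced here only for illustration.

```r
y <- c(50, 30, 10, 20, 40)  # assumed definition, reconstructed from the output

o <- order(y)               # indices that would sort y increasingly
o
## [1] 3 4 2 5 1
y[o[1]]                     # smallest element (at index 3)
## [1] 10
y[o[length(o)]]             # largest element (at index 1)
## [1] 50
```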

Therefore, to get a sorted version of y, we call:

## [1] 10 20 30 40 50

We can also obtain the 3 largest elements by calling:

## [1] 50 40 30
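Both results can be obtained by subsetting `y` with the output of order(). This is a sketch under the same assumed definition of `y`; using the `decreasing` argument is one of several possible ways to get the largest elements first.

```r
y <- c(50, 30, 10, 20, 40)  # assumed definition, reconstructed from the output

y[order(y)]                          # sorted version of y
## [1] 10 20 30 40 50
y[order(y, decreasing = TRUE)[1:3]]  # the 3 largest elements
## [1] 50 40 30
```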

B.5.2 Subsetting with Negative Indices

Subsetting with a vector of negative indices excludes the elements at the given positions:

## [1]  20  30  40  50  60  70  80  90 100
## [1]  40  50  60  70  80  90 100
## [1]  40  60  70  90 100
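The three results above match the calls sketched below, assuming `x` is the vector of multiples of 10 inferred earlier. The exact index vectors are a reconstruction from the printed values.

```r
x <- seq(10, 100, 10)  # assumed definition, inferred from the output

x[-1]                  # everything except the first element
## [1]  20  30  40  50  60  70  80  90 100
x[-(1:3)]              # drop the first three elements
## [1]  40  50  60  70  80  90 100
x[-c(1:3, 5, 8)]       # drop positions 1, 2, 3, 5 and 8
## [1]  40  60  70  90 100
```

Note the parentheses in `-(1:3)`: without them, `-1:3` would denote the sequence -1, 0, 1, 2, 3, which mixes negative and positive indices and raises an error.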

B.5.3 Subsetting with Logical Vectors

We may also subset a vector \(\boldsymbol{x}\) of length \(n\) with a logical vector \(\boldsymbol{l}\) also of length \(n\). The \(i\)-th element, \(x_i\), will be extracted if and only if the corresponding \(l_i\) is true.

## [1]  10  50  70  80 100
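A logical index vector that yields this output, assuming the same `x` as before, might look as follows. The name `l` mirrors the notation in the text; its particular values are reconstructed from the result.

```r
x <- seq(10, 100, 10)  # assumed definition, inferred from the output

# TRUE at positions 1, 5, 7, 8 and 10; FALSE elsewhere
l <- c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
x[l]
## [1]  10  50  70  80 100
```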

This works well with comparison operators, which return logical vectors.

##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [1]  60  70  80  90 100
## [1]  10  20  80  90 100
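As a sketch, the output above is what one would get from the comparisons below, assuming the same `x`. The thresholds 50, 30 and 70 are inferred from the printed values rather than taken from the text.

```r
x <- seq(10, 100, 10)  # assumed definition, inferred from the output

x > 50                 # elementwise comparison gives a logical vector
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
x[x > 50]              # keep only the elements greater than 50
## [1]  60  70  80  90 100
x[x < 30 | x > 70]     # combine two conditions with elementwise OR
## [1]  10  20  80  90 100
```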

B.5.4 Replacing Elements

Note that all three indexing schemes above (positive, negative, and logical indices) can also be used to replace specific elements with new values.

##  [1]    10 10000 10000 10000 10000 10000 10000 10000 10000 10000
##  [1]    10 10000 10000 10000 10000 10000 10000     1     2     3
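The two printed vectors are consistent with the assignments sketched below, which start from the assumed `x` and use negative indices. The last lines add a logical-index replacement on a copy, `w`, which is a hypothetical name used only for illustration.

```r
x <- seq(10, 100, 10)    # assumed definition, inferred from the output

x[-1] <- 10000           # replace all but the first element (value is recycled)
x
##  [1]    10 10000 10000 10000 10000 10000 10000 10000 10000 10000
x[-(1:7)] <- c(1, 2, 3)  # replace the last three elements
x
##  [1]    10 10000 10000 10000 10000 10000 10000     1     2     3

w <- x                   # work on a copy so that x stays unchanged
w[w > 100] <- 0          # logical indexing works for replacement too
w
##  [1] 10  0  0  0  0  0  0  1  2  3
```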

B.5.5 Other Functions

head() and tail() return, respectively, the first and the last few (6 by default) elements of a vector.

## [1]    10 10000 10000 10000 10000 10000
## [1] 1 2 3
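A sketch reproducing this output, assuming `x` is in the state left by the replacements above (written out directly here so that the block is self-contained):

```r
# Assumed state of x after the replacements, taken from the printed output
x <- c(10, rep(10000, 6), 1, 2, 3)

head(x)     # first 6 elements (the default)
## [1]    10 10000 10000 10000 10000 10000
tail(x, 3)  # last 3 elements
## [1] 1 2 3
```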

Sometimes the which() function can come in handy. For a given logical vector, it returns all the indices where TRUE elements are stored.

## [1] 1 3 4 7
## [1] 50 30 10 20 40
## [1] 1 5
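The calls below are one reconstruction that matches the three lines of output. The logical vector, the definition of `y`, and the threshold 30 are all inferred from the printed results.

```r
which(c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
## [1] 1 3 4 7

y <- c(50, 30, 10, 20, 40)  # assumed definition, reconstructed from the output
print(y)
## [1] 50 30 10 20 40
which(y > 30)               # positions of the elements greater than 30
## [1] 1 5
```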

Note that y[y>70] gives the same result as y[which(y>70)] when y contains no missing values, but is faster (because it involves fewer operations).
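The equivalence can be checked on a small example. The sketch below uses the assumed `y` and a threshold of 30 so that the result is non-empty, and also shows the one case where the two forms differ: with missing values, logical indexing keeps an NA, whereas which() drops it. The vector `v` is a hypothetical example.

```r
y <- c(50, 30, 10, 20, 40)  # assumed definition, reconstructed from earlier output
identical(y[y > 30], y[which(y > 30)])
## [1] TRUE

v <- c(50, NA, 10)          # hypothetical vector with a missing value
v[v > 30]                   # the NA comparison result yields an NA element
## [1] 50 NA
v[which(v > 30)]            # which() ignores NA, so only 50 is returned
## [1] 50
```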

which.min() and which.max() return the index of the smallest and the largest element, respectively:

## [1] 3
## [1] 1
## [1] 10
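With the same assumed `y`, the output above corresponds to calls such as these:

```r
y <- c(50, 30, 10, 20, 40)  # assumed definition, reconstructed from earlier output

which.min(y)     # index of the smallest element
## [1] 3
which.max(y)     # index of the largest element
## [1] 1
y[which.min(y)]  # the smallest element itself
## [1] 10
```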

is.na() indicates which elements are missing values (NAs):

## [1] FALSE FALSE  TRUE FALSE  TRUE FALSE
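For illustration, a vector that produces this output could be defined as below. The name `z` and its non-missing values are assumptions; only the positions of the NAs follow from the printed result.

```r
z <- c(1, 2, NA, 4, NA, 6)  # hypothetical vector with NAs at positions 3 and 5
is.na(z)
## [1] FALSE FALSE  TRUE FALSE  TRUE FALSE
```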

Therefore, to get rid of them, we can write (compare na.omit(), see also is.finite()):

## [1] 1 2 4 6
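A sketch of the removal step, using a hypothetical vector `z` chosen to match the output. Negating is.na() keeps the non-missing elements; na.omit() returns the same values but attaches attributes recording what was dropped, and is.finite() additionally excludes Inf, -Inf and NaN.

```r
z <- c(1, 2, NA, 4, NA, 6)  # hypothetical vector, chosen to match the output

z[!is.na(z)]                # keep only the non-missing elements
## [1] 1 2 4 6
as.numeric(na.omit(z))      # same values; as.numeric() strips the extra attributes
## [1] 1 2 4 6
z[is.finite(z)]             # also drops NA here, since is.finite(NA) is FALSE
## [1] 1 2 4 6
```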