B.2 Numeric Vectors

B.2.1 Creating Numeric Vectors

First let’s introduce a few ways with which we can create numeric vectors.

B.2.1.1 c()

The c() function combines a given list of values to form a sequence:

## [1] 1 2 3
## [1] 1 2 3 4 5 6 7 8

Note that when we use the assignment operator, <- or = (both are equivalent), printing of the output is suppressed:

## [1] 1 2 3

However, we can enforce it by parenthesising the whole expression:

## [1] 1 2 3

In order to determine that x is indeed a numeric vector, we call:

## [1] "numeric"
## [1] "numeric"

These two functions might return different results. For instance, in the next chapter we note that a numeric matrix will yield mode() of numeric and class() of matrix.

What is more, we can get the number of elements in x by calling:

## [1] 3

B.2.1.2 seq()

To create an arithmetic progression, i.e., a sequence of equally-spaced numbers, we can call the seq() function

## [1] 1 3 5 7 9

If we access the function’s documentation (by executing ?seq in the console), we’ll note that the function takes a couple of parameters: from, to, by, length.out etc.

The above call is equivalent to:

## [1] 1 3 5 7 9

The by argument can be replaced with length.out, which gives the desired size:

## [1] 0.00 0.25 0.50 0.75 1.00

Note that R supports partial matching of argument names:

## [1] 0.00 0.25 0.50 0.75 1.00

Quite often we need progressions with step equal to 1 or -1. Such vectors can be generated by applying the : operator.

##  [1]  1  2  3  4  5  6  7  8  9 10
##  [1]  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10

B.2.1.3 rep()

Moreover, rep() replicates a given vector. Check out the function’s documentation (see ?rep) for the meaning of the arguments provided below.

## [1] 1 1 1 1 1
##  [1] 1 2 3 1 2 3 1 2 3 1 2 3
## [1] 1 1 2 2 2 2 3 3 3
##  [1] 1 1 1 1 2 2 2 2 3 3 3 3

B.2.1.4 Pseudo-Random Vectors

We can also generate vectors of pseudo-random values. For instance, the following generates 5 deviates from the uniform distribution (every number has the same probability) on the unit (i.e., \([0,1]\)) interval:

## [1] 0.5649012 0.5588051 0.4414829 0.2076393 0.6696355

We call such numbers pseudo-random, because they are generated arithmetically. In fact, by setting the random number generator’s state (also called the seed), we can obtain reproducible results.

## [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673
## [1] 0.0455565 0.5281055 0.8924190 0.5514350 0.4566147
## [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673

Note the difference between the uniform distribution on \([0,1]\) and the normal distribution with expected value of \(0\) and standard deviation of \(1\) (also called the standard normal distribution).

plot of chunk runif_vs_rnorm

Another useful function samples a number of values from a given vector, either with or without replacement:

## [1]  3  3 10  2  6  5  4  6
## [1]  9  5  3  8  1  4  6 10

Note that if n is a single number, sample(n, ...) is equivalent to sample(1:n, ...). This is a dangerous behaviour than may lead to bugs in our code. Read more at ?sample.

B.2.2 Vector-Scalar Operations

Mathematically, we sometimes refer to a vector that is reduced to a single component as a scalar. We are used to denoting such objects with lowercase letters such as \(a, b, i, s, x\in\mathbb{R}\).

Note that some programming languages distinguish between atomic numerical entities and length-one vectors, e.g., 7 vs. [7] in Python. This is not the case in R, where length(7) returns 1.

Vector-scalar arithmetic operations such as \(s\boldsymbol{x}\) (multiplication of a vector \(\boldsymbol{x}=(x_1,\dots, x_n)\) by a scalar \(s\)) result in a vector \(\boldsymbol{y}\) such that \(y_i=s x_i\), \(i=1,\dots,n\).

The same rule holds for, e.g., \(s+\boldsymbol{x}\), \(\boldsymbol{x}-s\), \(\boldsymbol{x}/s\).

## [1]  0.5  5.0 50.0
## [1] 11 12 13 14 15
## [1] 0.0 0.2 0.4 0.6 0.8 1.0

By \(-\boldsymbol{x}\) we will mean \((-1)\boldsymbol{x}\):

## [1]  0.00 -0.25 -0.50 -0.75 -1.00

Note that in R the same rule applies for exponentiation:

## [1]  0  1  4  9 16 25
## [1]  1  2  4  8 16 32

However, in mathematics, we are not used to writing \(2^{\boldsymbol{x}}\) or \(\boldsymbol{x}^2\).

B.2.3 Vector-Vector Operations

Let \(\boldsymbol{x}=(x_1,\dots,x_n)\) and \(\boldsymbol{y}=(y_1,\dots,y_n)\) be two vectors of identical lengths.

Arithmetic operations \(\boldsymbol{x}+\boldsymbol{y}\) and \(\boldsymbol{x}-\boldsymbol{y}\) are performed elementwise, i.e., they result in a vector \(\boldsymbol{z}\) such that \(z_i=x_i+y_i\) and \(z_i=x_i-y_i\), respectively, \(i=1,\dots,n\).

## [1]    2   12  103 1004
## [1]    0   -8  -97 -996

Although in mathematics we are not used to using any special notation for elementwise multiplication, division and exponentiation, this is available in R.

## [1]    1   20  300 4000
## [1] 1.000 0.200 0.030 0.004
## [1] 1e+00 1e+02 1e+06 1e+12

1e+12 is a number written in the scientific notation. It means “1 times 10 to the power of 12”, i.e., \(1\times 10^{12}\). Physicists love this notation, because they are used to dealing with very small (think sizes of quarks) and very large (think distances between galaxies) entities.

Moreover, in R the recycling rule is applied if we perform elementwise operations on vectors of different lengths – the shorter vector is recycled as many times as needed to match the length of the longer vector, just as if we were performing:

##  [1] 1 2 3 1 2 3 1 2 3 1 2 3

Therefore:

## [1] 1 2 3 4 5 6
## [1]  1 20  3 40  5 60
## [1]   1  20 300   4  50 600
## Warning in 1:6 * c(1, 10, 100, 1000): longer object length is
## not a multiple of shorter object length
## [1]    1   20  300 4000    5   60

Note that a warning is not an error – we still get a result that makes sense.

B.2.4 Aggregation Functions

R implements a couple of aggregation functions:

  • sum(x) = \(\sum_{i=1}^n x_i=x_1+x_2+\dots+x_n\)
  • prod(x) = \(\prod_{i=1}^n x_i=x_1 x_2 \dots x_n\)
  • mean(x) = \(\frac{1}{n}\sum_{i=1}^n x_i\) – arithmetic mean
  • var(x) = sum((x-mean(x))^2)/(length(x)-1) = \(\frac{1}{n-1} \sum_{i=1}^n \left(x_i - \frac{1}{n}\sum_{j=1}^n x_j \right)^2\) – variance
  • sd(x) = sqrt(var(x)) – standard deviation

see also: min(), max(), median(), quantile().

Remember that you can always access the R manual by typing ?functionname, e.g., ?quantile.

Note that \(\sum_{i=1}^n x_i\) can also be written as \(\displaystyle\sum_{i=1}^n x_i\) or even \(\displaystyle\sum_{i=1,\dots,n} x_i\). These all mean the sum of \(x_i\) for \(i\) from \(1\) to \(n\), that is, the sum of \(x_1\), \(x_2\), …, \(x_n\), i.e., \(x_1+x_2+\dots+x_n\).

B.2.5 Special Functions

Furthermore, R supports numerous mathematical functions, e.g., sqrt(), abs(), round(), log(), exp(), cos(), sin(). All of them are vectorised – when applied on a vector of length \(n\), they yield a vector of length \(n\) in result.

For example, here is how we can compute the square roots of all the integers between 1 and 9:

## [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490
## [7] 2.645751 2.828427 3.000000

B.2.6 Norms and Distances

Norms are used to measure the size of an object. Mathematically, we will also be interested in the following norms:

  • Euclidean norm: \[ \|\boldsymbol{x}\| = \|\boldsymbol{x}\|_2 = \sqrt{ \sum_{i=1}^n x_i^2 } \] this is nothing else than the length of the vector \(\boldsymbol{x}\)
  • Manhattan (taxicab) norm: \[ \|\boldsymbol{x}\|_1 = \sum_{i=1}^n |x_i| \]
  • Chebyshev (maximum) norm: \[ \|\boldsymbol{x}\|_\infty = \max_{i=1,\dots,n} |x_i| = \max\{ |x_1|, |x_2|, \dots, |x_n| \} \]

The above norms can be easily implemented by means of the building blocks introduced above. This is super easy:

## [1] 2.236068
## [1] 3
## [1] 2

Also note that all the norms easily generate the corresponding distances (metrics) between two given points. In particular:

\[ \| \boldsymbol{x}-\boldsymbol{y} \| = \sqrt{ \sum_{i=1}^n \left(x_i-y_i\right)^2 } \]

gives the Euclidean distance (metric) between the two vectors.

## [1] 1

This is the “normal” distance that you learned at school.

B.2.7 Dot Product (*)

What is more, given two vectors of identical lengths, \(\boldsymbol{x}\) and \(\boldsymbol{y}\), we define their dot product (a.k.a. scalar or inner product) as:

\[ \boldsymbol{x}\cdot\boldsymbol{y} = \sum_{i=1}^n x_i y_i. \]

Let’s stress that this is not the same as the elementwise vector multiplication in R – the result is a single number.

## [1] 1

(*) Note that the squared Euclidean norm of a vector is equal to the dot product of the vector and itself, \(\|\boldsymbol{x}\|^2 = \boldsymbol{x}\cdot\boldsymbol{x}\).

(*) Interestingly, a dot product has a nice geometrical interpretation: \[ \boldsymbol{x}\cdot\boldsymbol{y} = \|\boldsymbol{x}\| \|\boldsymbol{y}\| \cos\alpha \] where \(\alpha\) is the angle between the two vectors. In other words, it is the product of the lengths of the two vectors and the cosine of the angle between them. Note that we can get the cosine part by computing the dot product of the normalised vectors, i.e., such that their lengths are equal to 1.

For example, the two vectors u and v defined above can be depicted as:

plot of chunk unnamed-chunk-30

We can compute the angle between them by calling:

## [1] 1
## [1] 1.414214
## [1] 0.7071068
## [1] 45

B.2.8 Missing and Other Special Values

R has a notion of a missing (not-available) value. It is very useful in data analysis, as we sometimes don’t have an information on an object’s feature. For instance, we might not know a patient’s age if he was admitted to the hospital unconscious.

Operations on missing values generally result in missing values – that makes a lot sense.

## [1] 12 14 NA 18 20
## [1] NA

If we wish to compute a vector’s aggregate after all, we can get rid of the missing values by calling na.omit():

## [1] 3

Note that in R, a dot has no special meaning. na.omit is as good of a function’s name or variable identifier as na_omit, naOmit, NAOMIT, naomit and NaOmit. Note that, however, R is a case-sensitive language – these are all different symbols. Read more in the Details section of ?make.names.

Moreover, some arithmetic operations can result in infinities (\(\pm \infty\)):

## [1] -Inf
## [1] Inf

Also, sometimes we’ll get a not-a-number, NaN. This is not a missing value, but a “invalid” result.

## Warning in sqrt(-1): NaNs produced
## [1] NaN
## Warning in log(-1): NaNs produced
## [1] NaN
## [1] NaN