D.3 Data Frame Subsetting

D.3.1 Each Data Frame is a List

First of all, note that each data frame is in fact represented as an ordinary named list. For a data frame x, the class is data.frame, but the underlying type is list:

## [1] "data.frame"
## [1] "list"
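The commands behind the two lines above are not shown in this section. A minimal sketch, assuming that x is a data frame whose values match the outputs printed in this appendix (the literal values below are copied from those outputs, not taken from the original code), could be:

```r
# Hypothetical reconstruction of x from the values printed in this section.
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(c("A", "B", "C", "D", "E"))
)

class(x)   # the S3 class, used for method dispatch
typeof(x)  # the internal storage type
```

Here, class() reports how R treats the object when dispatching methods such as print(), whereas typeof() reveals how it is actually stored.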

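Each column is stored as a separate list item. It should therefore come as no surprise that we already know how to perform quite a few operations on data frames, such as querying the number of items, listing their names, extracting single items, selecting a subset of items, and applying a function to each item: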

## [1] 3
## [1] "u" "v" "w"
## [1] 0.1815171 0.9197226 0.3117235 0.0641516 0.3964216
## [1]  TRUE FALSE FALSE  TRUE FALSE
##           u w
## 1 0.1815171 A
## 2 0.9197226 B
## 3 0.3117235 C
## 4 0.0641516 D
## 5 0.3964216 E
##         u         v         w 
## "numeric" "logical"  "factor"
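The list-style calls that likely correspond to the outputs above can be sketched as follows. This assumes the same hypothetical reconstruction of x from the printed values; the exact expressions used originally are not shown, so the ones below are a plausible guess only.

```r
# Hypothetical reconstruction of x from the values printed in this section.
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(c("A", "B", "C", "D", "E"))
)

length(x)         # number of list items, i.e., columns
names(x)          # column names
x$u               # extract a column by name
x[[2]]            # extract a column by position
x[c("u", "w")]    # select a subset of columns; the result is a data frame
sapply(x, class)  # apply class() to each column
```

Note that `[[` and `$` return the column itself, while `[` with a single index vector returns a (smaller) data frame.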

D.3.2 Each Data Frame is Matrix-like

Data frames can also be considered “generalised” matrices whose columns may be of different types. Therefore, matrix-style operations such as two-index subsetting work in the same manner:

## [1] 5 3
##           u     v w
## 1 0.1815171  TRUE A
## 2 0.9197226 FALSE B
##           u w
## 1 0.1815171 A
## 2 0.9197226 B
## 3 0.3117235 C
## 4 0.0641516 D
## 5 0.3964216 E
## [1] 0.1815171 0.9197226 0.3117235 0.0641516 0.3964216
##           u
## 1 0.1815171
## 2 0.9197226
## 3 0.3117235
## 4 0.0641516
## 5 0.3964216
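The matrix-style calls that plausibly generated the outputs above are sketched below, again assuming the hypothetical reconstruction of x from the printed values.

```r
# Hypothetical reconstruction of x from the values printed in this section.
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(c("A", "B", "C", "D", "E"))
)

dim(x)                # number of rows and columns
x[1:2, ]              # first two rows, all columns
x[, c("u", "w")]      # all rows, selected columns
x[, 1]                # a single column is simplified to a vector
x[, 1, drop = FALSE]  # drop = FALSE keeps the result a data frame
```

The last two calls differ only in the drop argument: by default, selecting a single column yields a plain vector, just as with matrices.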

Take special note of selecting rows based on logical vectors. For instance, let’s extract all the rows from x for which the values in the column named u are greater than 0.5:

##           u     v w
## 2 0.9197226 FALSE B
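A minimal sketch of this logical row selection, assuming the hypothetical reconstruction of x from the printed values, is:

```r
# Hypothetical reconstruction of x from the values printed in this section.
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(c("A", "B", "C", "D", "E"))
)

x$u > 0.5        # logical vector with one element per row
x[x$u > 0.5, ]   # keep only the rows where the condition is TRUE
```

The comma matters: the logical vector goes in the row position, and the empty column position means "all columns".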

Moreover, subsetting based on integer vectors can be used to change the order of rows. Here is how we can sort the rows in x with respect to the values in column u:

##           u     v w
## 4 0.0641516  TRUE D
## 1 0.1815171  TRUE A
## 3 0.3117235 FALSE C
## 5 0.3964216 FALSE E
## 2 0.9197226 FALSE B
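The sorting step can be sketched in a single call, assuming the hypothetical reconstruction of x from the printed values:

```r
# Hypothetical reconstruction of x from the values printed in this section.
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(c("A", "B", "C", "D", "E"))
)

# order() returns the row indices that arrange x$u increasingly;
# using them as the row index permutes the rows accordingly.
x[order(x$u), ]
```

The original row names travel with the rows, which is why they appear permuted in the output.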

Let’s stress that the programming style we emphasise here is very transparent. If we don’t understand how a complex operation is being executed, we can always decompose it into smaller chunks that can be studied separately. For instance, as far as the last example is concerned, we can consult the manual page ?order and then inspect the result of calling order(x$u).
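As an illustration of such a decomposition, the sorting operation can be split into inspectable steps. This is only a sketch: x is the hypothetical reconstruction from the printed values, and the intermediate name o is introduced here for illustration.

```r
# Hypothetical reconstruction of x from the values printed in this section.
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(c("A", "B", "C", "D", "E"))
)

o <- order(x$u)  # step 1: the ordering permutation of x$u
o                # inspect it: the index of the smallest value comes first
x$u[o]           # step 2: verify that it indeed sorts the column
x[o, ]           # step 3: apply the same permutation to whole rows
```

Each intermediate result is an ordinary vector that can be printed and checked before moving on.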

On a side note, we can reset the row names of the sorted data frame so that they are consecutive integers again:

##           u     v w
## 1 0.0641516  TRUE D
## 2 0.1815171  TRUE A
## 3 0.3117235 FALSE C
## 4 0.3964216 FALSE E
## 5 0.9197226 FALSE B
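One common way to obtain the output above is to assign NULL to the row names, which makes R regenerate the default ones. The sketch below assumes the hypothetical reconstruction of x from the printed values; the name y is introduced for illustration, and the original code may have used a different but equivalent call.

```r
# Hypothetical reconstruction of x from the values printed in this section.
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(c("A", "B", "C", "D", "E"))
)

y <- x[order(x$u), ]  # sorted copy; row names are still 4, 1, 3, 5, 2
rownames(y) <- NULL   # discard them; defaults 1, 2, ..., 5 are restored
y
```

Working on a copy leaves x itself untouched, so the original row labels remain available if they are needed later.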