## D.3 Data Frame Subsetting

### D.3.1 Each Data Frame is a List

First of all, we should note that each data frame is in fact represented as an ordinary named list:

```r
class(x)
## [1] "data.frame"
typeof(x)
## [1] "list"
```
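To see the list structure directly, we can strip the class attribute. The sketch below first recreates a data frame with the same contents as `x`. The values are copied from the printed output in this section, and treating `w` as a factor is inferred from the `sapply(x, class)` result shown later. The original definition of `x` is not part of this section, so this reconstruction is an assumption.

```r
# A data frame with the same contents as x; values copied from the printed
# output in this section (assumption: the original definition is not shown)
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(LETTERS[1:5])  # explicit factor: since R 4.0.0, strings are
                              # no longer converted to factors by default
)

is.list(x)       # TRUE: a data frame passes the list test
l <- unclass(x)  # remove the "data.frame" class attribute
class(l)         # "list"
names(l)         # "u" "v" "w": one named element per column
```

Note that `unclass()` removes only the class attribute; the row names are still stored as an attribute of `l`.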

Each column is stored as a separate list element. Hence, it should come as no surprise that we already know how to perform quite a few operations on data frames:

```r
length(x) # number of columns
## [1] 3
names(x)  # column labels
## [1] "u" "v" "w"
x$u # accessing column u (synonym: x[["u"]])
## [1] 0.1815171 0.9197226 0.3117235 0.0641516 0.3964216
x[[2]] # 2nd column
## [1]  TRUE FALSE FALSE  TRUE FALSE
x[c(1,3)] # a sub-data.frame
##           u w
## 1 0.1815171 A
## 2 0.9197226 B
## 3 0.3117235 C
## 4 0.0641516 D
## 5 0.3964216 E
sapply(x, class) # apply class() on each column
##         u         v         w
## "numeric" "logical"  "factor"
```

### D.3.2 Each Data Frame is Matrix-like

Data frames can also be considered as “generalised” matrices: they have rows and columns, so the two-index subsetting syntax `x[rows, columns]` known from matrices works in the same manner.

```r
dim(x) # number of rows and columns
## [1] 5 3
x[1:2,] # first two rows
##           u     v w
## 1 0.1815171  TRUE A
## 2 0.9197226 FALSE B
x[,c(1,3)] # 1st and 3rd columns
##           u w
## 1 0.1815171 A
## 2 0.9197226 B
## 3 0.3117235 C
## 4 0.0641516 D
## 5 0.3964216 E
x[,1] # a single column is simplified to a vector
## [1] 0.1815171 0.9197226 0.3117235 0.0641516 0.3964216
x[,1,drop=FALSE] # drop=FALSE keeps the result a data frame
##           u
## 1 0.1815171
## 2 0.9197226
## 3 0.3117235
## 4 0.0641516
## 5 0.3964216
```

Take special note of selecting rows based on logical vectors. For instance, let’s extract all the rows from `x` for which the values in the column named `u` are greater than 0.5:

```r
x[x$u>0.5, ]
##           u     v w
## 2 0.9197226 FALSE B
```
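The row index can be any logical vector with one element per row, so conditions on several columns can be combined with the element-wise operators `&`, `|` and `!`, and a column selection can be given at the same time. The sketch below recreates `x` from the printed values (an assumption, as the original definition is not shown here); the threshold 0.35 is arbitrary and serves only as an illustration.

```r
# Recreate x from the values printed in this section (assumption)
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(LETTERS[1:5])
)

x$u > 0.5  # one logical value per row: FALSE TRUE FALSE FALSE FALSE

# Conditions are combined element-wise with & (and), | (or), ! (not);
# here: rows with u below 0.35 and v equal to FALSE, columns u and w only
(y <- x[x$u < 0.35 & !x$v, c("u", "w")])
```

Only row 3 satisfies both conditions, and its original row name is retained in the result.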

Moreover, subsetting with an integer vector can be used to change the order of the rows. Here is how we can sort the rows in `x` increasingly with respect to the values in column `u`:

```r
(x_sorted <- x[order(x$u),])
##           u     v w
## 4 0.0641516  TRUE D
## 1 0.1815171  TRUE A
## 3 0.3117235 FALSE C
## 5 0.3964216 FALSE E
## 2 0.9197226 FALSE B
```

Let’s stress that the programming style we emphasise here is very transparent. If we do not understand how a complex operation is executed, we can always decompose it into smaller chunks and study each of them separately. For instance, to make sense of the last example, we can consult the manual page (`?order`) and then inspect the result of calling `order(x$u)` on its own.
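Following this advice, the sorting example can be taken apart step by step. The sketch recreates `x` from the printed values (an assumption, as before) and examines the permutation returned by `order()`.

```r
# Recreate x from the values printed in this section (assumption)
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(LETTERS[1:5])
)

o <- order(x$u)  # indices of u's elements, from the smallest to the largest
o                # 4 1 3 5 2
x$u[o]           # u itself, sorted increasingly
x[o, ]           # the same permutation applied to whole rows

order(x$u, decreasing = TRUE)  # 2 5 3 1 4: largest value first
```

Because `o` is an ordinary integer vector, `x[o, ]` is just the integer-vector row subsetting discussed above.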

On a side note, we can reset the row names to the default consecutive integers by assigning `NULL` to them:

```r
row.names(x_sorted) <- NULL
x_sorted
##           u     v w
## 1 0.0641516  TRUE D
## 2 0.1815171  TRUE A
## 3 0.3117235 FALSE C
## 4 0.3964216 FALSE E
## 5 0.9197226 FALSE B
```
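`order()` also accepts several vectors: later ones are used to break ties in the earlier ones. As an illustration (the choice of sort keys is ours), the sketch below sorts the rows of a recreated `x` (values copied from the printed output, an assumption) by `v`, with `FALSE` before `TRUE`, then by increasing `u`, and resets the row names as above.

```r
# Recreate x from the values printed in this section (assumption)
x <- data.frame(
    u = c(0.1815171, 0.9197226, 0.3117235, 0.0641516, 0.3964216),
    v = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    w = factor(LETTERS[1:5])
)

y <- x[order(x$v, x$u), ]  # sort by v, ties broken by increasing u
row.names(y)               # "3" "5" "2" "4" "1": original row labels kept
row.names(y) <- NULL       # reset to the default labels
row.names(y)               # "1" "2" "3" "4" "5"
```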