## D.1 Creating Data Frames

Most frequently, we will create data frames from a series of numeric, logical, and character vectors of identical lengths.

x <- data.frame(
    u=runif(5),
    v=sample(c(TRUE, FALSE), 5, replace=TRUE),
    w=LETTERS[1:5]
)
print(x)
##           u     v w
## 1 0.1815171  TRUE A
## 2 0.9197226 FALSE B
## 3 0.3117235 FALSE C
## 4 0.0641516  TRUE D
## 5 0.3964216 FALSE E
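
The requirement of identical lengths can be checked directly. The following is a minimal sketch, not part of the original example; the names `y`, `z`, and `err` are illustrative. It assumes the documented behaviour of data.frame(): a shorter vector is recycled only when its length divides the number of rows evenly, and an error is signalled otherwise.

```r
# Columns of equal length: the standard case
y <- data.frame(a=1:4, b=c("p", "q", "r", "s"))
dim(y)  # number of rows and columns

# A shorter vector is recycled if its length divides the row count
z <- data.frame(a=1:4, b=c(TRUE, FALSE))
z$b  # the two values are repeated to fill four rows

# Otherwise, data.frame() signals an error; here we capture its message
err <- tryCatch(
    data.frame(a=1:3, b=1:2),
    error=function(e) conditionMessage(e)
)
print(err)
```

Relying on recycling is rarely a good idea in practice; supplying vectors of the same length keeps the intent explicit.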

Note that in R versions prior to 4.0.0 (as used here), strings are automatically converted to factors when a data frame is created; since R 4.0.0, they remain character vectors by default.

class(x$w)
## [1] "factor"

Throughout the history of computing with R, this has caused way too many bugs (recall, for instance, what’s the result of calling as.numeric() on a factor). In order to change this behaviour, either pass stringsAsFactors=FALSE argument to data.frame() or switch this feature off globally (recommended):

options(stringsAsFactors=FALSE)

Some objects, such as matrices, can easily be coerced to data frames:

(A <- matrix(1:12, byrow=TRUE, nrow=3,
    dimnames=list(
        NULL,                  # row labels
        c("x", "y", "z", "w")  # column labels
)))
##      x  y  z  w
## [1,] 1  2  3  4
## [2,] 5  6  7  8
## [3,] 9 10 11 12
as.data.frame(A)
##   x  y  z  w
## 1 1  2  3  4
## 2 5  6  7  8
## 3 9 10 11 12

Named lists are amongst other candidates for conversion:

(l <- lapply(split(iris$Sepal.Length, iris$Species), function(x) {
    c(min=min(x), median=median(x), mean=mean(x), max=max(x))
}))
## $setosa
##    min median   mean    max
##  4.300  5.000  5.006  5.800
##
## $versicolor
##    min median   mean    max
##  4.900  5.900  5.936  7.000
##
## $virginica
##    min median   mean    max
##  4.900  6.500  6.588  7.900
as.data.frame(l)
##        setosa versicolor virginica
## min     4.300      4.900     4.900
## median  5.000      5.900     6.500
## mean    5.006      5.936     6.588
## max     5.800      7.000     7.900
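
The factor pitfall mentioned earlier in this section can be sketched as follows. This is an illustrative example with made-up values (the factor `f` and the data frame `d` do not appear in the text above). It passes stringsAsFactors explicitly, so it should behave the same regardless of the default in the installed R version.

```r
# A factor whose labels look like numbers (hypothetical values)
f <- factor(c("5", "10", "20"))

# as.numeric() returns the internal integer codes, not the labels;
# the levels are sorted as strings: "10", "20", "5"
as.numeric(f)
## [1] 3 1 2

# Converting via character first recovers the intended numbers
as.numeric(as.character(f))
## [1]  5 10 20

# Setting stringsAsFactors explicitly makes the column type predictable
d <- data.frame(w=LETTERS[1:5], stringsAsFactors=FALSE)
class(d$w)
## [1] "character"
```

Going through as.character() is the usual way to avoid silently working with level codes instead of the values they label.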