fasteR

Lesson 11: The R List Class

We saw earlier how handy the tapply function can be. Let’s look at a related one, split.

Earlier we mentioned the built-in dataset mtcars, a data frame. Consider mtcars$mpg, the column containing the miles-per-gallon data. Again, to save typing and avoid clutter in our code, let’s make a copy first:

> mtmpg <- mtcars$mpg

Suppose we wish to split the original vector into three vectors, one for 4-cylinder cars, one for 6 and one for 8. We could do

> mt4 <- mtmpg[mtcars$cyl == 4]

and so on for mt6 and mt8.

Your Turn: In order to keep up, make sure you understand how that line of code works, with the TRUEs and FALSEs etc. First print out the value of mtcars$cyl == 4, and go from there.

But there is a cleaner way:

> mtl <- split(mtmpg,mtcars$cyl)
> mtl
$`4`
 [1] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26.0 30.4 21.4

$`6`
[1] 21.0 21.0 21.4 18.1 19.2 17.8 19.7

$`8`
 [1] 18.7 14.3 16.4 17.3 15.2 10.4 10.4 14.7 15.5 15.2 13.3 19.2 15.8 15.0
> class(mtl)
[1] "list"

In English, the call to split said, “Split mtmpg into multiple vectors, with the splitting criterion being the correspond values in mtcars$cyl.”

Now mtl, an object of R class list, contains the 3 vectors. We can access them individually with the dollar sign notation:

> mtl$`4`
 [1] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26.0 30.4 21.4

Or, we can use indices, though now with double brackets:

> mtl[[1]]
 [1] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26.0 30.4 21.4

Looking a little closer:

> head(mtcars$cyl)
[1] 6 6 4 6 8 6

We see that the first car had 6 cylinders, so the first element of mtmpg, 21.0, was thrown into the 6 pile, i.e. mtl[[2]] (see above printout of mtl), and so on.

And of course we can make copies for later convenience:

> m4 <- mtl[[1]]
> m6 <- mtl[[2]]
> m8 <- mtl[[3]]

Lists are especially good for mixing types together in one package:

> l <- list(a = c(2,5), b = 'sky')
> l
$a
[1] 2 5

$b
[1] "sky"

Note that here we can give names to the list elements, a and b. In forming mtl using split above, the names were assigned according to the values of the vector beiing split. (In that earlier case, we also needed backquotes , since the names were numbers.)

If we don’t like those default names, we can change them:

> names(mtl) <- c('four','six','eight')
> mtl
$four
 [1] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26.0 30.4 21.4

$six
[1] 21.0 21.0 21.4 18.1 19.2 17.8 19.7

$eight
 [1] 18.7 14.3 16.4 17.3 15.2 10.4 10.4 14.7 15.5 15.2 13.3 19.2 15.8
15.0

What if we want, say, the MPG for the third car in the 6-cylinder category?

> mtl[[2]][3]
[1] 21.4

The point is that mtl[[2]] is a vector, so if we want element 3 of that vector, we tack on [3].

Or,

> mtl$six[3]
[1] 21.4

By the way, it’s no coincidence that a dollar sign is used for delineation in both data frames and lists; data frames are lists. Each column is one element of the list. So for instance,

> mtcars[[1]]
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4

Here we used the double-brackets list notation to get the first element of the list, which is the first column of the data frame.

Your Turn Try using split on the ToothGrowth data, say splitting into groups according to the supplement, and finding various quantities.

← PREVIOUS

Data Cleaning

Another Look at the Nile Data