Lesson 27: The ggplot2 Graphics Package

Now, on to ggplot2.

The ggplot2 package was written by Hadley Wickham, who later became Chief Scientist at RStudio. It’s highly complex, with well over 400 functions, and rather abstract, but quite powerful. We will touch on it at various points in this tutorial, while staying with base-R graphics when it is easier to go that route.

Now to build up to using ggplot2, let’s do a bit more with base-R graphics first, continuing with our weight/age investigation of the ballplayers. To begin, let’s do a scatter plot of weight against age, color-coded by position. We could type

> plot(mlb$Age,mlb$Weight,col=mlb$PosCategory)

but to save some typing, let’s use R’s with function (we’ll change the point size while we are at it):

> with(mlb,plot(Age,Weight,col=PosCategory,cex=0.6))

By writing with, we tell R to take Age, Weight and PosCategory in the context of mlb.
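To see that with only changes where names are looked up, here is a minimal sketch that computes the same quantity both ways. The data frame toy and its values are hypothetical stand-ins for mlb, made up for illustration.

```r
# Toy stand-in for mlb (hypothetical values, not the real data)
toy <- data.frame(
  Age = c(23.1, 31.4, 27.8, 35.0),
  Weight = c(180, 215, 200, 190)
)

# Mean weight of players over 25, spelling out the data frame each time
m1 <- mean(toy$Weight[toy$Age > 25])

# Same computation; with() looks up Age and Weight inside toy
m2 <- with(toy, mean(Weight[Age > 25]))

identical(m1, m2)  # TRUE
```

The savings grow with the number of columns mentioned in the expression.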

Here is how we can do it in ggplot2:

First, we make an empty plot, based on the data frame mlb:

> p <- ggplot(mlb)

Nothing will appear on the screen. The package displays a plot only when you “print” it:

> p

This will just display an empty plot. (Try it.) By the way, recall that any expression you type at the prompt, even 1 + 1, will be evaluated and printed to the screen. Here the plot, albeit empty, is printed to the screen.
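The auto-printing happens only at the top-level prompt. Inside a loop, or in a script run with source, a bare p draws nothing, and an explicit call to print is needed. Here is a small sketch of that, assuming ggplot2 is installed; the data frame toy is a hypothetical stand-in for mlb.

```r
library(ggplot2)

# Toy stand-in for mlb (hypothetical values, not the real data)
toy <- data.frame(Age = c(23, 31, 28), Weight = c(180, 215, 200))

p <- ggplot(toy)   # builds the plot object; nothing is drawn yet
class(p)           # p is an ordinary R object that we can store and reuse

for (i in 1:2) p   # draws nothing: no auto-printing inside a loop

print(p)           # explicit print; the (empty) plot is drawn
```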

Now let’s do something useful:

> p + geom_point(aes(x = Age, y = Weight, col = PosCategory),cex=0.6)

What happened here? Quite a bit, actually, so let’s take this slowly.

One nice thing is that we automatically got a legend printed to the right of the graph, so we know which color corresponds to which position. We can do this in base-R graphics too, but it takes extra work: plot does not draw a legend by itself, so we would need a separate call to legend.
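For comparison, here is a rough sketch of one way to get a legend in base-R graphics, using the legend function after the plot is drawn. The data frame toy, its values, and the choice of the "topright" corner are assumptions made for illustration.

```r
# Toy stand-in for mlb (hypothetical values, not the real data)
toy <- data.frame(
  Age = c(23, 31, 28, 35, 26, 30),
  Weight = c(180, 215, 200, 190, 225, 185),
  PosCategory = factor(c("Catcher", "Infielder", "Outfielder",
                         "Pitcher", "Catcher", "Pitcher"))
)

# A factor passed as col is used via its integer codes 1, 2, 3, ...
with(toy, plot(Age, Weight, col = PosCategory, cex = 0.6))

# The legend is a separate step; colors 1, 2, 3, ... line up with the levels
lv <- levels(toy$PosCategory)
legend("topright", legend = lv, col = seq_along(lv), pch = 1)
```

We have to keep the legend's colors in sync with the plot ourselves, which is exactly the bookkeeping ggplot2 did for us.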