Lesson 23: S3 classes

Tip: Remember, the point of computers is to alleviate us of work. We should avoid doing what the computer could do. For instance, concerning the graph in the last lesson: We had typed

> abline(181.4366,0.6936)

but we really shouldn’t have to type those numbers in by hand — and we don’t have to. Here’s why:

As mentioned earlier, R is an object-oriented language. Everthing is an object, and every object has a class. One of the most common class structures is called S3.

When we call lm, the latter returns an S3 object of lm class:

> lmout <- lm(Weight ~ Age,data=mlb)
> class(lmout)
[1] "lm"

A handy way to take a quick glance at the contents of an object is str:

> str(lmout)
List of 12
 $ coefficients : Named num [1:2] 181.437 0.694
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "Age"
...
...
 - attr(*, "class")= chr "lm"

Our use of … here is to indicate that we’ve omitted a lot of the output. But a couple of things stand out even in this excerpt:

  1. Our lmout object here is an R list (which is typical of S3 objects). That R list here has 12 elements.

  2. But it has an extra attribute, which is the class name, in this case lm. (So the designers of R simply chose to name the class after the function, which is not always the case.)

  3. The first of the elements of this R list is named coefficients, and it is a vector containing the slope and intercept.

So, we don’t have to type the slope and intercept in by hand after all.

> cfs <- lmout$coefficients
> abline(a = cfs[1], b = cfs[2])

By the way, abline() is actually a generic function, like print() and plot(). So if we want to be clever, we can add our line to the graph using this approach:

> abline(lmout)

Now, what about our original question — do baseball players gain weight as they age? The answer appears to be yes; for each additional year of age, the estimated mean age increases by about 0.7 pound. That’s about 7 pounds in 10 years, rather remarkable.

Again, this is only an estimate — 181.437 and 0.694 are estimates of the unknown population values β0 and β1. — generated from sample data. We can get an idea of the accuracy of this estimate by calculating a confidence interval, but we’ll leave that for statistics courses.

But we can do more right now. One might ask, Shouldn’t we also account for a player’s height, not just his age? After all, taller people tend to be heavier. Yes, we should do this:

> lmo <- lm(Weight ~ Height + Age, data=mlb)
> lmo

Call:
lm(formula = Weight ~ Height + Age, data = mlb)

Coefficients:
(Intercept)       Height          Age  
  -187.6382       4.9236       0.9115  

Here we instruct R to find the estimated regression function of weight, using both height and age as predictors. The + doesn’t mean addition; it is simply a delimiter between the predictors height and age in our regression specification.

So the new model is

mean weight = β0 + β1 height + β2 age

This says:

estimated mean weight = -187.6382 + 4.9236 height + 0.9115 age

So, under this more refined analysis, things are even more pessimistic; players on average gain about 0.9 pounds per year. And by the way, an extra inch of height corresponds on average to about 4.9 pounds of extra weight; taller players are indeed heavier, as we surmized.

Warning: Though this is not a statistics tutorial per se, an important point should be noted. Regression analysis has two goals, Description and Prediction. Our above analysis was aimed at the former — we want to describe the nature of fitness issues in pro baseball players. As we saw, a coefficient can change quite a lot when another predictor is added to the model, and in fact can even change sign (“Simpson’s Paradox”). Suppose for instance the shorter players tend to have longer careers. If we do not include height in our model, that omission might bias the age coefficient downward. Thus great care must be taken in interpreting coefficients in the Description setting. For Prediction, it is not as much of an issue.

Your Turn: In the mtcars data, fit a linear model of the regression of MPG against weight and horsepower; what is the estimated effect of 100 pounds of extra weight, for fixed horsepower?