Tip: Remember, the point of computers is to relieve us of work. We should avoid doing what the computer could do. For instance, concerning the graph in the last lesson, we had typed
> abline(181.4366,0.6936)
but we really shouldn’t have to type those numbers in by hand — and we don’t have to. Here’s why:
As mentioned earlier, R is an object-oriented language. Everything is an object, and every object has a class. One of the most common class structures is called S3.
When we call lm(), it returns an S3 object of class lm:
> lmout <- lm(Weight ~ Age,data=mlb)
> class(lmout)
[1] "lm"
A handy way to take a quick glance at the contents of an object is str:
> str(lmout)
List of 12
$ coefficients : Named num [1:2] 181.437 0.694
..- attr(*, "names")= chr [1:2] "(Intercept)" "Age"
...
...
- attr(*, "class")= chr "lm"
Our use of … here is to indicate that we’ve omitted a lot of the output. But a couple of things stand out even in this excerpt:
Our lmout object here is an R list (which is typical of S3 objects). That R list here has 12 elements.
But it has an extra attribute, which is the class name, in this case lm. (So the designers of R simply chose to name the class after the function, which is not always the case.)
The first of the elements of this R list is named coefficients, and it is a vector containing the slope and intercept.
So, we don’t have to type the slope and intercept in by hand after all.
> cfs <- lmout$coefficients
> abline(a = cfs[1], b = cfs[2])
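The points above can be checked directly. Since the mlb data may not be loaded in every session, here is a minimal sketch that uses R’s built-in women data instead (an assumption made purely for illustration; fit is a hypothetical name). It also uses coef(), the standard accessor function, which returns the same vector as the coefficients element:

```r
# Built-in 'women' data stands in for mlb here (illustration only)
fit <- lm(weight ~ height, data = women)

class(fit)        # the class attribute, "lm"
is.list(fit)      # underneath, this S3 object is a list
names(fit)[1]     # the first element is named "coefficients"

# coef() is the accessor function for the same vector
cfs <- coef(fit)
identical(cfs, fit$coefficients)

# The elements are named, so we can index by name rather than by position
cfs["(Intercept)"]
cfs["height"]
```

Indexing by name, as in cfs["height"], guards against mixing up the intercept and the slope.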
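By the way, abline() also accepts a fitted model object directly and pulls out the coefficients itself, in the same spirit as generic functions like print() and plot(), which adapt to the class of their argument. So if we want to be clever, we can add our line to the graph using this approach: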
> abline(lmout)
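To see how a generic function such as print() adapts to the class of its argument, here is a minimal sketch with a toy S3 class. The class name toyrec and the object rec are made up for illustration; they are not part of R or of the mlb example.

```r
# A toy S3 object: a list with a class attribute (hypothetical class "toyrec")
rec <- list(name = "demo", values = c(1, 2, 3))
class(rec) <- "toyrec"

# A method for the generic print(); R picks it based on the class attribute
print.toyrec <- function(x, ...) {
  cat("toyrec", x$name, "with", length(x$values), "values\n")
  invisible(x)
}

print(rec)       # dispatches to print.toyrec()
unclass(rec)     # with the class removed, the plain list is printed instead
```

The same mechanism is at work when we print an lm object: R looks for a print method matching the class lm.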
Now, what about our original question: do baseball players gain weight as they age? The answer appears to be yes; for each additional year of age, the estimated mean weight increases by about 0.7 pound. That’s about 7 pounds in 10 years, rather remarkable.
Again, this is only an estimate: 181.437 and 0.694 are estimates of the unknown population values β0 and β1, generated from sample data. We can get an idea of the accuracy of these estimates by calculating a confidence interval, but we’ll leave that for statistics courses.
But we can do more right now. One might ask, Shouldn’t we also account for a player’s height, not just his age? After all, taller people tend to be heavier. Yes, we should do this:
> lmo <- lm(Weight ~ Height + Age, data=mlb)
> lmo
Call:
lm(formula = Weight ~ Height + Age, data = mlb)
Coefficients:
(Intercept)       Height          Age
  -187.6382       4.9236       0.9115
Here we instruct R to find the estimated regression function of weight, using both height and age as predictors. The + doesn’t mean addition; it is simply a delimiter between the predictors height and age in our regression specification.
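A quick way to convince oneself that + in a formula is a delimiter rather than arithmetic is to compare it with I(), which tells R to treat its contents as ordinary arithmetic. The sketch below uses R’s built-in trees data rather than mlb (an assumption for illustration; the object names are hypothetical):

```r
# Built-in 'trees' data (illustration only; not the mlb data)
two_pred <- lm(Volume ~ Girth + Height, data = trees)     # + separates two predictors
one_pred <- lm(Volume ~ I(Girth + Height), data = trees)  # I() forces actual addition

length(coef(two_pred))   # 3: intercept plus one slope per predictor
length(coef(one_pred))   # 2: intercept plus a single slope for the sum
```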
So the new model is
mean weight = β0 + β1 height + β2 age
The output above gives the estimated version:
estimated mean weight = -187.6382 + 4.9236 height + 0.9115 age
So, under this more refined analysis, things are even more pessimistic; players on average gain about 0.9 pounds per year. And by the way, an extra inch of height corresponds on average to about 4.9 pounds of extra weight; taller players are indeed heavier, as we surmised.
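A fitted equation of this form can be evaluated for any particular case, either by plugging into the formula or by calling predict(). The following sketch checks that the two agree, again on the built-in trees data rather than mlb; the new case (girth 12, height 75) is a made-up example.

```r
# Built-in 'trees' data (illustration only; not the mlb data)
fit <- lm(Volume ~ Girth + Height, data = trees)
cf <- coef(fit)

# Hypothetical new case: girth 12 inches, height 75 feet
newcase <- data.frame(Girth = 12, Height = 75)

# Estimated mean = intercept + slope1 * x1 + slope2 * x2
by_hand <- cf["(Intercept)"] + cf["Girth"] * 12 + cf["Height"] * 75
by_predict <- predict(fit, newdata = newcase)

all.equal(unname(by_hand), unname(by_predict))   # TRUE
```

With the lmo object from above, the analogous call would be predict(lmo, data.frame(Height = 72, Age = 30)), where 72 and 30 are made-up values for a hypothetical player.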
Warning: Though this is not a statistics tutorial per se, an important point should be noted. Regression analysis has two goals, Description and Prediction. Our above analysis was aimed at the former; we wanted to describe the nature of fitness issues in pro baseball players. As we saw, a coefficient can change quite a lot when another predictor is added to the model, and in fact can even change sign (“Simpson’s Paradox”). Suppose for instance that shorter players tend to have longer careers. If we do not include height in our model, that omission might bias the age coefficient downward. Thus great care must be taken in interpreting coefficients in the Description setting. For Prediction, it is not as much of an issue.
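The effect of omitting a predictor can be seen in a small simulation. The data below are entirely made up (they are not the mlb data), and the numbers 5 and 1 for the true height and age effects are assumptions chosen for illustration. In this artificial population shorter players tend to be older, so leaving out height drags the age coefficient down, here far enough to flip its sign.

```r
set.seed(1)   # simulated data, not the mlb data
n <- 5000
height <- rnorm(n, mean = 74, sd = 2)

# Shorter players tend to be older in this made-up population
age <- 29 - 0.8 * (height - 74) + rnorm(n, sd = 1.5)

# True model: each inch adds 5 pounds, each year of age adds 1 pound
weight <- -190 + 5 * height + 1 * age + rnorm(n, sd = 10)

age_only <- coef(lm(weight ~ age))["age"]
age_with_height <- coef(lm(weight ~ height + age))["age"]

age_only          # biased downward; negative in this setup
age_with_height   # close to the true value of 1
```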
Your Turn: In the mtcars data, fit a linear model of the regression of MPG against weight and horsepower; what is the estimated effect of 100 pounds of extra weight, for fixed horsepower? (Note that the wt column in mtcars is recorded in units of 1000 pounds.)