Continuing along the Nile, say we would like to know in how many years
the level exceeded 1200. Let’s first introduce R’s sum function:
> sum(c(5,12,13))
[1] 30
Here the c function built a vector consisting of
5, 12 and 13. That vector was then fed into the sum function,
returning 5+12+13 = 30.
By the way, the above is our first example of function composition,
where the output of one function, c here, is fed as input into
another, sum in this case.
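To make the composition idea concrete, here is a small sketch that gets the same total in two steps and then in one. The variable names v, total_two_steps and total_composed are just illustrative choices, not anything built into R.

```r
# Two steps: build the vector first, then sum it
v <- c(5, 12, 13)
total_two_steps <- sum(v)

# One step: the output of c() is fed straight into sum()
total_composed <- sum(c(5, 12, 13))

print(total_two_steps)   # 30
print(total_composed)    # 30
```

Either form gives the same answer; the composed version simply skips naming the intermediate vector.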
We can now use this to answer our question on the Nile data:
> sum(Nile > 1200)
[1] 7
The river level exceeded 1200 in 7 years.
But how in the world did that work? Bear with me a bit here. Let’s look at a small example first:
> x <- c(5,12,13)
> x
[1] 5 12 13
> x > 8
[1] FALSE TRUE TRUE
> sum(x > 8)
[1] 2
First, notice something odd here, in the expression x > 8. Here
x is a vector, 3 elements in length, but 8 is just a number. It
would seem that it’s nonsense to ask whether a vector is greater than a
number; they’re different animals.
But R makes them “the same kind” of animal, by extending that number 8
to a 3-element vector 8,8,8. This is called recycling. This sets up
an element-by-element comparison. First, the 5 in x is compared to
the first 8, yielding FALSE, i.e. 5 is NOT greater than 8. Then 12 is
compared to the second 8, yielding TRUE, and then the comparison of 13
to the third 8 yields another TRUE. So, we get the vector
FALSE, TRUE, TRUE.
Fine, but how will sum add up some TRUEs and FALSEs? The
answer is that R, like most computer languages, treats TRUE and FALSE as
1 and 0, respectively. So we summed the vector (0,1,1), yielding 2.
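If you'd like to see those two steps spelled out, the sketch below does by hand what R does for us automatically. The names recycled, same_result and as_numbers are my own, chosen only for illustration; rep and as.numeric are standard R functions.

```r
x <- c(5, 12, 13)

# Recycling done by hand: turn the single 8 into the vector (8, 8, 8)
recycled <- rep(8, length(x))

# Comparing against the hand-built vector matches comparing against plain 8
same_result <- identical(x > 8, x > recycled)
print(same_result)        # TRUE

# TRUE and FALSE become 1 and 0 when treated as numbers
as_numbers <- as.numeric(x > 8)
print(as_numbers)         # 0 1 1
print(sum(as_numbers))    # 2
```

In everyday use you would never write rep or as.numeric here; sum(x > 8) does both conversions implicitly.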
Getting back to the question of the number of years in which the Nile flow exceeded 1200, let’s look at that expression again:
> sum(Nile > 1200)
Since the vector Nile has length 100, that number 1200 will be
recycled into a vector of one hundred copies of 1200. The >
comparison will then yield 100 TRUEs and FALSEs, so summing gives us the
number of TRUEs, exactly what we want.
Your Turn: Try a few other experiments of your choice using
sum. I’d suggest starting with finding the sum of the first 25 elements in Nile. You may wish to start with experiments on a small vector, say (2,1,1,6,8,5), so you will know that your answers are correct. Remember, you’ll learn best nonpassively. Code away!
A question related to how many years had a flow above 1200 is which
years had that property. Well, R actually has a which function:
> which(Nile > 1200)
[1] 4 8 9 22 24 25 26
So the 4th, 8th, 9th etc. elements in Nile had the queried property.
(Since the data begin in 1871, those were years 1874, 1878 and so on.)
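Rather than converting positions to years in our heads, we can ask R to do it. This sketch assumes that Nile is R's built-in time series object, so that the time function returns the year attached to each element; the names idx and years are just illustrative.

```r
# Positions of the elements with flow above 1200
idx <- which(Nile > 1200)

# time(Nile) holds the year of each observation; pick out the matching ones
years <- time(Nile)[idx]
print(years)   # 1874 1878 1879 1892 1894 1895 1896
```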
In fact, that gives us another way to get the count of the years with that trait:
> which1200 <- which(Nile > 1200)
> which1200
[1] 4 8 9 22 24 25 26
> length(which1200)
[1] 7
Of course, as usual, my choice of the variable name which1200 was
arbitrary, just something to help me remember what is stored in that
variable.
R’s length function does what it says, i.e. finding the length of a
vector. In our context, that gives us the count of years with flow
above 1200.
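As a quick sanity check, the two counting approaches can be put side by side. This is only a sketch; count_by_sum and count_by_which are names I made up for the comparison.

```r
# Approach 1: sum the TRUEs directly
count_by_sum <- sum(Nile > 1200)

# Approach 2: find the positions, then count how many there are
count_by_which <- length(which(Nile > 1200))

print(count_by_sum)                     # 7
print(count_by_which)                   # 7
print(count_by_sum == count_by_which)   # TRUE
```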
And, what were the river flows in those 7 years?
> which1200 <- which(Nile > 1200)
> Nile[which1200]
[1] 1210 1230 1370 1210 1250 1260 1220
Finally, something a little fancier. We can combine steps above:
> Nile[Nile > 1200]
[1] 1210 1230 1370 1210 1250 1260 1220
We just “eliminated the middle man,” which1200. The R interpreter
saw our Nile > 1200, and thus generated the corresponding TRUEs and
FALSEs. The R interpreter then treated those TRUEs and
FALSEs as subscripts in Nile, thus extracting the desired data.
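The same thing is easier to follow on our small vector from before. In this sketch, keep is just an illustrative name for the vector of TRUEs and FALSEs.

```r
x <- c(5, 12, 13)

# The comparison gives one TRUE or FALSE per element of x
keep <- x > 8
print(keep)             # FALSE TRUE TRUE

# Used as subscripts, the TRUEs select which elements to extract
print(x[keep])          # 12 13

# Going through which() and numeric positions extracts the same elements
print(x[which(keep)])   # 12 13
```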
Now, we might add here, “Don’t try this at home, kids.” For beginners, it’s really easier and more comfortable to break things into steps. Once you become experienced at R, you may wish to start skipping steps.
Less bold is the notion of negative indices, e.g.
> x <- c(5,12,13,8)
> x[-1]
[1] 12 13 8
Your Turn: Here we are asking for all of
x except for x[1]. Can you guess what x[c(-1,-4)] evaluates to? Guess first, then try it out.
Here you’ve refined your skillset for R vectors, learning R’s recycling feature, and two tricks that R users employ for finding counts of things.
Once again, as you progress through this tutorial, you’ll see that these things are used a lot in R.