Lesson 16: Writing Your Own Functions

We’ve seen a number of R’s built-in functions so far, but here comes the best part — you can write your own functions.

Recall a line we had earlier:

> sum(Nile > 1200)

This gave us the count of the elements in the Nile data larger than 1200.
Now, say we want the mean of those elements:

> gt1200 <- which(Nile > 1200)
> nileSubsetGT1200 <- Nile[gt1200]
> mean(nileSubsetGT1200)
[1] 1250

As before, we could instead write a more compact version,

> mean(Nile[Nile > 1200])
[1] 1250

But it’s best to do it step by step at first. Let’s see how those steps work. Writing the code with line numbers for reference, the code is

1  gt1200Indices <- which(Nile > 1200)
2  nileSubsetGT1200 <- Nile[gt1200Indices]
3  mean(nileSubsetGT1200)

Let’s review how this works:

But we may wish to do this kind thing often, on many datasets etc. Then we have:

Tip: If we have an operation we will use a lot, we should consider writing a function for it.

Say we want to do the above again, but with 1350 instead of 1200. Or, with the tg$len vector from our ToothGrowth example, with 10.2 as our lower bound. We could keep typing the same pattern as above, but if we’re going to do this a lot, it’s better to write a function for it:

Here is our function:

> mgd <- function(x,d) mean(x[x > d])

Here I’ve used a compact form for convenience. (Otherwise I’d need to use blocks to be covered in a later lesson.) I named it mgd for “mean of elements greater than d,” but any name is fine.

Let’s try it out, then explain:

> mgd(Nile,1200)
[1] 1250
> mgd(tg$len,10.2)
[1] 21.58125

This saved me typing. In the second call, I would have had to type

mean(tg$len[tg$len > 10.2])

considerably longer. But even more importantly, I’d have to think about the operation each time I used it; by making a function out of it, I’ve got it ready to go, all debugged, whenever I need it.

So, how does all this work? Again, look at the code:

> mgd <- function(x,d) mean(x[x > d])
> class(mgd)
[1] "function"

There is a lot going on here. Bear with me for a moment, as I bring in a little of the “theory” of R:

Odd to say, but there is a built-in function in R itself named ‘function’! We’ve already seen several built-in R functions, e.g. mean(), sum() and plot(). Well, here is another, function(). We’re calling it here. And its job is to build a function. Yes, as I like to say, to my students’ amusement,

“The function of the function named function is to build functions! And the class of object returned by function is function!”

So, in the line

> mgd <- function(x,d) mean(x[x > d])

we are telling R, “R, I want to write my own function. I’d like to name it mgd; it will have arguments x and d, and it will do mean(x[x > d]). Please build the function for me. Thanks in advance, R!”

Here we called function to build a function object, and then assigned to mgd. We can then call the latter, as we saw above, repeated here for convenience:

> mgd(Nile,1200)
[1] 1250

In executing

> mgd <- function(x,d) mean(x[x > d])

x and d are known as formal arguments, as they are just placeholders. For example, in

> mgd(Nile,1200)

we said, “R, please execute mgd with Nile playing the role of x, and 1200 playing the role of d. Here Nile and 1200 are known as the actual arguments.

As with variables, we can pretty much name functions and their arguments as we please.

As you have seen with R’s built-in functions, a function will typically have a return value. In our case here, we could arrange that by writing

> mgd <- function(x,d) return(mean(x[x > d]))

a bit more complicated than the above version. But the call to return is not needed here, because in any function, R will return the last value computed, in this case the requested mean.

And we can save the function for later use. One way to do this is to call R’s save function, which can be used to save any R object:

> save(mgd,file='mean_greater_than_d')

The function has now been saved in the indicated file, which will be in whatever folder R is running in right now. We can leave R, and say, come back tomorrow. If we then start R from that same folder, we then run

> load('mean_greater_than_d')

and then mgd will be restored, ready for us to use again. (Typically this is not the way people save code, but this is the subject of a later lesson.)

Let’s write another function, this one to find the range of a vector, i.e. the difference between the minimal and maximal values:

> rng <- function(y) max(y) - min(y)
> rng(Nile)
[1] 914

Here we made use of the built-in R functions max and min.

Tip: Build new functions from old ones (which may in turn depend on other old ones, etc.).

Again, the last item computed is the subtraction, so it will be automatically returned, just what we want. As before, I chose to name the argument y, but it could be anything. However, I did not name the function range, as there is already a built-in R function of that name.

Your Turn: Try your hand at writing some simple functions along the lines seen here. You might start by writing a function cgd(), like mgd() above, but returning the count of the number of elements in x that are greater than d. Then may try writing a function n0(x), that returns the number of 0s in the vector x. (Hint: Make use of R’s == and sum.) Another suggestion would be a function hld(x,d), which draws a histogram for those elements in the vector x that are less than d. Write at least 4 or 5 functions; the more you write, the easier it will be in later lessons.

Functions are R objects, just as are vectors, lists and so on. Thus, we can print them by just typing their names!

> mgd <- function(x,d) mean(x[x > d])
> mgd
function(x,d) mean(x[x > d])