Blocks are usually key in defining functions. Let’s generalize the
above code in the Loops lesson, writing a function that replaces 0s by
NAs in specified columns in general data frames, not just pima as
before.
1 zerosToNAs <- function(d,cols)
2 {
3 for (j in cols) {
4 NArows <- which(d[,j] == 0)
5 d[NArows,j] <- NA
6 }
7 d
8 }(We’ve added line numbers to this display for convenence.)
Here the formal argument d is the data frame to be worked on, and
cols specifies the columns in which 0s are to be replaced.
The loop goes through d, one column at a time. Since d[,j] == 0
means all of column j of d, then which(d[,j] == 0) will give
us the indices in that column of elements that are 0s. Those indices in
turn mean row numbers in d, which we’ve named NArows. In line
5, then, we replace the 0s we’ve found in column j by NAs. Before
continuing, work through this little example in your mind:
> d <- data.frame(x=c(1,0,3),y=c(0,0,13))
> d
x y
1 1 0
2 0 0
> which(d[,2] == 0)
[1] 1 2 # ah yes; the 0 elements in column 2 are at indices 1 and 2Returning the the above loop code, note that when we reach line 7, we’ve
already finished the loop, and exited from it. So, we are ready to
return the new value of d. Recall that we could do this via the
expression return(d), but we can save ourselves some typing by
simply writing d. That value becomes the last value computed, and R
automatically returns that last value.
We could use this in the Pima data:
> pima <- zerosToNAs(pima,2:6)There is an important subtlety here. All of this will produce a new
data frame, rather than changing pima itself. That does look odd;
isn’t d changing, and isn’t d the same as pima? Well, no;
d is only a separate copy of pima. So, when d changes,
pima does not. So, if we want pima to change, we must reassign
the output of the function back to pima, as we did above.
Your Turn: Write a function with call form
countNAs(dfr), which prints the numbers of NAs in each column of the data framedfr. You’ll need to use the built-inis.na()functon; executeis.na(c(5,NA,13,28,NA))at the R command prompt to see what it does. Test it on a small artificial dataset that you create.