Each object in R has a class. The number 3 is of the 'numeric'
class, the character string 'abc' is of the 'character' class, and
so on. (In R, class names are quoted; one can use single or double
quotation marks.) Note that vectors of numbers are of 'numeric'
class too; in fact, a single number is considered to be a vector of
length 1. Similarly, c('abc','xw') is of 'character' class, just as
a single character string is.
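To make this concrete, here is a small sketch you can paste into the R console. The particular values (3, 'abc' and so on) are just illustrations chosen for this example.

```r
# Query the class of a few simple objects.
class(3)                # "numeric"
class('abc')            # "character"
class(c(5, 12, 13))     # "numeric": a vector of numbers
class(c('abc', 'xw'))   # "character": a vector of character strings

# A single number is just a vector of length 1.
length(3)               # 1
is.vector(3)            # TRUE
```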
Tip: Computers require one to be very, very careful and very, very precise. In the expression c('abc','xw') above, one might wonder why it does not evaluate to abcxw. After all, didn’t I say that the c stands for “concatenate”? Yes, but the c function concatenates vectors. Here abc is a vector of length 1 (we have one character string, and the fact that it consists of 3 characters is irrelevant), and likewise xw is one character string. So, we are concatenating a 1-element vector with another 1-element vector, resulting in a 2-element vector.
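The distinction can be checked directly. In the sketch below, the variable names v and s are made up for illustration, and paste0 is brought in only as a contrast: it is the base R function that joins the strings themselves, which is not what c does.

```r
# c() concatenates vectors: two 1-element vectors give one 2-element vector.
v <- c('abc', 'xw')
length(v)                   # 2: two strings, so two elements
nchar(v)                    # 3 2: number of characters in each element

# paste0() joins the strings themselves into a single string.
s <- paste0('abc', 'xw')
s                           # "abcxw"
length(s)                   # 1
```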
What about tg and tg$supp in the Vitamin C example above? What
are their classes?
> class(tg)
[1] "data.frame"
> class(tg$supp)
[1] "factor"
R factors are used when we have categorical variables. If in a
genetics study, say, we have a variable for hair color, that might
comprise four categories: black, brown, red, blond. We can find the
list of categories for tg$supp as follows:
> levels(tg$supp)
[1] "OJ" "VC"
The categorical variable here is supp, the name the creator of this
dataset chose for the supplement column. We see that there are two categories
(levels), either orange juice or Vitamin C.
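To see the same idea on the hair color illustration, here is a minimal sketch. The vector hair and its five values are invented for this example; they do not come from any real dataset.

```r
# Hypothetical data: hair color recorded for five people.
hair <- factor(c('brown', 'black', 'red', 'brown', 'blond'))

class(hair)     # "factor"
levels(hair)    # "black" "blond" "brown" "red" (the distinct categories, sorted)
nlevels(hair)   # 4
table(hair)     # how many people fall in each category
```

Note that 'brown' appears twice in the data but only once among the levels.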
Note carefully that the values of an R factor must be quoted when we
refer to them in code. Either single or double quote marks are fine
(though the marks don’t show up when we use head).
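Here is a short sketch of the quoting rule, using a small made-up factor named supp rather than the actual dataset column, so that it runs on its own.

```r
# Hypothetical factor with the same two levels as in the example.
supp <- factor(c('OJ', 'VC', 'OJ', 'OJ'))

supp == 'OJ'         # TRUE FALSE TRUE TRUE
sum(supp == "OJ")    # 3; double quotes work equally well

# Without quotes, R looks for a variable named OJ instead of the level:
# supp == OJ         # error unless an object called OJ happens to exist

print(supp)          # the values are displayed without quote marks
```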
Factors can sometimes be a bit tricky to work with, but the above is enough for now. Let’s see how to apply the notion in the current dataset.