Lesson 35: Files and Folders/Directories

Note: On Unix-family systems such as Linux, what Windows calls a folder is called a directory. You will frequently see this term in Mac discussions as well, since macOS is a Unix-family system. We will typically use the term directory here, as that is what R uses.

In assembling a dataset for my regtools package, I needed to collect the records of several of my course offerings. I started in a directory that had one subdirectory for each offering. Within a subdirectory, in turn, there would be a file named Results (though, as the code below checks, not every subdirectory has one). As an intermediate step, I wanted to find all such files, placing the text of each one in an R list, resultsFiles. Only some specific columns of each file would be retained. (The discussion here is a slightly adapted version.)

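The chief R functions I used were list.dirs(), which determines all the subdirectories within the current directory; dir(), which lists the contents of a directory; getwd() and setwd(), which report and change the current working directory; and readLines(), which reads a text file into a character vector, one element per line.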

Here is the code:

getData <- function() {

   currDir <- getwd()  # leave a trail of bread crumbs

   dirs <- list.dirs(recursive=FALSE)
   numCourseOfferings <- 0
   # create empty R list, into which we'll store our course records
   resultsFiles <- list()
   for (d in dirs) {
      setwd(d)  # descend into d directory
      # check if there is a Results file there
      fls <- dir()
      if (!('Results' %in% fls)) {  # not there, skip this dir
         setwd(currDir)
         next
      }
      # ah, there is such a file; increment our count
      numCourseOfferings <- numCourseOfferings + 1
      # open it
      resultsLines <- readLines('Results')
      # delete the comment lines; look at 1st character in each line
      resultsLines <- delComments(resultsLines)
      resultsFiles[[numCourseOfferings]] <- extractCols(resultsLines)
      setwd(currDir)
   }
   resultsFiles  # return all the grades records
}

Before we go into the details, note that getData() calls two helper functions, delComments() and extractCols(), whose definitions are not shown here.

Now, consider the line

   dirs <- list.dirs(recursive=FALSE)

As mentioned, list.dirs() will determine all the subdirectories within the current directory. But what about subdirectories of subdirectories, and subdirectories of subdirectories of subdirectories, and so on? By default, recursive is TRUE and list.dirs() reports all of them. Setting recursive to FALSE means we want only first-level subdirectories.
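To see the effect of the recursive argument, here is a minimal sketch that builds a small throwaway directory tree and lists it both ways. The directory names (listDirsDemo, courseA, courseB, old) are made up for illustration, and full.names = FALSE is used only to keep the printed names short.

```r
# Build a small throwaway directory tree in R's temporary directory.
# All directory names here are hypothetical, chosen for illustration.
demoRoot <- file.path(tempdir(), "listDirsDemo")
unlink(demoRoot, recursive = TRUE)   # start from a clean slate
dir.create(file.path(demoRoot, "courseA", "old"), recursive = TRUE)
dir.create(file.path(demoRoot, "courseB"))

# first-level subdirectories only
topLevel <- list.dirs(demoRoot, recursive = FALSE, full.names = FALSE)

# default recursive = TRUE: subdirectories at every depth, and the
# starting directory itself is included in the result as well
allLevels <- list.dirs(demoRoot, full.names = FALSE)

print(topLevel)
print(allLevels)
```

The nested directory courseA/old shows up only in the second listing, which is why getData() asks for recursive=FALSE.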

So, the line

   for (d in dirs) {

will then have us process each (first-level) directory, one by one.

When we enter one of those subdirectories, the line

      fls <- dir()

will determine the names of all the files there (any subdirectories are listed too), storing the result as a character vector fls.
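As a small illustration of what dir() returns, the sketch below creates a temporary directory with two files and one subdirectory and then lists it. The names dirDemo, Notes.txt and old are invented for the example; here dir() is given a path argument rather than relying on the current working directory.

```r
# Set up a throwaway directory; all names below are hypothetical.
demoDir <- file.path(tempdir(), "dirDemo")
unlink(demoDir, recursive = TRUE)
dir.create(demoDir)
writeLines("# a comment line", file.path(demoDir, "Results"))
writeLines("some notes", file.path(demoDir, "Notes.txt"))
dir.create(file.path(demoDir, "old"))

fls <- dir(demoDir)   # dir() is another name for list.files()
print(fls)            # a character vector; the subdirectory 'old' is listed too
print('Results' %in% fls)   # is there a Results entry?
```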

Then, as the comment notes, the lines

      if (!('Results' %in% fls)) {  # not there, skip this dir
         setwd(currDir)
         next
      }

will, in the event that there is no Results file in this subdirectory, skip this subdirectory. The R keyword next says, “Go to the next iteration of this loop,” which here means to process the next subdirectory. Note that to prepare for that, we need to move back to the original directory:

         setwd(currDir)
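The behavior of next can be seen in isolation with a toy loop that touches no real files. The list contents below is a hypothetical stand-in for what dir() might report in three subdirectories.

```r
# Hypothetical directory listings, keyed by directory name.
contents <- list(
   courseA = c('Results', 'Notes.txt'),
   courseB = c('Notes.txt'),
   courseC = c('Results')
)

found <- character(0)
for (d in names(contents)) {
   # no Results entry: abandon this iteration, move on to the next d
   if (!('Results' %in% contents[[d]])) next
   # reached only when the check above did not fire
   found <- c(found, d)
}
print(found)   # "courseA" "courseC"
```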

On the other hand, if this subdirectory does contain a file named Results, the remaining code increments our count of such files, reads in the found file, and assigns its contents as a new element of our resultsFiles list.
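As a point of comparison, the same search can be written without changing the working directory at all, by building each file's path with file.path() and testing for it with file.exists(). This is only a sketch of an alternative, not the code used for regtools. The function name readAllResults, its argument topDir, and the demo directory names are assumptions made for illustration, and the raw lines are stored as-is because delComments() and extractCols() are not shown in this lesson.

```r
# Variant of the getData() loop that never calls setwd().
# readAllResults and topDir are hypothetical names.
readAllResults <- function(topDir = '.') {
   # full.names defaults to TRUE, so each d is a usable path
   dirs <- list.dirs(topDir, recursive = FALSE)
   resultsFiles <- list()
   for (d in dirs) {
      resultsPath <- file.path(d, 'Results')
      if (!file.exists(resultsPath)) next   # no Results here, skip
      # append the file's lines as a new list element
      resultsFiles[[length(resultsFiles) + 1]] <- readLines(resultsPath)
   }
   resultsFiles
}

# Small trial in a temporary directory; only courseA gets a Results file.
topDir <- file.path(tempdir(), "readAllDemo")
unlink(topDir, recursive = TRUE)
dir.create(file.path(topDir, "courseA"), recursive = TRUE)
dir.create(file.path(topDir, "courseB"))
writeLines(c("# header comment", "alice 90", "bob 85"),
           file.path(topDir, "courseA", "Results"))

res <- readAllResults(topDir)
print(length(res))
print(res[[1]])
```

Because no setwd() call is made, there is no need to remember to return to the original directory before each next, at the cost of carrying full paths around.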