Chapter 6 Functions

This chapter explores how to use functions in R to perform more advanced tasks and to ask questions about data. After considering functions in an abstract sense, it discusses using built-in R functions, accessing additional functions by loading R packages, and writing your own functions.

6.1 What are Functions?

In a broad sense, a function is a named sequence of instructions (lines of code) that you may want to perform one or more times throughout a program. Functions provide a way of encapsulating multiple instructions into a single “unit” that can be used in a variety of different contexts. So rather than needing to repeatedly write down all the individual instructions for “make a sandwich” every time you’re hungry, you can define a MakeSandwich() function once and then just call (execute) that function when you want to perform those steps.

In addition to grouping instructions, functions in programming languages like R also tend to follow the mathematical definition of a function: a set of operations (instructions!) that are performed on some inputs and lead to some outputs. Function inputs are called arguments or parameters, and we say that these arguments are passed to a function (like a football). We say that a function then returns an output to use. For example, imagine a function that can determine the largest number in a set of numbers. That function’s input would be the set of numbers, and the output would be the largest number in the set.

6.1.1 R Function Syntax

R functions are referred to by name (technically, they are values like any other variable). As in many programming languages, we call a function by writing the name of the function followed immediately (no space) by parentheses (). Inside the parentheses, we put the arguments (inputs) to the function separated by commas (,). Thus computer functions look just like multi-variable mathematical functions, but with names longer than f().

# call the print() function, pass it "Hello world" value as an argument
print("Hello world")  # "Hello world"

# call the sqrt() function, passing it 25 as an argument
sqrt(25)  # 5, square root of 25

# call the min() function, pass it 1, 6/8, AND 4/3 as arguments
# this is an example of a function that takes multiple args
min(1, 6/8, 4/3)  # 0.75, (6/8 is the smallest value)
  • Note: To keep functions and variables distinct, we try to always include empty parentheses () when referring to a function by name. This does not mean that the function takes no arguments, it is just a useful shorthand for indicating that something is a function.

If you call any of these functions interactively, R will display the returned value (the output) in the console. However, the computer is not able to “read” what is written in the console—that’s for humans to view! If you want the computer to be able to use a returned value, you will need to give that value a name so that the computer can refer to it. That is, you need to store the returned value in a variable:

# store min value in smallest.number variable
smallest.number <- min(1, 6/8, 4/3)

# we can then use the variable as normal, such as for a comparison
min.is.big <- smallest.number > 1  # FALSE

# we can also use functions directly when storing to variables
phi <- .5 + sqrt(5)/2  # 1.618...

# we can even pass the result of a function as an argument to another!
# watch out for where the parentheses close!
print(min(1.5, sqrt(3)))  # prints 1.5
  • In the last example, the resulting value of the “inner” function (e.g., sqrt()) is immediately used as an argument. Because that value is used immediately, we don’t have to assign it a separate variable name. It is thus known as an anonymous variable.
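To make the nesting concrete, here is a small sketch that computes the same result two ways: once with an intermediate variable and once with a nested call. The variable names (root.three, smaller.stepwise, smaller.nested) are chosen only for this illustration.

```r
# Step-by-step version: store the inner result in a variable first
root.three <- sqrt(3)  # approximately 1.732
smaller.stepwise <- min(1.5, root.three)

# Nested version: the result of sqrt(3) is passed straight to min()
smaller.nested <- min(1.5, sqrt(3))

# Both approaches produce the same value
print(smaller.stepwise == smaller.nested)  # TRUE
```

The nested form saves a line, while the step-by-step form can be easier to read and debug when the inner expression gets complicated.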

6.2 Built-in R Functions

As you have likely noticed, R comes with a variety of functions that are built into the language. In the above example, we used the print() function to print a value to the console, the min() function to find the smallest number among the arguments, and the sqrt() function to take the square root of a number. Here is a very limited list of functions you can experiment with (or see a few more here).

| Function Name | Description | Example |
| --- | --- | --- |
| sum(a,b,...) | Calculates the sum of all input values | sum(1, 5) returns 6 |
| round(x,digits) | Rounds the first argument to the given number of decimal places | round(3.1415, 3) returns 3.142 |
| toupper(str) | Returns the characters in uppercase | toupper("hi there") returns "HI THERE" |
| paste(a,b,...) | Concatenates (combines) characters into one value | paste("hi", "there") returns "hi there" |
| nchar(str) | Counts the number of characters in a string | nchar("hi there") returns 8 (space is a character!) |
| c(a,b,...) | Concatenates (combines) multiple items into a vector (see chapter 7) | c(1, 2) returns 1, 2 |
| seq(a,b) | Returns a sequence of numbers from a to b | seq(1, 5) returns 1, 2, 3, 4, 5 |
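
As a quick way to experiment, the functions listed above can be tried together in one short script. This is only a sketch; the variable names are illustrative and not part of R itself.

```r
# Try several built-in functions and store each returned value
total <- sum(1, 5)                  # 6
rounded <- round(3.1415, 3)         # 3.142
shout <- toupper("hi there")        # "HI THERE"
greeting <- paste("hi", "there")    # "hi there"
greeting.length <- nchar(greeting)  # 8 (the space counts as a character)
numbers <- c(1, 2)                  # a vector containing 1 and 2
one.to.five <- seq(1, 5)            # 1, 2, 3, 4, 5

# Returned values can be passed to other functions
print(sum(one.to.five))  # 15
```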

To learn more about any individual function, look it up in the R documentation by using ?FunctionName, as described in the previous chapter.

“Knowing” how to program in a language is to some extent simply “knowing” what provided functions are available in that language. Thus you should look around and become familiar with these functions… but do not feel that you need to memorize them! It’s enough to simply be aware “oh yeah, there was a function that sums up numbers”, and then be able to look up the name and argument for that function.
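
When you remember roughly what a function does but not its exact name or arguments, R itself can help you look it up. The sketch below uses apropos() to search for functions by partial name and args() to display a function's arguments; the variable name matches is just for illustration.

```r
# Find functions whose names start with "sum"
matches <- apropos("^sum")
print(matches)

# Display the arguments that the round() function accepts
print(args(round))
```

From there, ?sum or ?round opens the full documentation page for the function you found.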

6.3 Loading Functions

Although R comes with lots of built-in functions, you can always use more functions! Packages (or libraries) are additional sets of R functions that are written and published by the R community. Because many R users encounter the same data management/analysis challenges, programmers are able to use these libraries and thus benefit from the work of others (this is the amazing thing about the open-source community—people solve problems and then make those solutions available to others). R packages do not ship with the R software by default, and need to be downloaded (once) and then loaded into your interpreter’s environment (each time you wish to use them). While this may seem cumbersome, the R software would be huge and slow if you had to install and load all available packages to use it.

Luckily, it is quite simple to install and load R packages from within R. To do so, you’ll need to use the built-in R functions install.packages() and library(). Below is an example of installing and loading the stringr package (which contains more handy functions for working with character strings):

# Install the `stringr` package. Only needs to be done once on your machine
install.packages("stringr")

# Load the package (make stringr functions available in this R session/program)
library("stringr") # quotes optional here
  • Note that when you load a package, you may receive a warning message about the package being built under a different version of R than the one you are running. In all likelihood this won’t cause a problem, but you should pay attention to the details of the messages and keep them in mind (especially if you start getting unexpected errors).

After loading the package with the library() function, you have access to functions that were written as part of that package (see the documentation for a list of functions included with the stringr library).
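
As a brief sketch of what this looks like in practice, the code below calls a few stringr functions (str_length(), str_to_upper(), and str_detect()) once the package is loaded. It assumes stringr has already been installed; the requireNamespace() check is added here only so the example skips quietly if it has not, and the variable names are illustrative.

```r
# Only run the example if the stringr package is installed
if (requireNamespace("stringr", quietly = TRUE)) {
  library(stringr)

  # Count the characters in a string (similar to nchar())
  name.length <- str_length("hi there")  # 8

  # Convert a string to uppercase (similar to toupper())
  shouted <- str_to_upper("hi there")    # "HI THERE"

  # Check whether a string contains a pattern
  has.the <- str_detect("hi there", "the")  # TRUE

  print(shouted)
}
```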

6.4 Writing Functions

Even more exciting than loading other people’s functions is writing your own. Any time you have a task that you may repeat throughout a script (or you simply want to organize your thinking), it’s good practice to write a function to perform that task. This will limit repetition and reduce the likelihood of errors, as well as make things easier to read and understand (and thus make it easier to identify flaws in your analysis).

Functions are named like any other variable, so we use the assignment operator (<-) to store a new function in a variable. It is best practice to assign functions names in CamelCase without any periods (.) in the name. This helps distinguish functions from other variables.

The best way to understand the syntax for defining a function is to look at an example:

# A function named `MakeFullName` that takes two arguments
# and returns the "full name" made from them
MakeFullName <- function(first.name, last.name) {
  # Function body: perform tasks in here
  full.name <- paste(first.name, last.name)

  # Return: what you want the function to output
  return(full.name)
}

# Call the MakeFullName function with the values "Alice" and "Kim"
my.name <- MakeFullName("Alice", "Kim")  # "Alice Kim"

Functions have three main pieces to them:

  • Arguments: the value assigned to the function variable uses the syntax function(...) to indicate that you are creating a function (as opposed to a number or character string). The names put between the parentheses are variables that will contain the values passed in as arguments. For example, when we call MakeFullName("Alice", "Kim"), the value of the first argument ("Alice") will be assigned to the first variable (first.name), and the value of the second argument ("Kim") will be assigned to the second variable (last.name).

    Importantly, we could have made the argument names anything we wanted (name.first, given.name, etc.), just as long as we then use that variable name to refer to the argument while inside the function. Moreover, these argument variable names only apply while inside the function. You can think of them like “nicknames” for the values. The variables first.name, last.name, and full.name only exist within this particular function.

  • Body: The body of the function is a block of code that falls between curly braces {} (a “block” is represented by curly braces surrounding code statements). Note that the cleanest style is to put the opening { immediately after the arguments list, and the closing } on its own line.

    The function body specifies all the instructions (lines of code) that your function will perform. A function can contain as many lines of code as you want—you’ll usually want more than 1 to make it worth while, but if you have more than 20 you might want to break it up into separate functions. You can use the argument variables in here, create new variables, call other functions… basically any code that you would write outside of a function can be written inside of one as well!

  • Return value: You can specify what output a function produces by calling the return() function and passing it the value that you wish your function to return (output). The return() function ends the current function and returns the flow of code execution to wherever this function was called from. Note that even though we returned a variable called full.name, that variable was local to the function and so doesn’t exist outside of it; thus we have to take the returned value and assign it to a new variable (as with my.name <- MakeFullName("Alice", "Kim")).

    Because the return() call exits the function, it is usually the last line of code in the function.

We can call (execute) a function we defined the same way we called built-in functions. When we do so, R will take the arguments we passed in (e.g., "Alice" and "Kim") and assign them to the argument variables. Then it executes each line of code in the function body one at a time. When it gets to the return() call, it will end the function and return the given value, which can then be assigned to a different variable outside of the function.
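
Two of the points above, that variables created inside a function are local to it and that return() ends the function, can be checked with a short sketch. ReturnEarly() is a hypothetical function written only for this illustration, and the exists() check assumes no variable named full.name has been created elsewhere in your session.

```r
# Same function as before: full.name only exists inside the function
MakeFullName <- function(first.name, last.name) {
  full.name <- paste(first.name, last.name)
  return(full.name)
}

my.name <- MakeFullName("Alice", "Kim")
print(my.name)  # "Alice Kim"

# The local variable did not "leak" out of the function
leaked <- exists("full.name")
print(leaked)  # FALSE

# Hypothetical example: code after return() is never executed
ReturnEarly <- function() {
  return("first")
  print("this line never runs")
}
early.result <- ReturnEarly()  # "first"
```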

6.5 Conditional Statements

Functions are a way to organize and control the flow of execution (e.g., what lines of code get run in what order). In R, as in other languages, we have one other way of controlling program flow, and that is by specifying different instructions that can be run based on a different set of conditions. Conditional statements allow us to specify different chunks of code to run when given different contexts, which is often valuable within functions.

In an abstract sense, a conditional statement is saying:

IF something is true
  do some lines of code
OTHERWISE
  do some other lines of code

In R, we write these conditional statements using the keywords if and else and the following syntax:

if(condition){
  # lines of code to run if condition is TRUE
} else {
  # lines of code to run if condition is FALSE
}

(Note that the else needs to be on the same line as the closing } of the if block. It is also possible to omit the else and its block.)

The condition can be any variable or expression that resolves to a logical value (TRUE or FALSE). Thus both of the below conditional statements are valid:

porridge.temp <- 115  # in degrees F
if(porridge.temp > 120) {
  print("This porridge is too hot!")
}

too.cold <- porridge.temp < 70
if(too.cold) {  # a logical value
  print("This porridge is too cold!")
}

Note, we can extend the set of conditions evaluated using an else if statement. For example:

# Function to determine if you should eat porridge
FoodTempTest <- function(temp) {
  if(temp > 120) {
    status <- "This porridge is too hot!"
  } else if(temp < 70) {
    status <- "This porridge is too cold!"
  } else {
    status <- "This porridge is just right!"
  }
  return(status)
}
# Use function on different temperatures
FoodTempTest(119)  # "This porridge is just right!"
FoodTempTest(60)   # "This porridge is too cold!"
FoodTempTest(150)  # "This porridge is too hot!"
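
Because the conditions are checked in order and use strict comparisons (> and <), values that sit exactly on a boundary fall through to the final else. The sketch below repeats the function definition so that it runs on its own; the variable names at.upper, at.lower, and just.over are illustrative.

```r
# Same function as above, repeated so this example is self-contained
FoodTempTest <- function(temp) {
  if (temp > 120) {
    status <- "This porridge is too hot!"
  } else if (temp < 70) {
    status <- "This porridge is too cold!"
  } else {
    status <- "This porridge is just right!"
  }
  return(status)
}

# Boundary values: 120 is not greater than 120, and 70 is not less than 70
at.upper <- FoodTempTest(120)     # "This porridge is just right!"
at.lower <- FoodTempTest(70)      # "This porridge is just right!"
just.over <- FoodTempTest(120.5)  # "This porridge is too hot!"

# The returned value can be used like any other value
print(toupper(at.upper))  # "THIS PORRIDGE IS JUST RIGHT!"
```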