Chapter 8 Lists

This chapter covers an additional R data type called lists. Lists are somewhat similar to vectors, but can store more types of data and more details about that data (with some cost). Lists are R’s version of a Map, which is a common and extremely useful way of organizing data in a computer program. Moreover: lists are used to create data frames, which is the primary data storage type used for working with sets of real data in R. This chapter will cover how to create and access elements in a list, as well as how to apply functions to lists or vectors.

8.1 What is a List?

A List is a lot like a vector, in that it is a one-dimensional collection of data. However, lists have two main differences from vectors:

  1. Unlike a vector, you can store elements of different types in a list: e.g., a list can contain numeric data and character string data.

  2. Elements in a list can be tagged with with names which you can use to easily refer to them—rather than talking about the list’s “element #1”, we can talk about the list’s “first_name element”.

The second feature is the most significant, as it allows you to use lists to create a type of map. In computer programming, a map (or “mapping”) is a way of associating one value with another. The most common real-world example is a dictionary or encyclopedia: a dictionary associates each word with it’s definition—you can “look up” a definition by using the word itself, rather than needing to look up the 3891st definition in the book. In fact, this same data structure is called a dictionary in the Python programming language!

Lists are extremely useful for organizing data. They allow you to group together data like a person’s name (characters), job title (characters), salary (number), and whether they are in a union (logical)—and you don’t have to remember whether whether the person’s name or title was the first element!

Note: technically, you can create named elements in a vector, but this is very rarely done. Compared to vectors, lists provide a simpler syntax for accessing named elements.

8.1.1 Creating Lists

You create a list by using the list() function and passing it any number of arguments (separated by commas) that you want to make up that list—similar to the c() functon for vectors.

However, you can (and should) specify the tags for each element in the list by putting the name of the tag (which is like a variable name), followed by an equal symbol (=), followed by the value you want to go in the list and be associated with that tag. For example:

person <- list(first_name = "Ada", job = "Programmer", salary = 78000, in_union = TRUE)

This creates a list of 4 elements: "Ada" which is tagged with first_name, "Programmer" which is tagged with job, 78000 which is tagged with salary, and TRUE which is tagged with in_union.

  • Note that you can have vectors as elements of a list. In fact, each of these scalar values are really vectors (of length 1)!

  • The use of the = symbol here is an example of assigning a value to a specific named argument. You can actually use this syntax for any function (e.g., rather than listing arguments in order, you can explicit “assign” a value to each argument), but it is more common to just use the normal order of the arguments if there aren’t very many.

It is possible to create a list without tagging the elements:

person_alt <- list("Ada", "Programmer", 78000, TRUE)

But it will make code harder to read and more error-prone, so isn’t as common.

8.1.2 Accessing Lists

If you printed out the above person list, you would see the following:

> print(person)
[1] "Ada"

[1] "Programmer"

[1] 78000

[1] TRUE

Notice that the output lists each tag name prepended with a dollar sign ($) symbol, and then on the following line the vector that is the element itself. The $ symbol is one of the easiest ways of accessing list elements (which can’t be done with vectors).

Because list elements are (usually) tagged, you can access them by their tag name rather than by the index number you used with vectors. You do this by using dollar notation: you refer to the element with a particular tag in a list by writing the name of the list, followed by a $, followed by the element’s tag:

person <- list(first_name = "Ada", job = "Programmer", salary = 78000, in_union = TRUE)

person$first_name  # [1] "Ada"
person$salary  # [1] 78000

(See below for other options for accessing list elements).

You can almost read the dollar sign as like an “apostrophe s” (possessive) in English: so person$salary would mean “the person list’s salary value”.

Dollar notation allows list elements to almost be treated as variables in their own right—for example, you specify that you’re talking about the salary variable in the person list, rather than the salary variable in some other list (or not in a list at all).

person <- list(first_name = "Ada", job = "Programmer", salary = 78000, in_union = TRUE)

# use elements as function or operation arguments
paste(person$job, person$first_name)   # [1] "Programmer Ada"

# assign values to list element
person$job <- "Senior Programmer"  # a promotion!
print(person$job)  # [1] "Senior Programmer"

# assign value to list element from itself
person$salary <- person$salary * 1.15  # a 15% raise!
print(person$salary)  # [1] 89700

Note that if you need to, you can get a vector of element tags using the names() function:

person <- list(first_name = "Ada", job = "Programmer", salary = 78000, in_union = TRUE)
names(person)  # [1] "first_name" "job"  "salary"  "in_union"
  • This is useful for understanding the structure of variables that may have come from other data sources.

8.1.3 List Indices

Whether or not a list element has a tag, you can also access it by its numeric index (i.e., if it is the 1st, 2nd, etc. item in the list). You do this by using double-bracket notation: you refer to the element at a particular index of a list by writing the name of the list, followed by double square brackets ([[]]) that contain the index of interest:

# note: a list and not a vector, even though elements have the same types
animals <- list("Aardvark", "Baboon", "Camel")

animals[[1]]  # [1] "Aardvark"
animals[[3]]  # [1] "Camel"
animals[[4]]  # Error: subscript out of bounds!

You can also use double-bracket notation to access an element by its tag if you put a character string (in "") of the tag name inside the brackets. This is particularly useful if you want the tag itself to be a variable!

person <- list(first_name = "Bob", last_name = "Wong", salary = 77000, in_union = TRUE)

person[["first_name"]]  # [1] "Bob"
person[["salary"]]  # [1] 77000

name_to_use <- "last_name"  # choose name (i.e., based on formality)
person[[name_to_use]]  # [1] "Wong"
name_to_use <- "first_name"  # change name to use
person[[name_to_use]]  # [1] "Bob"

# Can use indices for tagged elements as well!
person[[1]]  # [1] "Bob"
person[[4]]  # [1] TRUE Single vs. Double Brackets

Watch out!: vectors use single-bracket notation for accessing by index, but lists use double-bracket notation for accessing by index!

This is because the single-bracket syntax for vectors isn’t actually selecting by index: rather it is filtering by whatever vector is inside the brackets (which may be just a single element: the index number to extract). In R, single brackets always mean to filter the collection. So if you put single-brackets after a list, what you’re actually doing is getting a filtered sub-list of the elements that have those indices, just as single brackets on a vector return a subset of elements in that vector:

my_list <- list('A', 201, TRUE, 'rhinoceros')

# SINGLE brackets returns a list
            # [[1]]
            # [1] "A"

# DOUBLE brackets returns a vector
my_list[[1]]  # [1] "A"

# can use any vector as the argument to single brackets, just like with vectors
            # [[1]]
            # [1] "A"
            # [[2]]
            # [1] 201
            # [[3]]
            # [1] TRUE

In sum, remember that single-brackets gives a list, double-brackets gives a list element. You almost always want to be refering to the value itself (the vector—everything is a vector!) rather than a list, so almost always want to use double-brackets when accessing lists.

8.1.4 Modifying Lists

Similarly to vectors, you can add and modify list elements. However, lists also enable you to remove elements.

You can add elements to a list simply by assigning a value to a tag (or index) in the list that doesn’t yet exist:

person <- list(first_name = "Ada", job = "Programmer", salary = 78000, in_union = TRUE)

# has no `age` element
person$age  # NULL

# assign a value to the `age` tag to add it
person$age <- 40
person$age  # [1] 40

# assign using index
person[[10]] <- "Tenth field"
# elements 6-9 will be NULL

You can also remove elements by assiging the special value NULL to their tag or index:

a_list <- list('A', 201, True)
a_list[[2]] <- NULL  # remove element #2
            # [[1]]
            # [1] "A"
            # [[2]]
            # [1] TRUE

8.2 The lapply() Function

Since everything is a vector in R, and most functions are vectorized, you can can pass most functions (e.g., paste(), round(), etc.) a vector as an argument and the function will be applied to each item in the vector. It “just works”. But if you want to apply a function to each item in a list, you need to put in a bit more effort.

In particular, you need to use a function called lapply() (for list apply). This function takes two arguments: the first is a list or vector you want to modify, and the second is a function you want to “apply” to each item in that list. For example:

# list, not a vector
people <- list("Sarah", "Amit", "Zhang")

# apply the `toupper()` function to each element in `people`
people_upper <- lapply(people, toupper)
            # [[1]]
            # [1] "SARAH"
            # [[2]]
            # [1] "AMIT"
            # [[3]]
            # [1] "ZHANG"

# apply the `paste()` function to each element in `people`,
# with an addition argument `"dances!"` to each call
dance_party <- lapply(people, paste, "dances!")
  • Notice that the second argument to lapply() is just the name of the function: not a character string (it’s not in ""). You’re also not actually calling that function (there are no () after it). Just put the name of the function! After that, you can put any additional arguments you want the applied function to be called with: for example, how many digits to round to, or what value to paste to the end of a string.

Note that the lapply() function returns a new list; the original one is unmodified (though if the list contains vectors or other lists as elements, it’s possible for those values to be changed). This makes it a mapping operation (applied to the maps that are lists).

You commonly use lapply() with your own custom functions which define what you want to do to a single element in that list:

# A function that prepends "Hello" to any item
greet <- function(item) {
  return(paste("Hello", item))

# a list of people
people <- list("Sarah", "Amit", "Zhang")

# greet each name
greetings <- lapply(people, greet)
            # [[1]]
            # [1] "Hello Sarah"
            # [[2]]
            # [1] "Hello Amit"

            # [[3]]
            # [1] "Hello Zhang"

Additionally, lapply() is a member of the “*apply()” family of functions: a set of functions that each start with a different letter and applies to a different data structure, but otherwise all work basically the same. For example, lapply() is used for lists, while sapply() (simplified apply) works well for vectors.