Chapter 8 Lists
This chapter covers an additional R data type called lists. Lists are somewhat similar to atomic vectors (they are “generalized vectors”!), but can store more types of data and more details about that data (with some cost). Lists are another way to create R’s version of a Map data structure, a common and extremely useful way of organizing data in a computer program. Moreover: lists are used to create data frames, which is the primary data storage type used for working with sets of real data in R. This chapter will cover how to create and access elements in a list, as well as how to apply functions to lists or vectors.
8.1 What is a List?
A List is a lot like an atomic vector. It is also a one-dimensional positional ordered collection of data. Exaclyt as in case of atomic vectors, list elements preserve their order, and they have a well-defined position in the list. However, lists have a few major differences from vectors:
Unlike a vector, you can store elements of different types in a list: e.g., a list can contain numeric data and character string data, functions, and even other lists.
Because lists can contain any type of data, they are much less efficient as vectors. The vectorized operations that can handle atomic vectors on the fly usually fail in case of lists, or may work substantially slower. Hence one should prefer atomic vectors over lists if possible.
Elements in a list can also be named, but unlike in case of vector, there exists a convenient shorthand
$
-construct to extract named elements from lists.
Lists are extremely useful for organizing data. They allow you to group together data like a person’s name (characters), job title (characters), salary (number), and whether they are in a union (logical)—and you don’t have to remember whether the person’s name or title was the first element! In this sense lists can be used as a quick alternative to formal classes, objects that can store heterogeneous data in a consistent way. This is one of the primary uses of lists.
8.2 Creating Lists
You create a list by using the list()
function and passing it any number of arguments (separated by commas) that you want to make up that list—similar to the c()
functon for vectors.
However, if your list contains heterogenous elements, it is usually a
good idea to specify the names (or tags) for each element in the list in the same
way you can give names to vector elements in c()
—by putting the name tag (which is like a variable name), followed by an equal symbol (=
), followed by the value you want to go in the list and be associated with that tag. For example:
person <- list(first_name = "Ada", job = "Programmer", salary = 78000,
in_union = TRUE)
person
## person
## $first_name
## [1] "Ada"
##
## $job
## [1] "Programmer"
##
## $salary
## [1] 78000
##
## $in_union
## [1] TRUE
This creates a list of 4 elements: "Ada"
which is tagged with
first_name
, "Programmer"
which is tagged with job
, 78000
which
is tagged with salary
, and TRUE
which is tagged with in_union
.
The output lists all component names following the dollar sign $
(more
about it below), and prints the components themselves right after the names.
Note that you can have vectors as elements of a list. In fact, each of these scalar values are really vectors (of length 1) as indicated by
[1]
preceeding their values!The use of the
=
symbol here is an example of assigning a value to a specific named argument. You can actually use this syntax for any function (e.g., rather than listing arguments in order, you can explicit “assign” a value to each argument), but it is more common to just use the normal order of the arguments if there aren’t very many.
Note that if you need to, you can get a vector of element tags using the names()
function:
person <- list(first_name = "Ada", job = "Programmer", salary = 78000, in_union = TRUE)
names(person) # [1] "first_name" "job" "salary" "in_union"
This is useful for understanding the structure of variables that may have come from other data sources.
It is possible to create a list without tagging the elements, and assign names later if you wish:
person_alt <- list("Ada", 78000, TRUE)
person_alt
## [[1]]
## [1] "Ada"
##
## [[3]]
## [1] 78000
##
## [[4]]
## [1] TRUE
names(person_alt) <- c("name", "income", "memebership")
person_alt
## $name
## [1] "Ada"
##
## $income
## [1] 78000
##
## $memebership
## [1] TRUE
Note that the name tags are missing before we assign names, instead of
names we see the position of the components in double brackets like
[[1]]
(more about it below).
Making name-less lists and assigning names later is usually more error-prone and harder way to make lists manually, but when you automatically create lists in your code, it may be the only option.
Finally, empty lists of given length can also be created using the
general vector()
function. For instance, vector("list", 5)
, creates
a list of five NULL
elements. This is a good approach if you just want am empty list to be filled in a loop later.
8.3 Accessing List Elements
There are four ways to access elements in lists. Three of these reflect
atomic vector indexing, the $
-construct is unique for lists. However, there
are important differences.
8.3.1 Indexing by position
You can always access list elements by their position. It is in many ways similar to that of atomic vectors with one major caveat: indexing with single brackets will extract not the components but a sublist that contains just those components:
# note: this list is not not an atomic vector, even though elements have the same types
animals <- list("Aardvark", "Baboon", "Camel")
animals[c(1,3)]
## [[1]]
## [1] "Aardvark"
##
## [[2]]
## [1] "Camel"
You can see that the result is a list with two components, “Aardvark” and “Camel”, picked from the the positions 1 and 3 in the original list.
The fact that single brackets return a list in case of vector is actually a smart design choice. First, it cannot return a vector in general—the requested components may be of different type and simply not fit into an atomic vector. Second, single-bracket indexing in case of vectors actually returns a subvector. We just tend to overlook that a “scalar” is actually a length-1 vector. But however smart this design decision may be, people tend to learn it in the hard way. When confronted with weird errors, check that what you think should be a vector is in fact a vector and not a list.
The good news is that there is an easy way to extract components. A single element, and not just a length-one-sublist, is extracted by double brackets. For instance,
returns a length-1 character vector.
Unfortunately, the good news end here. You can extract individual
elements in this way, but you cannot get a vector of individual list
components: animals[[1:2]]
will give you subscript out of bounds.
As above, this is a design choice: as list components may be of
different type, you may not be able to mold these into a single vector.
There are ways to merge components into a vector, given they are of the
same type. For instance Reduce(c, animals)
will convert the animals
into a vector of suitable type. Ditto with as.character(animals)
.
8.3.2 Indexing by Name
If the list is named, one can use a character vector to extract it’s components, exacly in the same way as we used the numeric positions above. For instance
person <- list(first_name = "Bob", last_name = "Wong", salary = 77000, in_union = TRUE)
person[c("first_name", "salary")]
## $first_name
## [1] "Bob"
##
## $salary
## [1] 77000
person[["first_name"]] # [1] "Bob"
person[["salary"]] # [1] 77000
As in case of positional indexing, single brackets return a sublist while double brackets return the corresponding component itself.
8.3.3 Indexing by Logical Vector
As in case of atomic vectors, we can use logical indices with lists too. There are a few differences though:
- one can only extract sublists, not individual components.
person[c(TRUE, TRUE, FALSE, FALSE)]
will give you a sublist with first and last name.person[[c(TRUE, FALSE, FALSE, FALSE)]]
will fail. - many operators are vectorized but they are not “listified”. You
cannot do math like
*
or+
with lists. Hence the powerful logical indexing operations likex[x > 0]
are in general not possible with lists. This substantially reduces the potential usage cases of logical indexing.
For instance, we can extract all components of certain name from the list:
planes <- list("Airbus 380"=c(seats=575, speed=0.85),
"Boeing 787"=c(seats=290, speed=0.85),
"Airbus 350"=c(seats=325, speed=0.85))
# cruise speed, Mach
planes[startsWith(names(planes), "Airbus")] # extract components, names
# of which starting with "Airbus"
## $`Airbus 380`
## seats speed
## 575.00 0.85
##
## $`Airbus 350`
## seats speed
## 325.00 0.85
However, certain vectorized operations, such as >
or ==
also work with lists
that contain single numeric values as their elements. It seems to be
hard to come up with general rules, so we recommed not to rely on this
behaviour in code.
8.3.4 Extracting named elements with $
Finally, there is a very convenient $
-shortcut alternative for
extracting individual components.
If you printed out one of the named lists above, for instance person
, you would see the following:
person <- list(name = "Ada", job = "Programmer")
print(person)
## $first_name
## [1] "Ada"
##
## $job
## [1] "Programmer"
Notice that the output lists each name tag prepended with a dollar sign
($
) symbol, and then on the following line the vector that is the
element itself. You can retrieve individual components in a similar
fashion, the dollar notation is one of the easiest ways of accessing
list elements. You refer to the particular element in the list with its
tag by writing the name of the list, followed by a $
, followed by the element’s tag:
Obviously, this only works for named lists. There are no dollar notation analogue for atomic vectors, even for named
vectors. $
extractor only exists for lists (and such data structures
that are derived from lists, like data frames).
You can almost read the dollar sign as like an “apostrophe s” (possessive) in English: so person$salary
would mean “the person
list’s salary
value”.
Dollar notation allows list elements to almost be treated as variables in their own right—for example, you specify that you’re talking about the salary
variable in the person
list, rather than the salary
variable in some other list (or not in a list at all).
person <- list(first_name = "Ada", job = "Programmer", salary = 78000, in_union = TRUE)
# use elements as function or operation arguments
paste(person$job, person$first_name) # [1] "Programmer Ada"
# assign values to list element
person$job <- "Senior Programmer" # a promotion!
print(person$job) # [1] "Senior Programmer"
# assign value to list element from itself
person$salary <- person$salary * 1.15 # a 15% raise!
print(person$salary) # [1] 89700
Dollar-notation is a drop-in replacement to double-brackets extraction given you know the name of the component. If you do not—as is often the case when programming—you have to rely on double bracket approach.
8.3.5 Single vs. Double Brackets vs. Dollar
The list indexing may be confusing: we have single and double brackets, indexing by position and name, and finally the dollar-notation. Which is the right thing to do? As is so often the case, it depends.
Dollar notation is the quickest and easiest way to extract a single named component in case you know it’s name.
Double brackets is very much a more verbose alternative to the dollar notation. It returns a single component exactly as the dollar notation. However, it also allows one to decide later which components to extract. (This is terribly useful in programs!) For instance, we can decide if we want to use someones first or last name:
person <- list(first_name = "Bob", last_name = "Wong", salary = 77000)
name_to_use <- "last_name" # choose name (i.e., based on formality)
person[[name_to_use]] # [1] "Wong"
name_to_use <- "first_name" # change name to use
person[[name_to_use]] # [1] "Bob"
Note: you can often hear that double brackets return a vector. This is only true if the corresponding element is a vector. But they always return the element!
- Single brackets is the most powerful and universal way of indexing. If work in a very similar fashion than vector indexing. The main caveat here is that it returns a sub-list, not a vector. (But note that in case of vectors, single-bracket indexing returns a sub-vector.) It allows by position, by names, and by logical vector.
In some sense it is filtering by whatever vector is inside the brackets (which may have just a single element). In R, single brackets always mean to filter the collection where the collection may be either atomic vector or list. So if you put single-brackets after a collection, you get a filtered version of the same collection, containing the desired elements. The type of the collection, list or atomic vector, is not affected.
Watch out: In vectors, single-bracket notation returns a vector, in lists single-bracket notation returns a list!
We recap this section by an example:
animal <- list(class='A', count=201, endangered=TRUE, species='rhinoceros')
## SINGLE brackets returns a list
animal[1]
## $class
## [1] "A"
## can use any vector as the argument to single brackets, just like with vectors
animal[c("species", "endangered")]
## $species
## [1] "rhinoceros"
##
## $endangered
## [1] TRUE
## DOUBLE brackets returns the element (here its a vector)!
animal[[1]] # [1] "A"
## Dollar notation is equivalent to the double brackets
animal$class # [1] "A"
Finally, all these methods can also be used for assignment. Just put any of these construct on the left side of the assignment operator <-
.
8.4 Modifying Lists
As in the case with atomic vectors, you can assign new values to existing elements. However, lists also enable dedicated syntax to remove elements. (Remember, you can always “unselect” an element in a vector, including list, by using negative positional index.)
You can add elements to a list simply by assigning a value to a tag (or index) in the list that doesn’t yet exist:
person <- list(first_name = "Ada", job = "Programmer", salary = 78000, in_union = TRUE)
# has no `age` element
person$age # NULL
# assign a value to the `age` tag to add it
person$age <- 40
person$age # [1] 40
# assign using index
person[[10]] <- "Tenth field"
# elements 6-9 will be NULL
This parallel fairly closely with atomic vectors.
You can remove elements by assiging the special value NULL
to their tag or index:
a_list <- list('A', 201, True)
a_list[[2]] <- NULL # remove element #2
print(a_list)
# [[1]]
# [1] "A"
#
# [[2]]
# [1] TRUE
There is no analogue here to atomic vectors.
8.5 The lapply()
Function
A large number of common R functions (e.g., paste()
, round()
, etc.) and most common operators (like +
, >
, etc) are vectorized so you can pass vectors as arguments, and the function will be applied to each item in the vector. It “just works”. In case of lists it usually fails. You need to put in a bit more effort if you want to apply a
function to each item in a list.
The effort involves either an explicit loop, or an implicit loop through a function called lapply()
(for list apply). We will discuss the latter approach here.
lapply()
takes two arguments: the first is a list (or a vector, vectors will do as well) you want to work with, and the second is the function you want to “apply” to each item in that list. For example:
# list, not a vector
people <- list("Sarah", "Amit", "Zhang")
# apply the `toupper()` function to each element in `people`
lapply(people, toupper)
## [[1]]
## [1] "SARAH"
##
## [[2]]
## [1] "AMIT"
##
## [[3]]
## [1] "ZHANG"
You can add even more arguments to lapply()
, those will be assumed to belong to the function you are applying:
# apply the `paste()` function to each element in `people`,
# with an addition argument `"dances!"` to each call
lapply(people, paste, "dances!")
## [[1]]
## [1] "Sarah dances!"
##
## [[2]]
## [1] "Amit dances!"
##
## [[3]]
## [1] "Zhang dances!"
The last unnamed argument, "dances"
, are taken as the second argument to paste
. So behind the scenes, lapply()
runs a loop over paste("Sarah", "dances!")
, paste("Amit", "dances!")
and so on.
- Notice that the second argument to
lapply()
is just the function: not the name of the functions as character string (it’s not quoted in""
). You’re also not actually calling that function when you write it’s name inlapply()
(you don’t put the parenthesis()
after its name). See more in section How to Use Functions. After the function, you can put any additional arguments you want the applied function to be called with: for example, how many digits to round to, or what value to paste to the end of a string.
Note that the lapply()
function returns a new list; the original one is unmodified. This makes it a mapping operation. It is an operation, and not the same thing as map data structure. In mapping operation the code applies the same elemental function to the all elements in a list.
You commonly use lapply()
with your own custom functions which define what you want to do to a single element in that list:
# A function that prepends "Hello" to any item
greet <- function(item) {
return(paste("Hello", item))
}
# a list of people
people <- list("Sarah", "Amit", "Zhang")
# greet each name
greetings <- lapply(people, greet)
## [[1]]
## [1] "Hello Sarah"
##
## [[2]]
## [1] "Hello Amit"
##
## [[3]]
## [1] "Hello Zhang"
Additionally, lapply()
is a member of the “*apply()
” family of functions: a set of functions that each start with different letters and may apply to a different data structure, but otherwise all work in a similar fashion. For example, lapply()
returns a list, while sapply()
(simplified apply) simplifies the list into a vector, if possible. If you are interested in parallel programming, we recommend you to check out the function parLapply
and it’s friends in the parallel package.