Chapter 12 R Markdown

R Markdown is a package that supports using R to dynamically create documents, such as websites (.html files), reports (.pdf files), slideshows (using ioslides or slidy), and even interactive web apps (using shiny).

As you may have guessed, R Markdown does this by letting you blend Markdown syntax and R code so that, when the document is processed, the results of your code are automatically injected into a formatted document. The ability to automatically generate reports and documents from a computer script eliminates the need to manually update the results of a data analysis project, enabling you to more effectively share the information that you've produced from your data. In this chapter, you'll learn the fundamentals of the R Markdown package to create well-formatted documents that combine analysis and reporting.

12.1 R Markdown and RStudio

R Markdown documents are created from a combination of two packages: rmarkdown (which processes the Markdown and generates the output) and knitr (which runs R code and produces Markdown-like output). RStudio works closely with these packages (offering to install them if they are missing) and provides built-in support for creating and viewing R Markdown documents.

12.1.1 Creating `.Rmd` Files

The easiest way to begin a new R Markdown document in RStudio is to use the File > New File > R Markdown menu option:

Create a new R Markdown document in RStudio.

RStudio will then prompt you to provide some additional details about what kind of R Markdown document you want. In particular, you will need to choose a default document type and output format. You can also provide a title and author information, which will be included in the document. This chapter will focus on creating HTML documents (websites; the default format). Other formats require the installation of additional software.

Specify document type.

Once you've chosen your desired document type and output format, RStudio will open up a new script file for you. The file contains some example content to get you started.

12.1.2 `.Rmd` Content

At the top of the file is some text that has the format:

```
---
title: "Example"
author: "YOUR NAME HERE"
date: "1/30/2017"
output: html_document
---
```

This is the document "header" information, which tells R Markdown details about the file and how the file should be processed. For example, the title, author, and date will automatically be added to the top of your document. You can include additional information as well, such as whether there should be a table of contents, or even default values for variables used in the document.

The header is written in YAML format, which is yet another way of formatting structured data, similar to .csv or JSON. In fact, YAML is (essentially) a superset of JSON and can represent the same data structures, just using indentation and dashes instead of braces and commas.
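As a rough illustration of both points, the sketch below uses the yaml package (assumed to be installed; rmarkdown depends on it) to parse a header that also asks html_document for a table of contents via `toc: true`, and then parses the same data written as JSON. The header text is written without the `---` delimiters, since those only mark where the header starts and ends inside an .Rmd file.

```r
library(yaml)  # assumed installed; rmarkdown depends on it

# The kind of header you might put between the --- lines of an .Rmd file,
# extended here to request a table of contents in the HTML output
header_from_yaml <- yaml.load("
title: Example
output:
  html_document:
    toc: true
")

# The same data written as JSON; a YAML parser accepts this form as well
header_from_json <- yaml.load('{"title": "Example", "output": {"html_document": {"toc": true}}}')

str(header_from_yaml)
identical(header_from_yaml, header_from_json)
```

Both forms produce the same nested R list, which is why the indentation-based YAML style can stand in for JSON in the header.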

Below the header, you will find two types of content:

Markdown: normal Markdown text like you learned in Chapter 3. For example, you can use two pound symbols (##) for a second-level heading.
Code Chunks: These are segments (chunks) of R code that look like normal code block elements (using ```), but with an extra {r} immediately after the opening backticks.

R Markdown will be able to execute the R code you include in code chunks, and render that output in your Markdown. More on this below.

Important: This file should be saved with the extension .Rmd (for "R Markdown"), which tells the computer and RStudio that the document contains Markdown content with embedded R code.

12.1.3 Knitting Documents

RStudio provides an easy interface to compile your .Rmd source code into an actual document (a process called “knitting”). Simply click the Knit button at the top of the script panel:

RStudio’s Knit button

This will generate the document (in the same directory as your .Rmd file), as well as open up a preview window in RStudio.
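Behind the Knit button, RStudio calls `rmarkdown::render()`, which first uses knitr to run the R code and write a plain Markdown (.md) file, and then uses pandoc to convert that file into HTML. The sketch below performs only the first step, calling `knitr::knit()` on a small throwaway .Rmd file written to a temporary location, so it runs without pandoc. It assumes knitr is installed, and the file contents are purely illustrative.

```r
library(knitr)

# Write a tiny .Rmd file to a temporary location (illustrative content only)
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "# Demo",
  "",
  "```{r}",
  "1 + 1",
  "```"
), rmd_path)

# Step 1 of knitting: run the R chunks and write plain Markdown (.md)
md_path <- knit(rmd_path, output = sub("\\.Rmd$", ".md", rmd_path), quiet = TRUE)
cat(readLines(md_path), sep = "\n")

# Step 2 (not run here because it needs pandoc, which RStudio bundles):
# rmarkdown::render(rmd_path) performs step 1 and then converts the result to HTML.
```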

While it is easy to generate such documents, the knitting process can make it hard to debug errors in your R code (whether syntax or logical), in part because the output may or may not show up in the document! We suggest that you write complex R code in a separate script and then source() that script into your .Rmd file, using its results in the document. This makes it possible to test your data processing work outside of the knitting process, and it separates the concerns of the data and its representation, which is good programming practice.
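A minimal sketch of that workflow follows. In practice the analysis would live in its own file (`analysis.R` is a hypothetical name) and the .Rmd would contain a hidden chunk that simply calls `source("analysis.R")`; here the script is written to a temporary file so the example is self-contained, and the data values are made up.

```r
# Stand-in for a separate script such as "analysis.R" (hypothetical file name).
# Writing it to a temporary file keeps this sketch self-contained.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "scores <- c(88, 92, 79)",        # made-up data
  "average_score <- mean(scores)"
), script_path)

# Inside the .Rmd, a chunk (e.g., with include=FALSE) would only need this call
source(script_path)

# Variables created by the script are now available to later chunks and inline code
average_score
```

Because the script can also be run on its own with `source()` from the console, you can debug the data processing before ever pressing Knit.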

Nevertheless, you should still be sure to knit your document frequently, paying close attention to any errors that appear in the console.

Pro-tip: If you’re having trouble finding your error, a good strategy is to systematically remove segments of your code and attempt to re-knit the document. This will help you identify the problematic syntax.

12.1.4 HTML

Assuming that you’ve chosen HTML as your desired output type, RStudio will knit your .Rmd into a .html file. HTML stands for HyperText Markup Language and, like Markdown, is a syntax for describing the structure and formatting of content (though HTML is far more extensive and detailed). In particular, HTML is a markup language that can be automatically rendered by web browsers, and thus is the language used to create web pages. As such, the .html files you create can be put online as web pages for others to view—you will learn how to do this in a future chapter. For now, you can open a .html file in any browser (such as by double-clicking on the file) to see the content outside of RStudio!

As it turns out, it's quite simple to use GitHub to host publicly available web pages (like the .html files you create with R Markdown). However, this will require learning a bit more about git and GitHub. For instructions on publishing your .html files as web pages, see Chapter 14.

12.2 R Markdown Syntax

What makes R Markdown distinct from simple Markdown code is the ability to actually execute your R code and include the output directly in the document. R code can be executed and included in the document in blocks of code, or even inline in the document!

12.2.1 R Code Chunks

Code that is to be executed (rather than simply displayed as formatted text) is called a code chunk. To specify a code chunk, you need to include {r} immediately after the backticks that start the code block (the ```). For example:

Write normal **markdown** out here, then create a code block:

```{r}
# Execute R code in here
x <- 201
```

Back to writing _markdown_ out here.

Note that by default, a code chunk will render the value of any bare expression (e.g., a line containing only x), just like you would see in the console if you selected all the code in the chunk and pressed Ctrl+Enter to execute it.
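To see this behavior without leaving the R console, the sketch below knits a chunk directly from a character vector using `knitr::knit(text = ...)`, which returns the resulting Markdown as a string (assuming knitr is installed). Only the two bare expressions, `x` and `x * 2`, produce output lines (prefixed with `##` by default); the assignment does not.

```r
library(knitr)

# The lines of a small R Markdown chunk
rmd_lines <- c(
  "```{r}",
  "x <- 201",   # an assignment: produces no output
  "x",          # a bare expression: its value is rendered
  "x * 2",      # another bare expression
  "```"
)

# knit(text = ...) knits the lines and returns the resulting Markdown
out <- knit(text = rmd_lines, quiet = TRUE)
cat(out)
```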

It is also possible to specify additional configuration options by including a comma-separated list of named arguments (like you've done with lists and functions) inside the curly braces following the r:

```{r options_example, echo=FALSE, message=TRUE}
# a code chunk named "options_example", with parameter `echo` assigned FALSE
# and parameter `message` assigned TRUE

# Would execute R code in here
```

The first "argument" (options_example) is a "name" for the chunk, and the following are named arguments for the options. Name chunks the way you would name a variable or function, based on what code is being executed and/or rendered by the chunk (each name must be unique within the document). It's always a good idea to name individual code chunks as a form of documentation.

There are many options for configuring code chunks (see also the reference). However, some of the most useful ones have to do with how the code and its results appear in the document. These include:

echo indicates whether you want the R code itself to be displayed in the document (e.g., if you want readers to be able to see your work and reproduce your calculations and analysis). Value is either TRUE (do display; the default) or FALSE (do not display).
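message indicates whether you want any messages generated by the code (e.g., with the message() function) to be displayed. Value is either TRUE (do display; the default) or FALSE (do not display). Note that output from print() statements is not a message; it is controlled separately by the results option (e.g., results='hide').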
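The sketch below (again using `knitr::knit(text = ...)`, assuming knitr is installed) knits a chunk with `echo=FALSE` and `message=FALSE`: the resulting Markdown contains the printed value of `x`, but neither the source code nor the text passed to `message()`.

```r
library(knitr)

rmd_lines <- c(
  "```{r echo=FALSE, message=FALSE}",
  "message('Loading data...')",   # suppressed by message=FALSE
  "x <- 201",                     # hidden by echo=FALSE
  "x",                            # its value is still rendered
  "```"
)

out <- knit(text = rmd_lines, quiet = TRUE)
cat(out)
```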

If you only want to show your R code (and not evaluate it), you can alternatively use a standard Markdown codeblock that indicates the r language (```r, not ```{r}), or set the eval option to FALSE.
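For example, with `eval=FALSE` the code below is included in the knitted output but never run, so even a call to `stop()` does not halt knitting. This is a sketch using `knitr::knit(text = ...)`, assuming knitr is installed.

```r
library(knitr)

rmd_lines <- c(
  "```{r eval=FALSE}",
  "stop('This line is displayed but never executed')",
  "```"
)

# Knitting succeeds because the chunk's code is shown but not evaluated
out <- knit(text = rmd_lines, quiet = TRUE)
cat(out)
```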

12.2.2 Inline Code

In addition to creating distinct code blocks, you may want to execute R code inline with the rest of your text. This lets you reference a variable from a code chunk in a section of Markdown, injecting that variable's value into the text you have written. This allows you to easily include a specific result inside a paragraph of text: if the computation changes, re-knitting your document will update the values inside the text without any further work needed.

Following the Markdown convention for inline code, you'll use single backticks (`), but put the letter r (and a space) immediately after the first backtick. For example:

To calculate 3 + 4 inside some text, we can use `r 3 + 4` right in the _middle_.

When you knit the text above, the `r 3 + 4` would be replaced with the number 7.

Note that you can also reference values computed in the code chunks preceding your inline code; it is best practice to do your calculations in a code chunk (with echo=FALSE), save the result in a variable, and then simply inline that variable with, e.g., `r my.variable`.
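Putting these pieces together, the sketch below knits a hidden chunk that computes a value and a sentence that references it inline. The data are made up, and `knitr::knit(text = ...)` (assuming knitr is installed) stands in for the Knit button so the result can be inspected as a string.

```r
library(knitr)

rmd_lines <- c(
  "```{r echo=FALSE}",
  "scores <- c(2, 4, 9)",          # made-up data
  "avg_score <- mean(scores)",
  "```",
  "",
  "The average score is `r avg_score`."
)

out <- knit(text = rmd_lines, quiet = TRUE)
cat(out)
```

If the data in `scores` change, re-knitting updates the sentence automatically.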

12.3 Rendering Data

R Markdown’s code chunks let you perform data analysis directly in your document, but often you will want to include more complex data output. This section discusses a few tips for specifying dynamic, complex output to render using R Markdown.

12.3.1 Rendering Strings

If you experiment with knitting R Markdown, you will quickly notice that using print() will generate a code block with content that looks like a printed vector:

```{r echo=FALSE}
print("Hello world")
```

## [1] "Hello world"

For this reason, you usually want to have the code block generate a string that you save in a variable, which you can then display with an inline expression (e.g., on its own line):

```{r echo=FALSE}
msg <- "Hello world"
```

Below is the message to see:
`r msg`

Note that any Markdown syntax included in the variable (e.g., if you had msg <- "**Hello** world") will be rendered as well: the `r msg` is replaced by the value of the expression just as if you had typed that Markdown in directly. This even allows you to include dynamic styling if you construct a "Markdown string" out of your data.
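For instance, a code chunk could choose the styling based on the data. In this sketch the temperature value and the 30-degree threshold are made up for illustration; the resulting `msg` could then be inserted into the text with `r msg`.

```r
temperature <- 31   # made-up data value

# Pick Markdown styling based on the data
label <- if (temperature > 30) "**hot**" else "_mild_"

# Build the "Markdown string" that an inline expression would inject
msg <- paste0("Today is ", label, " (", temperature, " degrees).")
cat(msg)
```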

Alternatively, you can set the results chunk option to 'asis', which will cause the "output" to be rendered directly into the Markdown. When combined with the cat() function (which writes out its arguments without extra information such as the [1] vector position or quotation marks), you can make a code chunk effectively render a specific string:

```{r results='asis', echo=FALSE}
cat("Hello world")
```
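The difference between the two functions is easy to see in the console. The sketch below uses `capture.output()` to record exactly what each one writes: `print()` adds the `[1]` index and quotation marks, while `cat()` writes only the characters of the string, which is what you want rendered as Markdown.

```r
# print() adds the vector index and quotation marks
printed <- capture.output(print("Hello world"))

# cat() writes only the characters themselves
catted <- capture.output(cat("Hello world"))

printed
catted
```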

12.3.2 Rendering Lists

Because outputted strings render any Markdown they contain, it’s possible to specify complex Markdown such as lists by constructing these strings to contain the - symbols utilized (note that each item will need to be separated by a line break or a \n character):

```{r echo=FALSE}
markdown.list <- "
- Lions
- Tigers
- Bears
"
```

`r markdown.list`

Would output a list that looks like:

Lions
Tigers
Bears

Combined with the vectorized paste() function, it's easy to convert vectors into Markdown lists that can be rendered:

```{r echo=FALSE}
animals <- c("Lions", "Tigers", "Bears")

# paste a `-` in front of each, then collapse the items into one string with newlines between
markdown.list <- paste(paste('-',animals), collapse='\n')
```

`r markdown.list`

And of course, the contents of the vector (e.g., the text "Lions") could easily include additional Markdown syntax for bold, italic, or hyperlinked text.

Creating a "helper function" to do this conversion is perfectly reasonable; alternatively, see packages such as pander, which define a number of such functions.
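A minimal sketch of such a helper follows; `as_markdown_list()` is a hypothetical name, not a function from any package. It applies the same `paste()` approach as above, and its items can themselves contain Markdown.

```r
# Hypothetical helper (not part of any package): convert a character vector
# into a Markdown bulleted list, one item per line
as_markdown_list <- function(items, bullet = "-") {
  # paste() is vectorized: it prefixes every item, then collapse joins them with newlines
  paste(paste(bullet, items), collapse = "\n")
}

animals <- c("Lions", "**Tigers**", "_Bears_")
markdown.list <- as_markdown_list(animals)
cat(markdown.list)
```

In an .Rmd file you would then place `r markdown.list` on its own line, with a blank line before it, so the result is rendered as a list.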

12.3.3 Rendering Tables

Because data frames are so central to programming with R, R Markdown includes capabilities to easily render data frames as Markdown tables via the knitr::kable() function. This function takes as an argument the data frame you wish to render, and it will automatically convert that value into a Markdown table:

```{r echo=FALSE}
library(knitr)  # make sure you load this library (once per doc)

# make a data frame
letters <- c("a", "b", "c")
numbers <- 1:3
df <- data.frame(letters = letters, numbers = numbers)

# render the table
kable(df)
```

kable() supports a number of other arguments that can be used to customize how it outputs a table.

And of course, if the values in the data frame are strings that contain Markdown syntax (e.g., bold, italic, or hyperlinks), they will be rendered as such in the table!
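As an illustration, the sketch below uses a few of kable()'s arguments (`format`, `col.names`, `digits`, and `align`) and puts Markdown syntax in one cell. It assumes knitr is installed, and the data frame is made up; consult the kable() documentation for the full list of arguments.

```r
library(knitr)

# Made-up data; the first cell contains Markdown bold syntax
df <- data.frame(
  letter = c("**a**", "b", "c"),
  value = c(1.234, 2.5, 3)
)

tbl <- kable(
  df,
  format = "markdown",               # produce a Markdown (pipe) table
  col.names = c("Letter", "Value"),  # header labels to display
  digits = 1,                        # round numeric columns to 1 decimal place
  align = c("l", "r")                # left-align the first column, right-align the second
)
tbl
```

The `**a**` cell is passed through unchanged, so it appears in bold once the Markdown is rendered.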

So while you may need to do a little bit of work to generate the Markdown syntax yourself, it is possible to produce complex documents directly from dynamic data sources.

Technical Foundations of Informatics
