D Using Remote Server
Sooner-or-later you are in a situation where you have to work on a distant networked computer. There are many reasons for this, either your laptop is to weak for certain tasks, or certain data is not allowed to be taken out from where it is, or you are expected to use the same computer as your teammates. The server may be a standalone box located in a rack in your employer’s server room, or it may be a virtual machine in a cloud like Amazon EC2. You may also want to set up your own server.
D.1 Server Setup
There are many ways one can set up a distant machine. It may be Windows or linux (or any of the other unixes). It may or may not have graphical user interface (GUI) installed or otherwise accessible (many unix programs can display nice windows on your laptop while still running on the server). It may or may not have RStudio made available over web browser. Here we discuss the most barebone setup with no access to GUI and no web access to RStudio.
This is a fairly common setup, for instance when dealing with sensitive data, in organizations where computer skills and sysadmin’s time is limited, or when you rent your own tiny but cheap server.
D.2 Connecting to the Remote Server
Given the server is already running, your first task is to connect to it. Here it means that you will enter commands on your laptop, but those command are actually run on the server.
The most common way to connect to remote server is via ssh. ssh stands for “secure shell” and means that both all network communication is encrypted. You connect to the server as
The remote server asks for your password and opens remote shell connection. Note: when entering your password, it usually does not print anything in response, not even asterisks. It feels as if your keyboard is not working. It will offer you a similar bash shell environment as you are using on your computer but most likely you see a different prompt, one that contains the server’s name. You may also see some login messages. Now all the commands you are issuing are running on the remote machine. So
pwd shows your working directory on the server, which in general is not the same as on the local machine, and
ls shows the files on the server, not on your laptop. Now you can use
mkdir to create the project folder on the server.
By default, ssh attempts to login with your local username. If your username on the server differs from that on your laptop, you want to add it to the ssh command:
Finally, when done, you want to get out. The polite way to close the connection is with command
that waits until all open connections are safely closed. But usually you can as well just close the terminal.
D.3 Copying Files
Before you can run your R scripts on the server, you have to get these copied over. There are several possibilities.
The most straightforward approach is
scp, secure copy. It works in many ways in the same way as
cp for the local files, just
scp can copy files between your machine and a remote computer. Under the hood it uses ssh connection, just like
ssh command itself. It syntax is rather similar to that of
scp user1@host1:file1 user2@host2:file2
This copies “file1” from the server “host1” under username “user1” to the other server. Passwords are asked for as needed. The “host” part of the file must be understood as the full hostname including dots, such as “hyak.washington.edu”. “file” is the full path to file, relative to home directory, such as
Desktop/info201/myscript.R. When accessing local files, you may omit the “user@host:” part. So, for instance, in order to copy your
myscript.R from folder
info201 on your laptop’s Desktop to the folder
scripts in your home folder on the server, you may issue
scp Desktop/info201/myscript.R firstname.lastname@example.org:scripts/
(here we assume that the working directory of your laptop is the one above
Desktop.) Note that exactly as with
cp, you may omit the destination file name if the destination is a directory: it simply copies the file into that directory while preserving its name.
After running your script, you may want to copy your results back to your laptop. For instance, if you need to get the file
figure.png out of the server, you can do
scp email@example.com:scripts/figure.png Desktop/info201/
As above, this copies a file from the given directory, and drops it into the
info201 folder on your Desktop.
rsync is a more advanced approach to
scp. It works in many ways like
scp, just it is smart enough to understand which files are updated, and copy the updated parts of the files only. It is the recommended way for working with small updates in large files. Its syntax is rather similar to that of
scp. To copy
file to the remote server as
file2 (in the home directory), we do
rsync file user2@host2:file2
and in order to copy a
file1 from server as local
file (in the current working directory):
rsync file user1@host1:file1 file
I also recommend to explore some of its many options, for instance
-v (verbose) reports what it’s doing. The example above with your code and figure might now look like that:
rsync -v Desktop/info201/myscript.R firstname.lastname@example.org:scripts/ # now run the script on the remote machine rsync -v email@example.com:scripts/figure.pdf Desktop/info201/
Maybe the easiest way to copy your files is to copy (or rather update) the whole directories. For instance, instead of the code above, you can do
# copy all files to server: rsync -v Desktop/info201/* firstname.lastname@example.org:scripts/ # now run the script on the remote machine # ... and copy the results back: rsync -v email@example.com:scripts/* Desktop/info201/
* means all files in this directory. Hence, instead of copying the files individually between the computers, we just copy all of them. Even better, we actually do not copy but just update. Huge files that do not change do not take any bandwidth.
D.3.3 Graphical Frontends
Instead on relying on command line tools, one can also use graphical front-ends. For instance, “WinSCP” is a nice Norton Commander-Style frontend for copying files between the local and a remote machine over scp for Windows. It provides a split window representing files on the local and the remote end, and one can move, copy-and-paste and interact with the mouse on these panes. On Mac you may take a look at “Cyberduck”.
D.3.4 Remote Editing
Besides copying your files, many text editors also offer a “remote editing” option. From the user perspective this looks as if directly working on the remote server’s hard disk. Under the hood, the files are copied back and forth with scp, rsync or one of their friends. Emacs and vi do it out-of-the box, VSCode, Atom and sublime require a plugin. AFAIK it is not possible with RStudio.
It is also possible to mount (attach) the harddisk of the remote server to your laptop as if it were a local disk. Look yourself for more information if you are interested.
D.4 R and Rscript
When your code has been transferred to the server, your next task is to run it. But before you can do it, you may want to install the packages you need. For instance, you may want to install the whole “tidyverse” bundle. This must be done from R console using
install.packages(). You start R interactively by the command
It opens an R session, not unlike what you see inside of RStudio, just here you have no RStudio to handrail you through the session. Now all loading, saving, inspecting files, etc must be done through R commands.
The first time you do it, R complains about non-writeable system-wide library and proposes to install and create your personal libary. You should answer “yes” to these prompts. As Linux systems typically compile the packages during installations, installation is slow and you see many messages (including warnings) in the process. But it works, given that the necessary system libraries are available.
Now you can finally run your R code. I strongly recommend to do it from the directory where you intend to run the project before starting R (
cd scripts if you follow the example directory setup above). There are two options: either start R interactively, or run it as a script. If you do it from an interactive R session, you have to source your script:
The script will run, and the first attempt most likely ends with an error message. You have to correct the error either on your laptop and copy the file over to the server again, or directly on the server, and re-run it again. Note that you don’t have to exit from the R session when copying the files between your laptop and the server. Edit it, copy it over from your laptop (using
scp or other tools), and just re-source the file from within the R session.
Opening a separate R session may be useful for installing packages. For running your scripts, I recommend you to run it entirely from command line, either as
R CMD BATCH myscript.R
The first version produces a little more informative error messages, the other one handles the environment in a little more consistent and efficient manner.
D.4.1 Graphics Output with No GUI
If the server does not have any graphics capabilities, you have to save your figures as files. For instance, to save the image in a pdf file, you may use the following code in your R program:
pdf(file="figure1.pdf", width=12, height=8) # width and height in inches # check also out jpeg() and png() devices. # do your plotting here plot(1:10, rnorm(10)) # done plotting dev.off() # saves the image to disk and closes the file.
Afterwards you will have to copy the image file to your laptop for inspection, or addition in your final project report.
D.5 Life on Server
The servers operate the same in many ways as the command line on your own computer. However, there are a number of differences.
D.5.2 Useful Things to Do
There are several useful commands you can experiment with while on the server.
q to quit) tells you which programs run on the server, how much memory and cpu do these take, and who are their owners (the corresponding users). It also permits you to kill your misbehaving processes (press
k and select
SIGKILL). Read more with
(who) prints the current logged-in users of the server.
(display free in human-readable units) shows the free and occupied disk space. You are mainly influenced by what is going on in the file system
D.5.3 Permissions and ownership
Unix systems are very strict about ownership and permissions. You are a normal user with limited privileges. In particular, you cannot modify or delete files that you don’t own. In a similar fashion, you cannot kill processes you did not start. Feel free to attempt. It won’t work.
In case you need to do something with elevated privileges (as “superuser”), you have to contact the system administrator. In practice, their responsiveness and willingness to accommodate your requests will vary.
D.5.4 More than One Connection
It perfectly possible to log onto the server through multiple terminals at the same time. You just open several terminals and log onto the server from each of these. You can use one terminal to observe how your script is doing (with
htop), the other one to run the script, and the third one to inspect output. If you find such approach useful, I recommend you to familiarize yourself with gnu screen (command
screen that includes many related goodies.)
D.6 Advanced Usage
D.6.1 ssh keys, .ssh/config
Without further configuration, every time you open a ssh connection, you have to type your password. Instead of re-entering it over and over again (this may not be particularly secure), you can configure your ssh keys and copy it to the server. Next time, you will be automatically authenticated with the key and you don’t have to type the password any more. Note: this is the same ssh key that is used by GitHub if you use ssh connection to GitHub.
As the first step, you have to create the key with
ssh-keygen (you may choose an empty passphrase) unless you already have created one. Thereafter copy your public key to the server with
ssh-copy-id. Next time you log onto the server, no password is needed. A good source for help with creating and managing ssh keys is GitHub help.
You can also configure your ssh to recognize abbreviated server names and your corresponding user names. This allows you to connect to server with a simple command like
ssh info201. This information is stored in the file
~/.ssh/config, and should contain lines like
Host info201 User <your username> Hostname info201.ischool.uw.edu
Host keyword is followed by the abbreviated name of the server, the following lines contain your username and the publicly visible hostname for the server. Seek out more information if you are interested.
D.6.2 Running RScript in ssh Session
Passwordless ssh connection gives you new wonderful possibilities. First, you don’t even have to log into the server explicitly. You can run a one-command ssh session on the server directly from your laptop. Namely, ssh accepts commands to be run on the remote machine. If invoked as
ssh firstname.lastname@example.org "Rscript myscript.R"
It does not open a remote shell but runs
Rscript script.R instead. Your command sequence for the whole process will accordingly look something like:
rsync -v Desktop/info201/* email@example.com:scripts/ ssh firstname.lastname@example.org "Rscript scripts/myscript.R" rsync -v email@example.com:scripts/* Desktop/info201/
All these command are issued on your laptop. You can also save these to a text file and run all three together as a single shell script!
Further, you don’t even need the shell. Instead, you may explain R on your laptop how to start R on the remote server over ssh. In this way you can turn your laptop and server combination into a high-performance-computing cluster! This allows you to copy the script and run it on the server directly from within your R program that runs on your laptop. Cluster computing is out of scope of this chapter, but if you are interested, look up the makePSOCKcluster() function in parallel package.