A Using a Remote Server

Sooner or later you will find yourself in a situation where you have to work on a distant networked computer. There are many reasons for this: your laptop may be too weak for certain tasks, certain data may not be allowed to leave the place where it is stored, or you may be expected to use the same computer as your teammates. The server may be a standalone box located in a rack in your employer’s server room, or it may be a virtual machine in a cloud like Amazon EC2. You may also want to set up your own server.

A.1 Server Setup

There are many ways to set up a distant machine. It may run Windows or Linux (or any of the other Unixes). It may or may not have a graphical user interface (GUI) installed or otherwise accessible (many Unix programs can display nice windows on your laptop while still running on the server). It may or may not have RStudio available through a web browser. Here we discuss the most barebones setup, with no access to a GUI and no web access to RStudio.

This is a fairly common setup, for instance when dealing with sensitive data, in organizations where computer skills and sysadmins’ time are limited, or when you rent your own tiny but cheap server.

A.2 Connecting to the Remote Server

Given that the server is already running, your first task is to connect to it. Here this means that you enter commands on your laptop, but those commands are actually run on the server.

The most common way to connect to a remote server is via ssh. ssh stands for “secure shell”, and it means that all network communication is encrypted. You connect to the server as

ssh myserver.somewhere.com

The remote server asks for your password and opens a remote shell connection. Note: when you enter your password, the terminal usually does not print anything in response, not even asterisks. It feels as if your keyboard is not working. After login, the server offers you a bash shell environment similar to the one on your computer, but most likely you see a different prompt, one that contains the server’s name. You may also see some login messages. Now all the commands you issue run on the remote machine. So pwd shows your working directory on the server, which in general is not the same as on the local machine, and ls shows the files on the server, not on your laptop. Now you can use mkdir to create the project folder on the server.

By default, ssh attempts to log in with your local username. If your username on the server differs from that on your laptop, add it to the ssh command:

ssh username@myserver.somewhere.com

Finally, when done, you want to get out. The polite way to close the connection is with the command

exit

which waits until all open connections are safely closed. But usually you can just as well close the terminal.

A.3 Copying Files

Before you can run your R scripts on the server, you have to get these copied over. There are several possibilities.

A.3.1 scp

The most straightforward approach is scp, secure copy. It works much like cp does for local files, except that scp can copy files between your machine and a remote computer. Under the hood it uses an ssh connection, just like the ssh command itself. Its syntax is rather similar to that of cp:

scp user1@host1:file1 user2@host2:file2

This copies “file1” from the server “host1” under username “user1” to “file2” on the server “host2” under username “user2”. Passwords are asked for as needed. The “host” part must be understood as the full hostname including dots, such as “hyak.washington.edu”. The “file” part is the path to the file, relative to the home directory, such as Desktop/info201/myscript.R. When accessing local files, you may omit the “user@host:” part. So, for instance, in order to copy your myscript.R from the folder info201 on your laptop’s Desktop to the folder scripts in your home folder on the server, you may issue

scp Desktop/info201/myscript.R myusername@server.ischool.edu:scripts/

(Here we assume that the working directory on your laptop is the one above Desktop.) Note that, exactly as with cp, you may omit the destination file name if the destination is a directory: scp simply copies the file into that directory while preserving its name.

After running your script, you may want to copy your results back to your laptop. For instance, if you need to get the file figure.png from the server, you can do

scp myusername@server.ischool.edu:scripts/figure.png Desktop/info201/

As above, this copies a file from the given directory, and drops it into the info201 folder on your Desktop.

A.3.2 rsync

rsync is a more advanced alternative to scp. It works in many ways like scp, but it is smart enough to detect which files have been updated, and to copy only the updated parts of those files. It is the recommended tool for working with small updates in large files. Its syntax is rather similar to that of scp. To copy a local file to the remote server as file2 (in the home directory), we do

rsync file user2@host2:file2

and in order to copy file1 from the server as a local file (in the current working directory):

rsync user1@host1:file1 file

I also recommend exploring some of its many options; for instance, -v (verbose) makes rsync report what it is doing. The example above with your code and figure might now look like this:

rsync -v Desktop/info201/myscript.R myusername@server.ischool.edu:scripts/
# now run the script on the remote machine
rsync -v myusername@server.ischool.edu:scripts/figure.png Desktop/info201/

Maybe the easiest way to copy your files is to copy (or rather update) whole directories. For instance, instead of the code above, you can do

# copy all files to server:
rsync -v Desktop/info201/* myusername@server.ischool.edu:scripts/
# now run the script on the remote machine
# ... and copy the results back:
rsync -v myusername@server.ischool.edu:scripts/* Desktop/info201/

Here * means all files in this directory. Hence, instead of copying the files individually between the computers, we just copy all of them. Even better, we actually do not copy but just update: huge files that have not changed take hardly any bandwidth.
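One caveat with the * pattern is that plain rsync skips subdirectories. The following is a minimal sketch of an alternative, assuming a reasonably standard rsync. The -a (archive) option copies directories recursively and preserves modification times, a trailing slash on the source means “the contents of this directory”, and -n (dry run) only reports what would be transferred. The sketch uses two scratch directories on the same machine so that you can try it without a server; the directory and file names are made up for illustration. For a real transfer you would replace the destination with something like myusername@server.ischool.edu:scripts/.

```shell
# Scratch directories standing in for the laptop folder and the server folder.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir "$src/data"
echo "first version" > "$src/myscript.R"
echo "1,2,3" > "$src/data/numbers.csv"

# Initial transfer: -a recurses into data/ and preserves modification times.
rsync -a "$src/" "$dst/"

# Change one file only, then ask rsync what it would transfer (-n: dry run).
echo "second version, now longer" > "$src/myscript.R"
rsync -avn "$src/" "$dst/"

# The real update: the unchanged numbers.csv is not sent again.
update_log=$(rsync -av "$src/" "$dst/")
echo "$update_log"
```

The dry run is a cheap way to check that source and destination are the right way round before anything is overwritten.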

A.3.3 Graphical Frontends

Instead of relying on command line tools, one can also use graphical front-ends. For instance, “WinSCP” is a nice Norton Commander-style front-end for Windows that copies files between the local and a remote machine over scp. It provides a split window representing files on the local and the remote end, and one can move, copy-and-paste and interact with the mouse in these panes. On Mac you may take a look at “Cyberduck”.

A.3.4 Remote Editing

Besides copying your files, many text editors also offer a “remote editing” option. From the user’s perspective this looks as if you were working directly on the remote server’s hard disk. Under the hood, the files are copied back and forth with scp, rsync or one of their friends. Emacs and vi do it out of the box; VSCode, Atom and Sublime Text require a plugin. As far as I know, it is not possible with RStudio.

It is also possible to mount (attach) the hard disk of the remote server to your laptop as if it were a local disk. Search for more information yourself if you are interested.

A.4 R and Rscript

When your code has been transferred to the server, your next task is to run it. But before you can do that, you may want to install the packages you need. For instance, you may want to install the whole “tidyverse” bundle. This must be done from the R console using install.packages(). You start R interactively with the command

R

It opens an R session, not unlike what you see inside RStudio, except that here you have no RStudio to guide you through the session. Now all loading, saving, inspecting of files, etc. must be done through R commands.

The first time you do it, R complains about a non-writeable system-wide library and proposes to create a personal library and install the packages there. You should answer “yes” to these prompts. As Linux systems typically compile the packages during installation, the installation is slow and you see many messages (including warnings) in the process. But it works, given that the necessary system libraries are available.

Now you can finally run your R code. I strongly recommend changing to the directory where you intend to run the project before starting R (cd scripts if you follow the example directory setup above). There are two options: either start R interactively, or run your code as a script. If you do it from an interactive R session, you have to source your script:

source("myscript.R")

The script will run, and the first attempt most likely ends with an error message. You have to correct the error either on your laptop, and copy the file over to the server again, or directly on the server, and then re-run the script. Note that you don’t have to exit the R session while copying files between your laptop and the server. Edit the script, copy it over from your laptop (using scp or other tools), and just re-source the file from within the R session.

Opening a separate R session may be useful for installing packages. For running your scripts, I recommend doing it entirely from the command line, either as

R CMD BATCH myscript.R

or

Rscript myscript.R

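The first version produces slightly more informative error messages; note that it writes the output, including the error messages, to the file myscript.Rout instead of the terminal. The second one handles the environment in a slightly more consistent and efficient manner.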
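When a script runs unattended, it helps to keep its messages in a file. Here is a minimal sketch, assuming a bash-like shell on the server; demo.R and demo.log are made-up names, and the tiny script is created here only so that the example is complete. The redirection > sends the normal output to the log, and 2>&1 sends the error messages to the same place.

```shell
# A tiny stand-in for your real script (hypothetical content).
printf 'x <- c(1, 2, 3)\nprint(mean(x))\n' > demo.R

# Run it non-interactively; output and error messages both go to demo.log.
# Rscript returns a non-zero exit status if the script stops with an error.
if Rscript demo.R > demo.log 2>&1; then
    echo "demo.R finished without errors"
else
    echo "demo.R failed; see demo.log for the messages"
fi

# Inspect what the script printed.
cat demo.log
```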

A.4.1 Graphics Output with No GUI

If the server does not have any graphics capabilities, you have to save your figures as files. For instance, to save the image in a pdf file, you may use the following code in your R program:

# open a pdf device; width and height are in inches
# (also check out the jpeg() and png() devices)
pdf(file="figure1.pdf", width=12, height=8)
# do your plotting here
plot(1:10, rnorm(10))
# done plotting: save the image to disk and close the file
dev.off()

Afterwards you will have to copy the image file to your laptop for inspection, or for inclusion in your final project report.

A.5 Life on Server

Servers operate in many ways like the command line on your own computer. However, there are a number of differences.

A.5.1 Be Social!

While your laptop is yours, and you are free to exploit all its resources for your own good, this is not true of the server. The server is a multiuser system, potentially doing good work for many people at the same time. So the first rule is: Don’t take more resources than what you need!

This means: don’t let the system run, grab memory, or occupy disk space just for fun. Try to keep your R workspace clean (check out the rm() function) and close R as soon as it has finished (this happens automatically if you run your script through Rscript from the command line). Don’t copy the dataset without a good reason, and keep your copies in a compressed form. R can open gzip and bzip2 files on the fly, so usually you don’t even need to decompress these. Avoid costly recalculations of something you already calculated. All this is even more important in the last days before a deadline, when many people are using the server.
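As an illustration of keeping data compressed, here is a minimal sketch with a made-up file demo.csv. It assumes the standard gzip tool is installed. gzip -c writes the compressed data to standard output and leaves the original in place, and gzip -dc decompresses to standard output without creating a file.

```shell
# Create a small, highly repetitive example file (hypothetical data).
{
    echo "id,value"
    i=1
    while [ "$i" -le 1000 ]; do
        echo "$i,0.5"
        i=$((i + 1))
    done
} > demo.csv

# Compress into demo.csv.gz, keeping the original for a size comparison.
gzip -c demo.csv > demo.csv.gz
ls -l demo.csv demo.csv.gz

# Peek at the compressed file without unpacking it to disk.
gzip -dc demo.csv.gz | head -n 3

# In R, read.csv("demo.csv.gz") reads the compressed file directly,
# so the uncompressed copy can be removed.
rm demo.csv
```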

Servers are typically well configured to tame misbehaving programs. You may sometimes see your script stopping with a message “killed”. This most likely means that it occupied too much memory, and the system just killed it. In that case you have to find a way to make your script use less memory.

A.5.2 Useful Things to Do

There are several useful commands you can experiment with while on the server.

htop

(press q to quit) tells you which programs run on the server, how much memory and CPU they take, and who their owners are (the corresponding users). It also permits you to kill your misbehaving processes (press k and select SIGKILL). Read more with man htop.

w

(who) prints the users currently logged in to the server.

df -h

(disk free, in human-readable units) shows the free and occupied disk space. You are mainly affected by what is going on in the file system /home.
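To see how much of that space is yours, du (disk usage) complements df. This is a small sketch, assuming the usual du and df found on Linux servers:

```shell
# Free and used space on the file system that holds your home directory.
df -h "$HOME"

# Total size of everything under your home directory (may take a while).
du -sh "$HOME"

# Sizes of the items in the current directory in kilobytes, smallest first.
du -sk ./* | sort -n
```

The last command is handy for finding which project folder or data file takes the most space.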

A.5.3 Permissions and Ownership

Unix systems are very strict about ownership and permissions. You are a normal user with limited privileges. In particular, you cannot modify or delete files that you don’t own. In a similar fashion, you cannot kill processes you did not start. Feel free to try. It won’t work.
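You can inspect ownership and permissions yourself. Below is a minimal sketch with a made-up file name: id shows who you are and which groups you belong to, ls -l shows the owner and the permission bits, and chmod changes the permissions of files you own. Restricting a file to yourself in this way is an illustration, not a rule of any particular server.

```shell
# Who am I, and which groups do I belong to?
id

# Create an example file and look at its owner and permissions.
echo "secret notes" > private-notes.txt
ls -l private-notes.txt

# Let only the owner read and write the file (rw for user, nothing for others).
chmod 600 private-notes.txt
ls -l private-notes.txt
```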

In case you need to do something with elevated privileges (as “superuser”), you have to contact the system administrator. In practice, their responsiveness and willingness to accommodate your requests will vary.

A.5.4 More than One Connection

It is perfectly possible to log onto the server through multiple terminals at the same time. You just open several terminals and log onto the server from each of these. You can use one terminal to observe how your script is doing (with htop), another one to run the script, and a third one to inspect the output. If you find such an approach useful, I recommend familiarizing yourself with GNU screen (command screen), which includes many related goodies.

A.6 Advanced Usage

A.6.1 ssh keys, .ssh/config

Without further configuration, you have to enter your password every time you open an ssh connection. Instead of re-entering it over and over again (this may not even be secure), you can configure ssh to authenticate with your public key. (Note: this is the same key that permits passwordless connection to GitHub if you use ssh there.) Unless you already have a key, first create one with ssh-keygen (you may choose an empty passphrase). Then copy your public key to the server with ssh-copy-id. The next time you log onto the server, you are authenticated with the key and no password is needed. Search for more information yourself if you are interested.

Finally, you can also configure ssh to recognize abbreviated server names and your corresponding user names. Check out how to create the ~/.ssh/config file.
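Here is a minimal sketch of such a configuration, written from the shell on your laptop. The alias ischool is made up for illustration; Host, HostName and User are standard keywords of the ssh configuration file. The commands append to ~/.ssh/config, so run them only once.

```shell
# Make sure the ssh configuration directory exists and is private.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Append a host alias (the alias name "ischool" is an assumption).
cat >> "$HOME/.ssh/config" <<'EOF'
Host ischool
    HostName server.ischool.edu
    User myusername
EOF
chmod 600 "$HOME/.ssh/config"

# From now on the short form is enough, for example:
#   ssh ischool
#   rsync -v Desktop/info201/* ischool:scripts/
```

Tools that run on top of ssh, such as scp and rsync, understand the same alias.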

A.6.2 Running Rscript in an ssh Session

A passwordless ssh connection gives you wonderful new possibilities. First, you don’t even have to log into the server explicitly. You can run a one-command ssh session on your server directly from your laptop. Namely, ssh accepts commands to be run on the remote machine. If invoked by something like

ssh myusername@server.ischool.edu "Rscript myscript.R"

it does not open a remote shell but runs Rscript myscript.R instead. Your command sequence for the whole process will accordingly look something like:

rsync -v Desktop/info201/* myusername@server.ischool.edu:scripts/
ssh myusername@server.ischool.edu "cd scripts && Rscript myscript.R"
rsync -v myusername@server.ischool.edu:scripts/* Desktop/info201/

All these commands are issued on your laptop. You can also save them to a text file and run all three together as a single shell script!
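Here is a minimal sketch of such a script, assuming passwordless login is set up and the directory layout from the examples above. The file name run_remote.sh, the function name run_on_server and the variable names are made up for illustration. The three steps are chained with &&, so the results are not fetched if the upload or the R run fails, and the script changes into the scripts directory on the server so that files written with relative paths end up there.

```shell
# run_remote.sh: upload the code, run it on the server, fetch the results.
remote="myusername@server.ischool.edu"   # user and server from the examples
local_dir="Desktop/info201"              # project folder on the laptop
remote_dir="scripts"                     # project folder on the server

run_on_server() {
    # 1. Copy (update) the local files to the server,
    # 2. run the script there, from inside the project folder,
    # 3. and copy (update) the results back to the laptop.
    rsync -v "$local_dir"/* "$remote:$remote_dir/" &&
    ssh "$remote" "cd $remote_dir && Rscript myscript.R" &&
    rsync -v "$remote:$remote_dir/*" "$local_dir/"
}

# Run the three steps only if the project folder can be found from here.
if [ -d "$local_dir" ]; then
    run_on_server
else
    echo "Cannot find $local_dir; start this script from the folder above Desktop." >&2
fi
```

You would start it from the folder above Desktop with sh run_remote.sh.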

Further, you can also avoid the shell. Instead, you may tell R on your laptop how to start R on the remote server over ssh. In this way you can turn your laptop and server combination into a high-performance-computing cluster! This allows you to copy the script and run it on the server directly from within your R program that runs on your laptop. Cluster computing is out of the scope of this book, but if you are interested, look up the makePSOCKcluster() function in the parallel package.