The Kaessmann Lab Computational Infrastructure

The main computing servers are rudi, piggeldy and frederick. The kaessmannserver (OldKaessmannServer) is mostly used for tasks that require GPU resources and for hosting the various apps and websites of the group.

Generally, none of our servers are intended as a replacement for the Helix Cluster. Ideal workloads for our servers are interactive tasks, or tasks that cannot be run on the Cluster for some reason (time limits, missing software, …).

Accessing the computational servers via ssh

Only the kaessmannserver is accessible from outside the university network. To access rudi from outside the university campus, please use the university VPN, or log in through the kaessmannserver. Piggeldy and frederick are only accessible from rudi.

rudi: Accessible from the ZMBH network at rudi.bioquant.uni-heidelberg.de
```
# from a local terminal
ssh username@rudi.bioquant.uni-heidelberg.de
```

piggeldy & frederick: Accessible through rudi

```
# from a terminal on rudi
ssh piggeldy
ssh frederick
```

kaessmannserver: Accessible from everywhere at ssh.kaessmannlab.org

```
# from a local terminal
ssh -p 49200 username@ssh.kaessmannlab.org
```

Important URLs

https://elab.kaessmannlab.org The electronic lab book
https://hkdb.kaessmannlab.org The tissue and sequencing database
https://wiki.kaessmannlab.org The lab wiki

Rudi, Piggeldy & Frederick

User accounts

This section describes the process of registering a new user account on rudi, piggeldy and frederick.

If you don't already have an SSH key pair, create one using the following command. ssh-keygen will ask for a passphrase: this is not a password for rudi, but a passphrase that protects the key pair in case a third party gains access to the key file.
```
ssh-keygen -t rsa -b 4096
# The resulting public key can be found by default at:
# ~/.ssh/id_rsa.pub
```
Send the resulting public key (the .pub file; never share the private key) to an administrator, who will run the user setup script and install the public key for the newly created user account.

Now, you can connect to rudi using ssh:

```
ssh username@rudi.bioquant.uni-heidelberg.de
```

For convenience, it is encouraged to add rudi to the ssh config of your computer. To do this, open or create the file ~/.ssh/config and enter the following, replacing username with your username on rudi:
```
Host rudi
    HostName rudi.bioquant.uni-heidelberg.de
    User username
```
Now you can connect to rudi with this command:
```
ssh rudi
```
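Since piggeldy and frederick are only reachable through rudi, the same ssh config can be extended so that a single command on your own computer hops through rudi automatically. This is a sketch, assuming OpenSSH 7.3 or newer (which introduced ProxyJump) and that your public key is also authorized on piggeldy and frederick; replace username as above.

```
Host rudi
    HostName rudi.bioquant.uni-heidelberg.de
    User username

# piggeldy and frederick are reached by jumping through rudi
Host piggeldy frederick
    User username
    ProxyJump rudi
```

With this in place, ssh piggeldy or ssh frederick run on your own computer connect via rudi; the host names piggeldy and frederick are then resolved on rudi, not on your computer.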
To log in from additional devices, create an SSH key pair on each device. You can then either send the new public keys to an admin, or install them on rudi yourself by appending their contents to the ~/.ssh/authorized_keys file on rudi.
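Appending a key by hand works, but sshd is strict about file permissions and it is easy to add the same key twice. The bash function below is a hypothetical helper, not one of the lab's installed scripts: it appends the first line of a public key file to ~/.ssh/authorized_keys unless that exact line is already present, and sets the usual permissions (700 for ~/.ssh, 600 for the file).

```bash
# Hypothetical helper (not an installed lab script).
# Usage on rudi: add_authorized_key ~/new_laptop.pub [authorized_keys file]
add_authorized_key() {
    local pubkey_file="$1"
    local auth_file="${2:-$HOME/.ssh/authorized_keys}"
    local ssh_dir key
    ssh_dir="$(dirname "$auth_file")"

    # sshd (with its default StrictModes) rejects keys if these are too open
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"

    key="$(head -n 1 "$pubkey_file")"
    # -x: whole-line match, -F: fixed string (keys contain + and /)
    if grep -qxF -- "$key" "$auth_file"; then
        echo "key already installed"
    else
        printf '%s\n' "$key" >> "$auth_file"
        echo "key added"
    fi
}
```

For example, after transferring the new device's public key to rudi (e.g. from a device that already has access), run add_authorized_key ~/new_laptop.pub on rudi; running it a second time leaves the file unchanged.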

Data storage and software installation

The home directory

Every user has a home directory (/home/username). This folder should only be used for storing configuration files (e.g. .bashrc), not for any downloads, software installations or data. Please use the work directory for these.

The work directory

Every user has a work directory (/work/username). This folder should be used for all software installation, downloads, and data storage. The work directory is accessible from rudi, piggeldy and frederick. Everything locally installed there can be used from all three machines.

To quickly navigate to /work/username you can use the global alias work, which is equivalent to cd /work/username.

Mounting network storage

SDS

The SDS@HD storage can be mounted using the sds script and your Uni ID (e.g. xy123):

```
sds xy123
```

This will create a directory called sds in your work directory. To unmount the sds, use sds_unmount.

Input/output error, failed to unmount: troubleshooting

In some cases, especially when there was an issue with the SDS@HD service or the network went offline for a while, the sds mount can end up in a state where it is unusable but also cannot be unmounted with the sds_unmount script. Inspecting the folder with ls -la then usually looks like this:

```
d?????????  ? ?      ?              ?            ? sds
```

and the sds_unmount script fails with the following message:

```
fusermount: failed to unmount /work/ntrost/sds: Device or resource busy
```

To solve this issue, run the following command in your work directory (-u unmounts, -z performs a lazy unmount that detaches the mount immediately even though it is busy):

```
fusermount -zu sds
```

After this, you can run the sds script again to remount the SDS@HD storage.

kaessmannserver

The home directory of the kaessmannserver can be mounted using the kss script and your username:

```
kss username
```

This will create a directory called kss in your work directory. To unmount the kaessmannserver, use kss_unmount.

Backing up your data

There is no central scheduled backup for user home or work directories. To back up your important files, you can use rsync to copy them to the SDS (see above for how to mount the SDS in your work directory). To schedule regular backups, you can use cron.

The following example backs up the directory source in your work directory to the directory dest on the SDS once a day at 22:00. Please adapt username, source and dest accordingly. The SDS has to be mounted at that time, otherwise rsync cannot reach the destination.

```
# Open the cron configuration file using the command 'crontab -e'.
# This opens the configuration file in your default
# command line editor (e.g. vim).

# Insert the following line at the end of the cron configuration file,
# replacing usernames and paths accordingly. Cron entries cannot be
# split across lines, so keep the whole command on one line:
00 22 * * * rsync -hilrtuv --log-file=/work/username/backup.log /work/username/source /work/username/sds/sd17d003/username/dest
# Upon saving and closing the file, the new cron job will be installed.
```
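If the SDS mount has dropped (see the troubleshooting section above), the nightly rsync call fails and this is easy to miss. A small wrapper can check the mount first and write a clear message to the log instead. This is a hypothetical sketch, not an existing lab script; it assumes Linux's /proc/mounts and GNU realpath, and the file name sds_backup.sh is made up.

```bash
# Hypothetical wrapper (not an installed lab script).

# Return 0 if $1 is listed as a mount point in /proc/mounts (Linux only).
is_mounted() {
    local target
    target="$(realpath -m "$1")"
    awk -v m="$target" '$2 == m { found = 1 } END { exit !found }' /proc/mounts
}

# sds_backup SOURCE DEST SDS_MOUNTPOINT [LOGFILE]
# Runs the rsync backup only if the SDS is mounted; otherwise logs and fails.
sds_backup() {
    local src="$1" dest="$2" mnt="$3" log="${4:-/dev/null}"
    if ! is_mounted "$mnt"; then
        echo "$(date '+%F %T') SDS not mounted at $mnt, skipping backup" >> "$log"
        return 1
    fi
    rsync -hilrtuv --log-file="$log" "$src" "$dest"
}

# Example crontab line (one single line), assuming the functions above are
# saved in the hypothetical file /work/username/sds_backup.sh:
# 00 22 * * * bash -c 'source /work/username/sds_backup.sh && sds_backup /work/username/source /work/username/sds/sd17d003/username/dest /work/username/sds /work/username/backup.log'
```

The check reads /proc/mounts rather than testing whether the directory exists, because a stale mount can still show up as a (broken) directory entry.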

Running analyses

Deciding where to run your analyses

Rudi should be used for interactive tasks where CPU speed is relevant. For longer running tasks, you should consider using piggeldy and frederick.

Running CPU intensive tasks

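For CPU intensive tasks you should make use of the “nice” system. Prefix commands with nice and a niceness value greater than 0 (e.g. 5). Higher values mean lower priority, so interactive work of other users stays responsive:

```
nice -n 5 COMMAND
```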
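If a CPU intensive job is already running without nice, you do not need to restart it: renice changes the priority of a running process. The sketch below demonstrates this on a throwaway sleep process that stands in for a real analysis; for an actual job, use its PID as shown e.g. by top. It assumes the job started at the default niceness of 0.

```bash
# Stand-in for a long-running analysis that was started without nice
sleep 300 &
job_pid=$!

# Raise the job's niceness from the default 0 to 10 (lower priority).
# Regular users can normally only raise the niceness of their own
# processes, not lower it again.
renice -n 10 -p "$job_pid"

# The NI column shows the niceness the process now runs with
ps -o pid,ni,comm -p "$job_pid"

# Clean up the stand-in job (do not do this with your real analysis!)
kill "$job_pid"
```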

Python

There is no global python installation on rudi, piggeldy or frederick. To install Python and manage packages, you should use conda environments.

Conda

You are encouraged to use conda to manage your R and Python environments on rudi, piggeldy and frederick. Installing conda in your work directory on rudi makes it usable on all three machines after a one-time initialization on each machine:

```
# On rudi, download Miniconda and install it into your work directory:
cd /work/$(whoami)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /work/$(whoami)/miniconda3
# initialize conda on rudi
/work/$(whoami)/miniconda3/bin/conda init

# ssh into frederick and initialize conda there
ssh frederick
/work/$(whoami)/miniconda3/bin/conda init
exit

# ssh into piggeldy and initialize conda there
ssh piggeldy
/work/$(whoami)/miniconda3/bin/conda init
exit
```

Now you can create environments and install packages. The environments are available from all three machines.

Please see the conda documentation for further information on how to install and use conda.

Jupyter

To use Jupyter notebooks interactively, you can run the start_jupyter script. The script will start a jupyter notebook, which can then be accessed using ssh tunneling. The script prints the instructions on how to connect. Make sure to activate a conda environment with the notebook package installed before running start_jupyter.

R

There is no global installation of R on rudi, piggeldy and frederick. To install R, you should use a conda environment. It is also recommended to install R packages directly with conda instead of through R with install.packages().

```
# For example, to create an environment with R 4.2.2 and Tidyverse, Seurat
# and Bioconductor pre-installed:
conda create -n r4_env -c conda-forge -c bioconda r-base=4.2.2 \
r-tidyverse r-seurat r-biocmanager bioconductor-biocgenerics

# to install CRAN packages through conda:
conda install -c conda-forge r-packagename
# to install Bioconductor packages through conda:
conda install -c bioconda bioconductor-packagename
```

You can use the conda package index to find out how to install a specific R package through conda, e.g. with conda search -c conda-forge r-packagename, or by searching the web for “conda r packagename”. Conda package names are all lowercase (e.g. the Bioconductor package BiocGenerics is bioconductor-biocgenerics).

RStudio

To use RStudio, you can run the start_rstudio script. Make sure to activate a conda environment with R before running the script. The script will start a containerized RStudio-Server, which can then be accessed using ssh tunneling. The script prints the instructions on how to connect.

To navigate to your work directory in RStudio, press the button with the three dots in the top right corner of the RStudio file browser (e.g. when saving a file, opening a project, in the files panel, …). This will open a text field where you can type /work/<yourusername>/ to go to your work directory and navigate from there using the UI.

Kaessmannserver

Accessing the server for computation

User registration

Creating a new user account has to be done together with the system administrator (Nils). Users will receive a username and password, as well as a home directory that is limited to 1 TB.

After the user account has been created, you can log in to the kaessmannserver with your username and password over ssh:

```
ssh USERNAME@ssh.kaessmannlab.org
```

Access from outside of the university network

Access via ssh to kaessmannserver using the default port 22 is restricted to within the network of the university (i.e. only using a wired connection at the office, eduroam or through the Cisco VPN).

Alternatively, you can use the non-standard port 49200 to access kaessmannserver via ssh; this port also works from outside the university network.

The complete ssh command looks like this:

```
ssh -p 49200 USERNAME@ssh.kaessmannlab.org
```
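Analogous to the rudi setup, an entry in ~/.ssh/config on your own computer saves typing the port and username. A minimal sketch; the alias kaessmannserver is an arbitrary, hypothetical name, and username must be replaced with your account name:

```
Host kaessmannserver
    HostName ssh.kaessmannlab.org
    Port 49200
    User username
```

Since port 49200 is listed as accessible from everywhere, ssh kaessmannserver should then work both inside and outside the university network.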

Permissions and installations

Users will not get superuser (sudo) rights. Installations should be done locally in the home directory whenever possible. If an installation requires superuser access, please contact the sysadmin.

SSH key

To make logging in and transferring data more comfortable, users are free to create a private and public key pair:

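```
# on a local terminal
ssh-keygen -t rsa -b 4096
# The resulting public key can be found by default at:
# ~/.ssh/id_rsa.pub
ssh-copy-id -i ~/.ssh/id_rsa.pub USERNAME@ssh.kaessmannlab.org
# from outside the university network, add the port:
# ssh-copy-id -p 49200 -i ~/.ssh/id_rsa.pub USERNAME@ssh.kaessmannlab.org
```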

Mounting the SDS

Simon Anders has created a script to mount the SDS in user home directories. Type sds xy123 (replacing xy123 with your UniID) to connect to the SDS service, i.e. to sd17d003 and similar group shares. You will then find a directory ~/sds in your home directory, which links to the SDS. To disconnect, use sds_unmount.

Backing up your data

The following example backs up the directory source in your home directory to the directory dest on the SDS once a day at 22:00. Please adapt username, source and dest accordingly. The SDS has to be mounted at ~/sds at that time for the backup to succeed.

```
# open the cron configuration file using the command 'crontab -e'
# this opens the configuration file in your default
# command line editor (e.g. vim).

# Insert the following line at the end of the cron configuration file,
# replacing usernames and paths accordingly (cron entries must be one line):
00 22 * * * rsync -hilrtuv --log-file=/home/username/backup.log /home/username/source /home/username/sds/sd17d003/username/dest
# upon saving and closing the file, the new cron job will be installed.
```

Python

There is a system Python 3.6 installation. However, it is mainly intended for the applications and databases hosted on the server. It is recommended to install an Anaconda or Miniconda distribution locally in your home directory. Please see the conda documentation for further information on how to install and use conda.

Jupyter notebook/lab

To use jupyter notebook or jupyter lab on the server you need to use an SSH tunnel. In an ssh session on the server, start the jupyter notebook:

```
jupyter notebook --no-browser --port=8889
```

On a local terminal, start the SSH tunnel:

```
ssh -N -L localhost:8888:localhost:8889 username@ssh.kaessmannlab.org
# from outside the university network, also add the -p 49200 option
```
 
Now you can point your web browser to localhost:8888 and start using the Jupyter notebook. Jupyter lab works analogously (jupyter lab --no-browser --port=8889).
 
Note: You may need to use a different set of ports if you get an error that the ports are already in use. Please use a port in a similar range (between 1025 and 60000). Do not use reserved ports like 22 (ssh), 443 (https), 80 (http), and so on.
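Instead of guessing, you can probe candidate ports on the server before starting Jupyter. The function below is a hypothetical helper, not an installed script; it relies on bash's /dev/tcp pseudo-device and treats a port as free if nothing accepts a connection on 127.0.0.1. This is only a heuristic.

```bash
# Hypothetical helper (not an installed lab script): print the first port
# in the range [start, end] on which nothing accepts connections on
# 127.0.0.1. If opening a connection fails, the port is assumed free.
# Caveats: services bound only to IPv6 are not detected, and another user
# could take the port between this check and starting Jupyter.
find_free_port() {
    local start="${1:-8889}" end="${2:-8999}" port
    for (( port = start; port <= end; port++ )); do
        if ! (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
            echo "$port"
            return 0
        fi
    done
    return 1
}

# Usage on the server, e.g.:
#   port="$(find_free_port 8889 8999)"
#   jupyter notebook --no-browser --port="$port"
# and use the same number as the right-hand port in the local ssh -L command.
```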
 
JupyterHub
 
The server hosts a JupyterHub, which is a convenient way to use jupyter lab on the server. You can access it at https://jupyter.kaessmannlab.org with your server username and password.
 
By default, only the system python kernel is available. However, you can install your own python kernels (e.g. from conda environments) using the following commands in a terminal:
 
```
# activate the conda environment that you want to use (in this case myenv)
conda activate myenv
# make sure ipykernel is installed in this environment. If not, do:
conda install ipykernel
# install the kernel for use in JupyterHub:
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
# optional: list the installed kernels to check that myenv is among them
jupyter kernelspec list
```

After this setup, the new environment should be available in JupyterHub. You might need to reload the page for it to appear.

VS Code remote sessions

Another way to run jupyter notebooks on the server is by using the remote sessions feature of Visual Studio Code. Please see this documentation page for more information.

R

R is installed globally, current version 3.6.3 (2020-02-29). This R installation can be used by everyone. It is recommended to install the R packages locally to avoid clashing versions.

RStudio Server

For interactive R sessions, the server hosts an RStudio server. You can access it at https://rstudio.kaessmannlab.org with your ssh username and password. This RStudio server uses the system R and will stay at version 3.6.3 to maintain compatibility.

Due to licensing limitations with the open source edition of RStudio, it is only possible to open one rstudio session per user at a time.

Please close the RStudio session when you are done with your analyses, to free up memory for other users. To do so, press the red button in the top right of the RStudio page.

What to do if RStudio doesn't load

If you have trouble logging back in, the cause may be a very large or corrupted session. Your RStudio session data is stored in ~/.local/share/rstudio/sessions/active/, in a folder called session-xxxxxxxx (the x's being random numbers and letters). If RStudio loads forever, or shows an error when you try to log in, remove the session-xxxxxxxx folder, or move it out of ~/.local/share/rstudio/sessions/active/ to some other place in your home directory. Once it is no longer in that folder, RStudio Server will create a new session on the next login instead of trying to load the previous one.
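Moving the session folder can also be scripted. The bash function below is a hypothetical helper, not part of RStudio Server: it moves every session-* folder out of the active folder into a timestamped backup folder in your home directory. Run it only while you are logged out of RStudio.

```bash
# Hypothetical helper (not part of RStudio Server).
# archive_rstudio_sessions [active_dir] [backup_dir]
archive_rstudio_sessions() {
    local active="${1:-$HOME/.local/share/rstudio/sessions/active}"
    local backup="${2:-$HOME/rstudio_session_backup_$(date +%Y%m%d_%H%M%S)}"
    local moved=0 session

    for session in "$active"/session-*; do
        # if nothing matches, the glob stays literal and is skipped here
        [ -d "$session" ] || continue
        mkdir -p "$backup"
        mv "$session" "$backup"/
        moved=$((moved + 1))
    done
    echo "moved $moved session folder(s) to $backup"
}
```

Keeping the moved folder instead of deleting it lets you recover files from the old session later; delete the backup folder once you no longer need it.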

Another cause of a non-responsive RStudio session is a long-running or hung computation. The RStudio session is then still active but cannot be accessed, because RStudio Server is busy. In this case, look up the PID of your active session with the command rstudio-server active-sessions, then kill it with rstudio-server kill-session [PID] (replace [PID] with the PID of your active RStudio session). The next time you log in, the session will restart.

Using R >4.0 and other R versions

To use versions of R other than the system R, you can create conda environments:

```
# E.g. R version 4.0.5
# r4_env is the name of the environment and can be chosen freely
conda create -n r4_env -c conda-forge r-base=4.0.5 r-essentials geos r-rgeos
```

After creating the environment, you can activate it and use the specified R version:

```
# activate the environment:
conda activate r4_env
# run R
R
```

Using RStudio with a custom R version

If you want to use RStudio with a custom R from a conda environment, run the following script:

```
# on the server:
start_rstudio_server.sh
```

This script will print a command that creates an SSH tunnel, which you need to execute in a local terminal (on your computer). It should look similar to the following:

```
ssh -N -L localhost:8888:localhost:12345 username@ssh.kaessmannlab.org
# when using this from outside of the university network, also use the -p 49200 option here, i.e.:
ssh -p 49200 -N -L localhost:8888:localhost:12345 username@ssh.kaessmannlab.org
```

After you have done that, you can navigate with a web browser to localhost:8888 and start using RStudio.

Note: Using RStudio like this does not circumvent the licensing limitation of the RStudio open source edition: only one RStudio session per user can be open at a time. If you have an open session, e.g. at https://rstudio.kaessmannlab.org, when you start RStudio in this way, the older session will be interrupted.

RShiny Server

For hosting shiny apps, the server hosts an RShiny server. However, to add an app to the server, sudo permissions are needed. Contact the sysadmin.

Performing resource intensive tasks

To avoid annoying queue times and maximize CPU usage, NO job management system is currently in place. This also means that computations on the server can only run smoothly if every user does their part.

In general, the server is NOT a replacement for the Helix cluster. Very long and very resource intensive tasks should still be performed there. As a rule of thumb: a job that is too intensive to run on a personal laptop, but that takes less time than the average queuing time on the cluster, is a perfect fit for the kaessmannserver.

Another perfect use case is an interactive workload, like a Seurat analysis on large datasets.

For CPU intensive tasks, users should make use of the nice system. Prefix your commands with nice and a niceness value greater than 0 (e.g. 5):

```
nice -n 5 COMMAND
```

Electronic lab book

Registration in the electronic lab book (https://elab.kaessmannlab.org) is separate from the server access. Users can create their own accounts but have to be verified by an admin.

HKDB

The tissue and sequencing database (https://hkdb.kaessmannlab.org) requires separate user accounts from the servers. Noe is in charge of creating user accounts for the HKDB.

Server Admin Documentation

Documentation of settings and server admin tasks
