The main computing servers are rudi, piggeldy and frederick. The kaessmannserver (OldKaessmannServer) is mostly used for tasks that require GPU resources and for hosting the various apps and websites of the group.
Generally, none of our servers are intended as a replacement for the Helix Cluster. Ideal workloads for our servers are interactive tasks, or tasks that cannot be run on the Cluster for some reason (time limits, missing software, …).
Only the kaessmannserver is accessible from outside the university network. To access rudi from outside the university campus, please use the university VPN, or log in through the kaessmannserver. Piggeldy and frederick are only accessible from rudi.
# from a local terminal
ssh username@rudi.bioquant.uni-heidelberg.de
# from a terminal on rudi
ssh piggeldy
ssh frederick
# from a local terminal
ssh -p 49200 username@ssh.kaessmannlab.org
This section describes the process of registering a new user account on rudi, piggeldy and frederick.
ssh-keygen -t rsa -b 4096
# The resulting public key can be found by default at:
# ~/.ssh/id_rsa.pub
ssh username@rudi.bioquant.uni-heidelberg.de
Open (or create) the file ~/.ssh/config on your local machine and enter the following, replacing username with your username on rudi:
Host rudi
    HostName rudi.bioquant.uni-heidelberg.de
    User username
Now you can connect to rudi with this command:
ssh rudi
To log in without a password, add your public key to the ~/.ssh/authorized_keys file on rudi.
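Since piggeldy and frederick are only reachable from rudi, the host alias can be extended so that they are reached in one step from a local terminal. The following is a minimal sketch, assuming that rudi permits SSH forwarding (this is not confirmed here). `ProxyJump` is a standard OpenSSH client option; `CONFIG_FILE` and `REMOTE_USER` are hypothetical variables introduced for illustration. By default the script writes to a scratch file, so an existing ~/.ssh/config is not touched.

```shell
#!/bin/sh
# Sketch: append host aliases for rudi, piggeldy and frederick to an SSH config file.
# CONFIG_FILE and REMOTE_USER are hypothetical names used only for this example.
CONFIG_FILE="${CONFIG_FILE:-$(mktemp)}"   # scratch file unless set explicitly
REMOTE_USER="${REMOTE_USER:-username}"    # replace with your username on rudi

cat >> "$CONFIG_FILE" <<EOF
Host rudi
    HostName rudi.bioquant.uni-heidelberg.de
    User $REMOTE_USER

# Assumption: rudi allows forwarding, so it can act as a jump host.
Host piggeldy frederick
    User $REMOTE_USER
    ProxyJump rudi
EOF

echo "Host aliases written to $CONFIG_FILE"
```

After reviewing the output and copying it into ~/.ssh/config, `ssh piggeldy` from a local terminal would first connect to rudi and then hop on to piggeldy.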
Every user has a home directory (/home/username). This folder should only be used for storing configuration files (e.g. .bashrc), not for any downloads, software installations or data. Please use the work directory for these.
Every user has a work directory (/work/username). This folder should be used for all software installation, downloads, and data storage. The work directory is accessible from rudi, piggeldy and frederick. Everything locally installed there can be used from all three machines.
To quickly navigate to /work/username you can use the global alias work, which is equivalent to cd /work/username.
The SDS@HD storage can be mounted using the sds script and your Uni ID (e.g. xy123):
sds xy123
This will create a directory called sds in your work directory. To unmount the SDS, use sds_unmount.
In some cases, especially after an issue with the SDS@HD service or a temporary network outage, the sds mount can end up in a state where it is unusable but also cannot be unmounted using the sds_unmount script. Inspecting the folder with ls -la then usually looks like this:
d????????? ? ? ? ? ? sds
and the sds_unmount script fails with the following message:
fusermount: failed to unmount /work/ntrost/sds: Device or resource busy
To solve this issue run the following command in your work folder:
fusermount -zu sds
After this, you can run the sds script again to remount the SDS@HD storage.
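Whether the mount is in this state can also be checked from a script, for example before starting a long job that reads from the SDS. The following is a minimal sketch; `check_mount` and `MOUNT_DIR` are hypothetical names, and the default path assumes the mount lives in your work directory as described above. It only prints a hint and does not unmount anything itself.

```shell
#!/bin/sh
# Sketch: report whether a directory can be listed.
# check_mount is a hypothetical helper, not part of the sds scripts.
check_mount() {
    # A broken FUSE mount (and also a missing directory) cannot be listed.
    if ls "$1" >/dev/null 2>&1; then
        echo "usable"
    else
        echo "unusable"
    fi
}

MOUNT_DIR="${MOUNT_DIR:-/work/$(whoami)/sds}"   # assumed default mount location
if [ "$(check_mount "$MOUNT_DIR")" = "unusable" ]; then
    echo "$MOUNT_DIR cannot be listed; if it is a broken mount, try: fusermount -zu $MOUNT_DIR"
else
    echo "$MOUNT_DIR is usable"
fi
```

Note that the check cannot tell a broken mount from a directory that simply does not exist, so the printed hint should be read as a suggestion only.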
The home directory of the kaessmannserver can be mounted using the kss script and the username:
kss username
This will create a directory called kss in your work directory. To unmount the kaessmannserver, use kss_unmount.
There is no central scheduled backup for user home or work directories. To back up your important files, you can use rsync to copy them to the SDS (see above for how to mount the SDS to your work directory). To schedule regular backups, you can use cron.
The following example backs up the directory source in the home directory of user to the directory dest on the SDS once a day at 22:00. Please adapt the source and dest folders accordingly.
# open the cron configuration file using the command 'crontab -e'
# this opens the configuration file in your default
# command line editor (e.g. vim).
# Insert the following line at the end of the cron configuration file,
# replacing usernames and paths accordingly. The entry has to stay on a
# single line, because cron does not support backslash line continuation:
00 22 * * * rsync -hilrtuv --log-file=/home/user/backup.log /home/user/source /work/user/sds/sd17d003/user/dest
# upon saving and closing the file, the new cron job will be installed.
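If the SDS is not mounted when the cron job fires, the rsync call cannot reach its destination. A small wrapper can make that case explicit in the log. The following is a sketch under the assumption that an unmounted SDS shows up as a missing or empty directory; `backup_to_sds` is a hypothetical helper, and all paths need to be adapted.

```shell
#!/bin/sh
# Sketch: run the backup only if the SDS mount looks available.
# backup_to_sds is a hypothetical helper; adapt all paths to your setup.
backup_to_sds() {
    src="$1"        # directory to back up
    sds_root="$2"   # directory where the sds script mounted the SDS
    dest="$3"       # target directory on the SDS
    # Assumption: an unmounted SDS shows up as a missing or empty directory.
    if [ ! -d "$sds_root" ] || [ -z "$(ls -A "$sds_root" 2>/dev/null)" ]; then
        echo "SDS not available at $sds_root, skipping backup" >&2
        return 1
    fi
    # Same rsync options as in the cron example above.
    rsync -hilrtuv "$src" "$dest"
}
```

A call such as `backup_to_sds /home/user/source /work/user/sds /work/user/sds/sd17d003/user/dest` could be placed at the end of the script, and the script itself referenced from the crontab entry instead of the bare rsync command.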
Generally, none of our servers are intended as a replacement for the Helix Cluster. Ideal workloads for our servers are interactive tasks, or tasks that cannot be run on the Cluster for some reason (time limits, missing software, …).
Rudi should be used for interactive tasks where CPU speed is relevant. For longer running tasks, you should consider using piggeldy and frederick.
For CPU intensive tasks you should make use of the “nice” system. Prefix commands with nice and a niceness value of > 0 (e.g. 5).
nice -n 5 COMMAND
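To see what the prefix does, nice can be asked to report its own niceness. This is a small sketch that assumes GNU coreutils, as found on typical Linux servers, where nice prints the current niceness when called without a command.

```shell
#!/bin/sh
# Print the niceness the current shell runs at.
nice

# Run a command with the niceness raised by 5. The command here is `nice`
# itself, which simply reports the value it was started with.
nice -n 5 nice

# The same prefix works for a real workload (COMMAND is a placeholder):
# nice -n 5 COMMAND
```

Higher values mean lower priority, so other users' interactive sessions stay responsive while the job runs.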
There is no global python installation on rudi, piggeldy or frederick. To install Python and manage packages, you should use conda environments.
You are encouraged to use conda to manage your R and Python environments on rudi, piggeldy and frederick. Installing conda in the work directory on rudi makes it usable on all three machines after a one-time setup on each machine.
# On rudi, download and install miniconda to the work directory:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -p /work/$(whoami)/miniconda3
# ssh into frederick and initialize conda there
ssh frederick
/work/$(whoami)/miniconda3/bin/conda init
exit
# ssh into piggeldy and initialize conda there
ssh piggeldy
/work/$(whoami)/miniconda3/bin/conda init
exit
Now you can create environments and install packages. The environments are available from all three machines.
Please see the conda documentation for further information on how to install and use conda.
To use Jupyter notebooks interactively, you can run the start_jupyter script. The script will start a jupyter notebook, which can then be accessed using ssh tunneling. The script prints the instructions on how to connect. Make sure to activate a conda environment with the notebook package installed before running start_jupyter.
There is no global installation of R on rudi, piggeldy and frederick. To install R, you should use a conda environment. It is also recommended to install R packages directly with conda instead of through R with install.packages().
# For example, to create an environment with R 4.2.2 and Tidyverse, Seurat
# and Bioconductor pre-installed do:
conda create -n r4_env -c conda-forge -c bioconda r-base=4.2.2 \
    r-tidyverse r-seurat r-biocmanager bioconductor-BiocGenerics
# to install CRAN packages through conda:
conda install -c conda-forge r-packagename
# to install Bioconductor packages through conda:
conda install -c bioconda bioconductor-packagename
You can use the conda package index to find out how to install a specific R package through conda. Just google “conda r packagename”.
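The naming pattern used in the commands above (an `r-` prefix for CRAN packages on conda-forge, a `bioconductor-` prefix on bioconda, with the package name in lowercase) can be written down as a small helper. This is a sketch only: `conda_r_name` is a hypothetical function, and not every R package is available under the derived name, so the result should still be checked against the package index.

```shell
#!/bin/sh
# Sketch: derive the likely conda package name for an R package.
# conda_r_name is a hypothetical helper; the result is a guess, not a lookup.
conda_r_name() {
    kind="$1"   # "cran" or "bioc"
    pkg=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')   # conda names are lowercase
    case "$kind" in
        cran) echo "r-$pkg" ;;
        bioc) echo "bioconductor-$pkg" ;;
        *)    echo "unknown source: $kind" >&2; return 1 ;;
    esac
}

conda_r_name cran Seurat
conda_r_name bioc BiocGenerics
```

The printed names can then be passed to conda install together with the matching channel (conda-forge or bioconda).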
To use RStudio, you can run the start_rstudio script. Make sure to activate a conda environment with R before running the script. The script will start a containerized RStudio-Server, which can then be accessed using ssh tunneling. The script prints the instructions on how to connect.
To navigate to your work directory in RStudio, press the button with the three dots in the top right corner of the RStudio file browser (e.g. when saving a file, opening a project, in the files panel, …). This will open a text field where you can type /work/<yourusername>/ to go to your work directory and navigate from there using the UI.
Creating a new user account has to be done together with the system administrator (Nils). Users will receive a username and password, as well as a home directory that is limited to 1 TB.
Once the user account has been created, you can log in to the kaessmannserver with your username and password over ssh with:
ssh USERNAME@ssh.kaessmannlab.org
Access via ssh to kaessmannserver using the default port 22 is restricted to within the network of the university (i.e. only using a wired connection at the office, eduroam or through the Cisco VPN).
Alternatively you can use the non-standard port 49200 to access kaessmannserver via ssh.
The complete ssh command would look like this:
ssh -p 49200 USERNAME@ssh.kaessmannlab.org
Users will not get superuser (sudo) rights. Installations should be done locally to the home directory whenever possible. In cases where an installation requires superuser access, please refer to the sysadmin.
To make logging in and transferring data more comfortable, users are free to create a private and public key pair:
# on a local terminal
ssh-keygen -t rsa -b 4096
# The resulting public key can be found by default at:
# ~/.ssh/id_rsa.pub
ssh-copy-id -i ~/.ssh/id_rsa.pub USERNAME@ssh.kaessmannlab.org
Simon Anders has created a script to mount the SDS to user home directories. Type sds xy123 (replacing xy123 with your UniID) and it will connect you to the SDS service, i.e. to sd17d003 and similar group shares. You will then find a directory ~/sds in your home directory, which links to the SDS. To disconnect, use sds_unmount.
There is no central scheduled backup for user home or work directories. To back up your important files, you can use rsync to copy them to the SDS (see above for how to mount the SDS). To schedule regular backups, you can use cron.
The following example backs up the directory source in the home directory of user to the directory dest on the SDS once a day at 22:00. Please adapt the source and dest folders accordingly.
# open the cron configuration file using the command 'crontab -e'
# this opens the configuration file in your default
# command line editor (e.g. vim).
# Insert the following line at the end of the cron configuration file,
# replacing usernames and paths accordingly. The entry has to stay on a
# single line, because cron does not support backslash line continuation:
00 22 * * * rsync -hilrtuv --log-file=/home/user/backup.log /home/user/source /work/user/sds/sd17d003/user/dest
# upon saving and closing the file, the new cron job will be installed.
There is a system Python 3.6 installation. However, it is mainly intended for the applications and databases that are hosted on the server. It is recommended to install an Anaconda or Miniconda distribution locally into your home directory. Please see the conda documentation for further information on how to install and use conda.
To use jupyter notebook or jupyter lab on the server you need to use an SSH tunnel. In an ssh session on the server, start the jupyter notebook:
jupyter notebook --no-browser --port=8889
On a local terminal, start the SSH tunnel:
ssh -N -L localhost:8888:localhost:8889 username@ssh.kaessmannlab.org
Now you can point your web browser to localhost:8888 and start using the jupyter notebook. Using Jupyter lab works analogously.
Note: You may need to use a different set of ports if you get an error that the ports are already in use. Please use a port in a similar range (between 1025 and 60000). Do not use reserved ports like 22 (ssh), 443 (https), 80 (http), and so on.
JupyterHub
The server hosts a JupyterHub, which is a convenient way to use jupyter lab on the server. You can access it at https://jupyter.kaessmannlab.org with your server username and password. By default, only the system python kernel is available. However, you can install your own python kernels (e.g. from conda environments) using the following commands in a terminal:
# activate the conda environment that you want to use (in this case myenv)
conda activate myenv
# make sure ipykernel is installed in this environment. If not, do:
conda install ipykernel
# install the kernel for use in jupyterhub:
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
After this setup, the new environment should be available in JupyterHub. You might need to reload the page for it to appear.
Another way to run jupyter notebooks on the server is by using the remote sessions feature of Visual Studio Code. Please see this documentation page for more information.
R is installed globally, current version 3.6.3 (2020-02-29). This R installation can be used by everyone. It is recommended to install the R packages locally to avoid clashing versions.
For interactive R sessions, the server hosts an RStudio server. You can access it at https://rstudio.kaessmannlab.org with your ssh username and password. This RStudio server uses the system R and will stay at version 3.6.3 to maintain compatibility.
Due to licensing limitations of the open source edition of RStudio, only one RStudio session per user can be open at a time.
Please close the RStudio session when you are done with your analyses, to free up memory for other users. To do so, press the red button in the top right of the RStudio page.
If you're having trouble logging back in, it might be due to a very large or corrupted session. Your RStudio session data is stored in ~/.local/share/rstudio/sessions/active/ (in a folder called session-xxxxxxxx, the xs being random numbers and letters). If RStudio loads forever or shows an error when you try to log in, you can remove the session-xxxxxxxx folder, or move it out of the ~/.local/share/rstudio/sessions/active/ folder to some other place in your home directory. Once it is no longer in that folder, RStudio Server will create a new session on the next login instead of trying to load the previous one.
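Moving the session folders aside, rather than deleting them, can be sketched as follows. `park_rstudio_sessions` is a hypothetical helper, not an RStudio command, and the target folder name is an arbitrary choice; both arguments are optional and default to the paths described above.

```shell
#!/bin/sh
# Sketch: move stored RStudio sessions out of the active folder.
# park_rstudio_sessions is a hypothetical helper, not an RStudio command.
park_rstudio_sessions() {
    active="${1:-$HOME/.local/share/rstudio/sessions/active}"
    parked="${2:-$HOME/rstudio-sessions-parked}"   # arbitrary target folder
    if [ ! -d "$active" ]; then
        echo "no active session folder at $active"
        return 0
    fi
    mkdir -p "$parked"
    for session in "$active"/session-*; do
        [ -e "$session" ] || continue   # the pattern matched nothing
        mv "$session" "$parked"/
    done
    echo "sessions moved to $parked"
}
```

Calling `park_rstudio_sessions` without arguments before the next login keeps the old session data available in the target folder in case something needs to be recovered from it.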
Another issue that can lead to a non-responsive RStudio session is a long computation or a hang-up during a computation. In this case, the RStudio session is still active but cannot be accessed again (because RStudio Server is busy). You can then check the PID of your active session using the command rstudio-server active-sessions. To kill your session, use the command rstudio-server kill-session [PID] (replace [PID] with the PID of your active RStudio session). Once you try logging in again, the session will restart.
To use versions of R other than the system R, you can create conda environments:
# E.g. R version 4.0.5
# r4_env is the name of the environment and can be chosen freely
conda create -n r4_env -c conda-forge r-base=4.0.5 r-essentials geos r-rgeos
After creating the environment, you can activate it and use the specified R version:
# activate the environment:
conda activate r4_env
# run R
R
If you want to use RStudio using a custom R from a conda environment, you have to run the following script:
# on the server:
start_rstudio_server.sh
This script will print a command that creates an SSH tunnel, which you need to execute in a local terminal (on your computer). It should look similar to the following:
ssh -N -L localhost:8888:localhost:12345 username@ssh.kaessmannlab.org
# when using this from outside of the university network, also use the -p 49200 option here, i.e.:
ssh -p 49200 -N -L localhost:8888:localhost:12345 username@ssh.kaessmannlab.org
After you have done that, you can navigate with a web browser to localhost:8888 and start using RStudio.
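The tunnel commands for Jupyter and RStudio differ only in the ports and in whether the -p 49200 option is needed. The following sketch assembles the command from those parts; `tunnel_cmd` and `pick_port` are hypothetical helpers, and the remote port 12345 is just the example value from above. `pick_port` draws a random port from the range recommended on this page and does not check whether that port is actually free.

```shell
#!/bin/sh
# Sketch: assemble the SSH tunnel command for a service running on the server.
# tunnel_cmd and pick_port are hypothetical helpers for illustration.
tunnel_cmd() {
    local_port="$1"      # port on your own computer
    remote_port="$2"     # port the service listens on (printed by the server script)
    user="$3"            # your server username
    outside="${4:-no}"   # "yes" when connecting from outside the university network
    port_opt=""
    if [ "$outside" = "yes" ]; then
        port_opt="-p 49200 "
    fi
    echo "ssh ${port_opt}-N -L localhost:${local_port}:localhost:${remote_port} ${user}@ssh.kaessmannlab.org"
}

# Draw a random port between 1025 and 60000 (availability is not checked).
pick_port() {
    awk 'BEGIN { srand(); print int(1025 + rand() * (60000 - 1025 + 1)) }'
}

tunnel_cmd "$(pick_port)" 12345 username yes
```

The printed command is meant to be run in a local terminal. The browser then has to point to localhost followed by the local port that was chosen.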
Note: Using RStudio like this does not circumvent the licensing issue of RStudio open source edition. Still only one RStudio per user at a time is available. If you have an open session using e.g. https://rstudio.kaessmannlab.org when you start RStudio in this way, the older session will be interrupted.
For hosting shiny apps, the server hosts an RShiny server. However, to add an app to the server, sudo permissions are needed. Contact the sysadmin.
To avoid annoying queue times and maximize CPU usage, NO job management system is currently in place. This also means that computations on the server can only run smoothly if every user does their part.
In general the server is NOT a replacement for the Helix cluster. Very long and very resource intensive tasks should still be performed there. As a rule of thumb: A job that is too intensive to be run on a personal laptop but that takes shorter than the average queuing time on the cluster is a perfect fit for the kaessmannserver.
Another perfect use case is an interactive workload, like a Seurat analysis on large datasets.
For CPU intensive tasks, users should make use of the nice system. Prefix your commands with nice and a niceness value of > 0 (e.g. 5).
nice -n 5 COMMAND
Registration in the electronic lab book (https://elab.kaessmannlab.org) is separate from the server access. Users can create their own accounts but have to be verified by an admin.
The tissue and sequencing database (https://hkdb.kaessmannlab.org) requires user accounts that are separate from the servers. Noe is in charge of creating user accounts for the HKDB.