
Quickstart Guide for Carya/Opuntia/Sabine

How to log in

The only way to connect to our clusters is by secure shell (ssh), e.g. from a Linux/UNIX system:

ssh -l your_username carya.rcdc.uh.edu 
ssh -l your_username opuntia.rcdc.uh.edu
ssh -l your_username  sabine.rcdc.uh.edu

Use your CougarNet ID as your_username and your CougarNet password as the password to log in.

Windows users will need an SSH client installed on their machine, e.g. PuTTY or XShell.

UH VPN is now mandatory for accessing the clusters from outside the campus network.
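
If you connect often, you can optionally add host entries to the SSH configuration file on your local Linux/macOS machine (~/.ssh/config); the alias names below are just an illustration:

Host carya
    HostName carya.rcdc.uh.edu
    User your_username

Host opuntia
    HostName opuntia.rcdc.uh.edu
    User your_username

With this in place, ssh carya is equivalent to the first command above.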

Allocations

Users without project allocations cannot run jobs on Sabine or Opuntia. Users have been given a small allocation on Opuntia so that they can continue running jobs there. For an increased allocation, please refer to the allocation request page; a PI (supervisor) will have to submit a project proposal for Sabine/Opuntia.

Users can check the balance for their projects using the sbalance command:

sbalance balance statement project <projectname>

Users with multiple allocations, for instance those working for two different PIs (supervisors), need to specify which allocation to use when submitting a job so that the right PI's allocation is charged. In a batch script, e.g.:

#!/bin/bash
### Specify job parameters
#SBATCH -J test_job # name of the job
#SBATCH -t 1:00:00 # time requested
#SBATCH -N 1 -n 2 # total number of nodes and processes

## if you have multiple allocations
## you can tell SLURM which account to charge this job to
#SBATCH -A <Allocation_AWARD_ID>

or specify it when submitting an interactive job, e.g.:

srun -A <Allocation_AWARD_ID> --pty /bin/bash -l

Using tmux

Using tmux on the Sabine/Opuntia cluster allows you to create interactive allocations that you can detach from. Normally, if you get an interactive allocation (e.g. srun --pty) then disconnect from the cluster, for example by putting your laptop to sleep, your allocation will be terminated and your job killed. Using tmux, you can detach gracefully and tmux will maintain your allocation. Here is how to do this correctly:

  1. ssh to Sabine/Opuntia.
  2. Start tmux.
  3. Inside your tmux session, submit an interactive job with srun.
  4. Inside your job allocation (on a compute node), start your application (e.g. matlab).
  5. Detach from tmux by typing Ctrl+b, then d.
  6. Later, on the same login node, reattach by running tmux attach (see the example session below).

Make sure to:

  • run tmux on the login node, NOT on compute nodes
  • run srun inside tmux, not the reverse.
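
Putting the steps together, a minimal session might look like the following (the session name, srun options, and application are only placeholders; adjust them to your needs):

tmux new -s mysession                            # step 2: start tmux on the login node
srun -t 2:00:00 -N 1 -n 2 --pty /bin/bash -l     # step 3: interactive job inside tmux
matlab                                           # step 4: run your application on the compute node
# step 5: press Ctrl+b, then d, to detach; the allocation keeps running
tmux attach -t mysession                         # step 6: reattach later from the same login node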

X11 Forwarding

X11 forwarding is necessary to display editor windows (gvim, emacs, nedit, etc.) or similar on your desktop. To enable X11 forwarding, log in with the ssh -X or -Y option enabled:

ssh -XY -l your_username  carya.rcdc.uh.edu
ssh -XY -l your_username opuntia.rcdc.uh.edu
ssh -XY -l your_username sabine.rcdc.uh.edu

Windows users need an X server to handle the local display in addition to the ssh program; see this intro (from Indiana University) for PuTTY users.
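
Once logged in with X11 forwarding enabled, you can verify that it works by checking the DISPLAY variable and launching a simple X client, assuming one such as xclock is installed on the login node:

echo $DISPLAY    # should print something like localhost:10.0
xclock &         # a small clock window should appear on your desktop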

Transferring Data

Basic Tools

SCP (Secure CoPy): scp uses ssh for data transfer, so it uses the same authentication and provides the same security as ssh. For example, to copy a file from a local system to one of the clusters:

scp myfile your_username@carya.rcdc.uh.edu:
scp myfile your_username@opuntia.rcdc.uh.edu:
scp myfile your_username@sabine.rcdc.uh.edu:

To recursively copy a directory:

scp -r my_directory your_username@opuntia.rcdc.uh.edu:

SFTP (Secure File Transfer Protocol): sftp is a file transfer program, similar to ftp, which performs all operations over an encrypted ssh transport. For example, to put a file from the local system onto Sabine (this also works for Carya and Opuntia):

sftp your_username@sabine.rcdc.uh.edu
Password: 
Connected to sabine.rcdc.uh.edu 

sftp> put myfile

For Windows users, WinSCP is a free graphical SCP and SFTP client.

RSYNC: rsync is a utility for efficiently transferring and synchronizing files across networked computers by comparing the modification times and sizes of files. Its primary advantage over scp is fast synchronization, since it only copies new or updated files. To transfer a file or directory to Carya, Sabine, or Opuntia:

rsync -avP file your_username@sabine.rcdc.uh.edu:
rsync -avP directory your_username@sabine.rcdc.uh.edu:
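
To copy in the other direction, from the cluster back to your local machine, swap the source and destination (the remote path my_results is only an example):

rsync -avP your_username@sabine.rcdc.uh.edu:my_results ./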

Software environment

Text editors

The clusters have multiple editors installed, including vi and nano.

Modules

Modules are a tool for managing the Unix environment on Carya, Opuntia, and Sabine, designed to simplify login scripts. A single user command,

module add module_name

can be invoked to source the appropriate environment information within the user’s current shell. Invoking the command,

module available

or its abbreviated form,

module avail

will list the available packages on Carya, Opuntia, or Sabine, and

module rm module_name

will remove the module from your environment.
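
For example, a typical session might look like this (the module name intel is only illustrative; use module avail to see what is actually installed):

module avail            # list available packages
module add intel        # load the Intel compiler environment
module list             # show currently loaded modules
module rm intel         # remove it again when no longer needed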

Running jobs

The Concept

A "job" refers to a program running on the compute nodes of theCarya, Opuntia or Sabine clusters. Jobs can be run on clusters  in two different ways:

  • A batch job allows you to submit a script that tells the cluster how to run your program. Your program can run for long periods of time in the background, so you don't need to be connected to the cluster. The output of your program is continuously written to an output file that you can view both during and after your program runs.
  • An interactive job allows you to interact with a program by typing input, using a GUI, etc. But if your connection is interrupted, the job will abort. These are best for small, short-running jobs where you need to test out a program, or where you need to use the program's GUI.

The Code

The following shows how to run an example of a parallel program (using MPI) on Carya, Opuntia, or Sabine. MPI programs are executed as one or more processes; one process is typically assigned to one physical processor core. All the processes run the exact same program, but by receiving different input they can be made to do different tasks. The most common way to distinguish the processes is by their rank. Together with the total number of processes, referred to as the size, the rank forms the basic method of dividing tasks between the processes. Getting the rank of a process and the total number of processes is therefore the goal of this example. Furthermore, all MPI-related calls must be issued between MPI_Init() and MPI_Finalize(). Regular C instructions that are to be run locally for each process, e.g. some preprocessing that is equal for all processes, can be run outside the MPI context.

Below is a simple program that, when executed, will make each process print their name and rank as well as the total number of processes.

/*  Basic MPI Example - Hello World  */
#include <stdio.h>  /* printf and BUFSIZ defined there */
#include <stdlib.h> /* exit defined there */
#include "mpi.h"    /* all MPI-2 functions defined there */

int main(int argc, char *argv[])
{
    int rank, size, length;
    char name[BUFSIZ];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &length);

    printf("%s: hello world from process %d of %d\n", name, rank, size);

    MPI_Finalize();

    exit(0);
}
    • MPI_Init(); Is responsible for spawning processes and setting up the communication between them. The default communicator (collection of processes) MPI_COMM_WORLD is created.
    • MPI_Finalize(); End the MPI program.
    • MPI_Comm_rank( MPI_COMM_WORLD, &rank ); Returns the rank of the process within the communicator. The rank is used to divide tasks among the processes. The process with rank 0 might get some special task, while the rank of each process might correspond to distinct columns in a matrix, effectively partitioning the matrix between the processes.
    • MPI_Comm_size( MPI_COMM_WORLD, &size ); Returns the total number of processes within the communicator. This can be useful to e.g. know how many columns of a matrix each process will be assigned.
    • MPI_Get_processor_name( name, &length ); Is more of a curiosity than necessary in most programs; it can assure us that our MPI program is indeed running on more than one computer/node.

 

Compile & Run

Save the code in a file named helloworld.c. Load the Intel compiler and Intel MPI module files:

module load intel 	

Compile the program with the following command:

mpicc -o helloworld helloworld.c	

Make a batch job by adding the following to a file named job.sh:

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 2 -n 10

module load intel
mpirun ./helloworld

 

Submit the job to the queue.

sbatch job.sh
Submitted batch job 906

Note that the sbatch command returns the job ID. Also note that this example runs so fast that it may finish before a status command such as squeue shows it in the queue. The job identifier is used, together with the name of the job, to name the output file. The job name is given with the -J option in the job.sh script; in this example it is ‘my_mpi_job’. The standard output from the processes is logged to a file in the working directory named my_mpi_job.o followed by the job ID. Here is the content from one batch execution of job.sh:

$ cat my_mpi_job.o906 

compute-2-13.local: hello world from process 9 of 10 
compute-2-12.local: hello world from process 1 of 10 
compute-2-12.local: hello world from process 3 of 10 
compute-2-12.local: hello world from process 5 of 10 
compute-2-12.local: hello world from process 6 of 10 
compute-2-12.local: hello world from process 7 of 10 
compute-2-12.local: hello world from process 8 of 10 
compute-2-12.local: hello world from process 0 of 10 
compute-2-12.local: hello world from process 2 of 10 
compute-2-12.local: hello world from process 4 of 10
		

Note that, without a separate error file, SLURM writes standard error to the same file as standard output. If you direct standard error to its own file with the #SBATCH -e option (e.g. -e my_mpi_job.e%j), that file will contain the standard error output from all the processes; if the processes execute without faults, no errors are logged and the file is empty.

Batch Jobs

SLURM Script Generator

The job script generator is available at https://wwwdev.times.uh.edu. It is a web GUI application designed to help you create the batch SLURM scripts you use to submit batch jobs to the clusters. It is a great starting point for most batch job workflows, and users can customize the output further to suit their needs.
Note that, as with the clusters, to use the script generator you need to be connected to the UH internal network, possibly via UH VPN if you are connecting from off campus.

Note that on Carya there is no special partition for GPUs, so "-p gpu" is not needed when submitting jobs.

Users can check the status of a job with the squeue command below.

$ squeue -j <JOB_ID>
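
To list all of your own jobs, or to cancel a job you no longer need, you can also use:

squeue -u your_username     # show all of your queued and running jobs
scancel <JOB_ID>            # cancel a job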

Single Whole node

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 1 -n 28  

module load intel  
mpirun ./helloworld

Multiple Whole nodes

This example uses 4 nodes and 28 tasks or cores per node

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 4 --tasks-per-node=28  

module load intel  
mpirun ./helloworld

Single core job utilizing 1 GPU (if you need only a single CPU core and one GPU)

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 1 -n 1
#SBATCH -p gpu #needed only on Sabine or Opuntia
#SBATCH --gres=gpu:1
 
module load cuda
nvidia-smi  
mpirun ./helloworld

Single node job utilizing 1 GPU (if you need only one GPU but with multiple CPUs from the same node)

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 1 -n 16
#SBATCH -p gpu #needed only on Sabine or Opuntia
#SBATCH --gres=gpu:1  

module load cuda
nvidia-smi  
mpirun ./helloworld

Single node utilizing 2 GPUs (if you need two GPUs, along with multiple CPUs, all from one node)

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 1 -n 28
#SBATCH -p gpu  #needed only on Sabine or Opuntia
#SBATCH --gres=gpu:2

module load cuda
nvidia-smi
mpirun ./helloworld

Multiple Whole nodes job, 2 GPUs per node (only on Sabine). This example uses 4 nodes and 28 tasks or cores per node.

#!/bin/bash
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 4 --tasks-per-node=28
#SBATCH -p gpu  #needed only on Sabine or Opuntia
#SBATCH --gres=gpu:2

module load cuda
nvidia-smi
mpirun ./helloworld

Interactive Jobs

To open an interactive session on a compute node, use the following:
 srun  --pty /bin/bash -l
Same as above, but requesting 1 hour of wall time and X11 forwarding support
 srun -t 1:00:00 --x11=first --pty /bin/bash -l 
Same as above, but requesting 48 cores or a full node on Carya
 srun -t 1:00:00 -n 48 -N 1 --pty /bin/bash -l 
Requesting 28 cores or a full node on Sabine
 srun -t 1:00:00 -n 28 -N 1 --pty /bin/bash -l 
Same as above, but requesting 20 cores or a full node on Opuntia
 srun -t 1:00:00 -n 20 -N 1 --pty /bin/bash -l
Requesting GPUs

Requesting 24 cores and 1 gpu on Carya
 srun -t 1:00:00 -n 24 --gres=gpu:1 -N 1 --pty /bin/bash -l
Requesting 48 cores or a full node and 2 gpus on Carya
 srun  -t 1:00:00 -n 48 --gres=gpu:2 -N 1 --pty /bin/bash -l
Requesting 20 cores or a full node and 1 gpu on Opuntia
 srun  -t 1:00:00 -n 20 -p gpu --gres=gpu:1 -N 1 --pty /bin/bash -l 

Requesting 4 nodes with 20 cores per node (on Opuntia)

 srun -t 1:00:00 --tasks-per-node=20 -N 4 --pty /bin/bash -l
Requesting 28 cores or a full node and 1 gpu on Sabine
 srun -t 1:00:00 -n 28 -p gpu --gres=gpu:1 -N 1 --pty /bin/bash -l 
Requesting 28 cores or a full node and 2 gpus on Sabine
 srun -t 1:00:00 -n 28 -p gpu --gres=gpu:2 -N 1 --pty /bin/bash -l  

Requesting 4 nodes with 28 cores per node (on Sabine)

 srun -t 1:00:00 --tasks-per-node=28 -N 4 --pty /bin/bash -l
Requesting 28 cores per node , 2 gpus per node and 4 nodes (on Sabine)
 srun -t 1:00:00 --tasks-per-node=28 -p gpu --gres=gpu:2 -N 4 --pty /bin/bash -l

TensorFlow Jobs

TensorFlow is available within the Anaconda3 or Anaconda2 packages. The installed versions take advantage of GPUs.

Note that the Python examples used here can be found in

/project/cacds/apps/anaconda3/5.0.1/TensorFlow-Examples/examples/ 
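
To confirm that the Python environment you load actually provides TensorFlow, and to see which version, you can run a quick check on the command line (the module name below matches the batch examples in this section):

module load python/3.7
python -c "import tensorflow as tf; print(tf.__version__)"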

Batch Job Examples

Single core job utilizing 1 GPU (if you need only a single CPU core and one GPU)

#!/bin/bash 
#SBATCH -J tensorflow_job
#SBATCH -o tensorflow_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 1
#SBATCH -p gpu  #needed only on Sabine or Opuntia
#SBATCH --gres=gpu:1
#SBATCH --mem=32GB

module load python/3.7
python convolutional_network.py
 

Single node dual core job utilizing 2 GPUs and 2 CPUs (works only on Sabine)

#!/bin/bash 
#SBATCH -J tensorflow_job
#SBATCH -o tensorflow_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 2
#SBATCH -p gpu  #needed only on Sabine or Opuntia
#SBATCH --gres=gpu:2
#SBATCH --mem=64GB

module load python/3.7
python convolutional_network.py

GROMACS Jobs

GROMACS is available as a module on the Sabine and Opuntia clusters. The installed versions can also take advantage of GPUs.

Batch GROMACS Jobs

Below are more examples of batch jobs requesting certain resources (the module names match the ones installed on Sabine; please adjust for Opuntia). Note that -maxh is set to 4 hours to match the requested walltime.

Single Whole node
#!/bin/bash 
#SBATCH -J my_sim_job 
#SBATCH -o my_sim_job.o%j 
#SBATCH -t 04:00:00 
#SBATCH -N 1 -n 28 

module add  GROMACS/2018-intel-2018-GPU-enabled
mpirun mdrun_mpi -v -ntomp 1 \
-deffnm dhfr -maxh 4.0

Single Whole GPU node

#!/bin/bash
#SBATCH -J my_sim_job 
#SBATCH -o my_sim_job.o%j 
#SBATCH -t 04:00:00
#SBATCH -p gpu #needed only on Sabine or Opuntia
#SBATCH -N 1 -n 4 --gres=gpu:tesla:2

module add GROMACS/2018-intel-2018-GPU-enabled
mpirun mdrun_mpi_gpu -ntomp 7 \
-v -pin on -deffnm dhfr -maxh 4.0
Multiple Whole nodes
#!/bin/bash 
#SBATCH -J my_sim_job 
#SBATCH -o my_sim_job.o%j 
#SBATCH -t 04:00:00 
#SBATCH -N 2 -n 56 

module add GROMACS/2018-intel-2018-GPU-enabled
mpirun mdrun_mpi -ntomp 1 -v -pin on\
 -deffnm  dhfr -maxh 4.0

Multiple Whole GPU nodes

#!/bin/bash 
#SBATCH -J my_sim_job 
#SBATCH -o my_sim_job.o%j 
#SBATCH -t 04:00:00
#SBATCH -p gpu #needed only on Sabine or Opuntia
#SBATCH -N 2 -n 8 --gres=gpu:tesla:2

module add GROMACS/2018-intel-2018-GPU-enabled
mpirun mdrun_mpi_gpu -ntomp 7 \
-v -pin on -deffnm dhfr -maxh 4.0

NAMD Jobs

NAMD is available as a module on the Sabine and Opuntia clusters. The installed versions can also take advantage of distributed-memory processors using MPI.

Batch NAMD Jobs

Below are more examples of batch jobs requesting certain resources (the module names match the ones installed on Opuntia; please adjust for Sabine).

Single Whole node


#!/bin/bash 
#SBATCH -J my_sim_job 
#SBATCH -o my_sim_job.o%j 
#SBATCH -t 04:00:00 
#SBATCH -N 1 -n 20  

module add NAMD
mpirun namd2 namd.conf 

Multiple Whole nodes

#!/bin/bash 
#SBATCH -J my_sim_job 
#SBATCH -o my_sim_job.o%j 
#SBATCH -t 04:00:00 
#SBATCH -N 2 -n 40  # Asking for 2 nodes and 40 cores on Opuntia

module add NAMD
mpirun namd2 namd.conf