Quick slurm on HPC

Jonykoren
Aug 7, 2020

The purpose of this article is to help research students, and Universitat Pompeu Fabra (UPF) students in particular, work efficiently, conveniently and quickly with HPC.

High-performance computing (HPC), also known as supercomputing, uses many processors working in parallel to process very large amounts of data, and delivers a much higher level of performance than a general-purpose computer.

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained.

First steps

  1. Contact the university IT team and ask for an HPC account
  2. Download the VPN client: Link
  3. Download FileZilla
  4. In FileZilla, open File → Site Manager and fill in your username (usually the first letter of your name + surname) and password

Once you’ve connected to the VPN with your account info, you can access your files on the HPC through FileZilla.

Working with Anaconda

Installation

Create your own environment
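
A minimal sketch of this step, assuming conda is already installed and on your PATH. The environment name myenv, the Python version, and the package list are placeholders chosen for illustration, not values from this article. The block writes an environment file; the two conda commands to run afterwards are shown as comments.

```shell
# Write a small environment definition (all values are hypothetical placeholders).
cat > environment.yml <<'EOF'
name: myenv
channels:
  - defaults
dependencies:
  - python=3.8
  - jupyter
EOF

# Then, on the HPC, create and activate the environment:
#   conda env create -f environment.yml
#   conda activate myenv
```

Keeping the definition in a file makes it easy to recreate the same environment later or share it with a colleague.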

Running Jupyter Notebook

After creating an Anaconda environment with Python, we are ready to run Jupyter on a GPU node.

From bash on the login node, ask the HPC for a node with 1 GPU together with 15 GB of CPU memory.
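
A request like this can be sketched with srun. The flags below are standard Slurm options, but your cluster may also require a partition or account option, which is not covered here. The block only builds and prints the command so you can check it before running it on the login node.

```shell
# Resource values from the text: 1 GPU and 15 GB of memory.
GPUS=1
MEM=15G

# --gres=gpu:N asks for N GPUs, --mem sets the memory per node,
# and --pty bash -i opens an interactive shell on the allocated node.
REQUEST="srun --gres=gpu:${GPUS} --mem=${MEM} --pty bash -i"
echo "$REQUEST"
```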

We wait until we get a node: the prompt changes from username@login01:~$ to username@node023:~$

You are now on the node you requested. Putting it all together, we can activate Anaconda and run Jupyter.
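
A minimal sketch of those two commands, meant to be run on the compute node. The environment name myenv is a placeholder, and binding Jupyter to the node's hostname is an assumption made so that the notebook is reachable through an SSH tunnel; port 8888 matches the browser address used in this article. The block prints the commands rather than running them.

```shell
ENV_NAME=myenv            # hypothetical environment name
PORT=8888                 # port used in the browser address
NODE_NAME=$(uname -n)     # on a compute node this would be e.g. node023

ACTIVATE_CMD="conda activate ${ENV_NAME}"
# --no-browser: the node has no display; --ip: listen on the node's own name.
JUPYTER_CMD="jupyter notebook --no-browser --ip=${NODE_NAME} --port=${PORT}"

echo "$ACTIVATE_CMD"
echo "$JUPYTER_CMD"
```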

After Jupyter prints an address, open a new terminal on your own machine (without connecting to the HPC) and run the SSH port-forwarding line that matches the address shown in the first terminal.
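
The forwarding line follows the same pattern as the TensorBoard tunnel later in this article. In this sketch, username is a placeholder, and node023 and 8888 are assumed to be the node and port from the address Jupyter printed; replace them with your own values. The block prints the command to run in the local terminal.

```shell
HPC_USER=username     # placeholder for your HPC username
NODE=node023          # node name from the Jupyter address (assumed)
PORT=8888             # port from the Jupyter address (assumed)

# -N: do not run a remote command; -L: forward local PORT to NODE:PORT via the login host.
TUNNEL="ssh -NL ${PORT}:${NODE}:${PORT} ${HPC_USER}@hpc.s.upf.edu"
echo "$TUNNEL"
```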

Then you’re good to go! Open your browser and go to: http://127.0.0.1:8888/tree?

Running CUDA jobs

If you want to run several long tasks, you should create a bash ‘sh’ file.
Pay attention to the first line: #!/bin/bash
sbatch requires this line; it tells the system to run the file with bash.
Lines that start with #SBATCH are options that Slurm reads when the job is submitted. Any other line that starts with # is a comment; put a space after the # so that it is not mistaken for a directive.
(Example: # Hey, this is a comment)
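
As an illustration of such a file, here is a minimal sketch that writes test.sh from the shell. The job name, resource values, log file names, Anaconda install path, environment name myenv, and script name train.py are all assumptions for illustration; adjust them to your cluster and project.

```shell
# Write an example batch script (every value below is a placeholder).
cat > test.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --gres=gpu:1
#SBATCH --mem=15G
#SBATCH --output=test_%j.out
#SBATCH --error=test_%j.err
# A regular comment: note the space after the '#'

# Make 'conda activate' available in a non-interactive shell (assumed install path).
source ~/anaconda3/etc/profile.d/conda.sh
conda activate myenv
python train.py
EOF
```

In the log file names, Slurm replaces %j with the job ID, so with these assumed names job 123456 would write to test_123456.out and test_123456.err.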

After saving this file (for example: test.sh) and uploading it to a directory (through FileZilla, for example), you need to submit it to the HPC.
The scheduler will evaluate your job and allocate resources according to your request and to the jobs that other users are already running or have queued.
You can submit it by changing to the directory that contains the file on the command line and running: sbatch test.sh

We can check on the submitted job using:
squeue -u username
This tells you whether your job has started and how long it has been running. Furthermore, we specified the log files (output and errors) in the bash file, and we can read them to monitor the progress of the job.

Running TensorBoard

  1. Make sure that you have installed the same version of tensorflow and tensorflow-gpu in your conda environment
  2. Install TensorBoard in your conda environment: Link
  3. Change (with the ‘cd’ command) into the folder that contains the folder where the log and event files from training (tfevents and h5 files) are located
  4. For example, if the folder with the h5 and tfevents files is called nodule20200720T2023, the command will be:
    tensorboard --logdir=nodule20200720T2023
  5. After TensorBoard prints an address, open a new terminal without connecting to the HPC and run:
    ssh -NL 6006:login01:6006 username@hpc.s.upf.edu
    Enter your password, and you are ready to access TensorBoard:
    Open your browser and go to: http://127.0.0.1:6006/

More commands

Check what the directory contains:
ls

Copy from one directory to another:
cp -r path/from/first path/into/second

Watch all users’ tasks:
squeue

Watch my tasks:
squeue -u username

Cancel a submitted job:
scancel job_id

Running R on Jupyter

To run an R kernel in Jupyter, activate your Anaconda environment on the command line and then install IRkernel.
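
One possible way to do this, sketched under assumptions: the r-irkernel package is taken from the conda-forge channel, and myenv is a placeholder environment name. The block prints the install command so you can review it first. If the R kernel does not show up in Jupyter afterwards, running IRkernel::installspec() from an R session inside the environment registers it.

```shell
ENV_NAME=myenv          # hypothetical environment name
PACKAGE=r-irkernel      # conda package that provides the R kernel for Jupyter

# -y: do not prompt; -n: target environment; -c: channel to install from.
INSTALL_CMD="conda install -y -n ${ENV_NAME} -c conda-forge ${PACKAGE}"
echo "$INSTALL_CMD"
```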

HPC on Smartphone

To use the HPC from your smartphone and check on your jobs from anywhere at any time, install the following apps:

1. FortiClient VPN

2. JuiceSSH - SSH Client

Insert your connection details and you’re good to go!