HKUCDS GPU Farm for Research (Phase 3)
(The reader of this page is assumed to have read the Quick Start and Advanced Use pages.)
Introduction
HKUCDS GPU Farm for Research (Phase 3) is a SLURM cluster of servers with NVIDIA RTX4090 (24GB) and H800 SXM (80GB) GPUs. It is available to staff, PhD and MPhil students of the School of Computing and Data Science.
A user account can be applied for at https://intranet.cs.hku.hk/gpufarm3_acct/. After your account is created, you may log in to the gateway node gpu3gate1.cs.hku.hk with SSH:
ssh <your_username>@gpu3gate1.cs.hku.hk
Note: the user accounts and home directories are independent of Phases 1 and 2 and are not shared.
Running a Session with one GPU
The following SLURM partitions are defined in the HKUCDS GPU Farm for Research (Phase 3):
| Partition | GPU type | No. of GPUs per server | Default CPU cores per GPU | Default server RAM per GPU | Default time limit | Maximum time limit | Remarks |
|---|---|---|---|---|---|---|---|
| debug (default) | RTX4090 (24GB) | 2, 8, 10 | 4 | 96GB | 6 hours | 7 days | |
| q-h800 | H800 (80GB) | 8 | 4 | 240GB | 6 hours | 2 days | at most one job at a time |
| q-hgpu-batch | H100/H800 (80GB) | 8 | 4 | 240GB | 2 days | 7 days | sbatch jobs only |
After logging in to the gateway node, a GPU session can be started with srun, e.g.,
srun --gres=gpu:1 --mail-type=ALL --pty bash
The default SLURM partition (debug) allocates RTX4090 GPUs. 4 CPU cores and 96GB of system RAM are allocated with each GPU. To have a session with 2 GPUs:
srun --nodes=1 --gres=gpu:2 --mail-type=ALL --pty bash
By default, each user account can request up to 4 GPUs concurrently. The limit can be raised on request.
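Once a session starts, you can verify what has been allocated; a quick check using the standard NVIDIA and SLURM tooling on the GPU node:
# list the GPUs visible to this session
nvidia-smi
# SLURM restricts the session to the allocated GPUs via this variable
echo $CUDA_VISIBLE_DEVICES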
Specifying a longer time limit
A job will be terminated when its time limit is reached. Use '-t' to specify a time limit longer than the default. For example, to have a time limit of 12 hours:
srun --nodes=1 --gres=gpu:2 -t 12:00:00 --mail-type=ALL --pty bash
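While a job is running, squeue can show its elapsed and remaining time. A sketch using standard squeue format fields (%M is the time used, %L is the time left before the limit):
squeue -u $USER -o "%.12i %.9P %.8T %.10M %.10L"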
Running a Session with one H800 GPU
To get a session with an H800 GPU, use the q-h800 partition by adding '-p q-h800' to srun or sbatch, e.g.,
srun -p q-h800 --gres=gpu:1 --mail-type=ALL --pty bash
4 CPU cores and 240GB of system RAM are allocated with each H800 GPU.
Running a Session with 2 H800 GPUs
srun -p q-h800 --nodes=1 --gres=gpu:2 --mail-type=ALL --pty bash
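Once the session starts, nvidia-smi can confirm that both GPUs are visible, and its topology matrix shows how they are interconnected (on H800 SXM nodes, typically via NVLink):
nvidia-smi topo -m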
Submitting Batch Jobs
If your program runs for days and does not require user interaction during execution, you can submit it to the system in batch mode. The system will schedule your job to run when the requested GPUs become available.
To submit a batch job from the gateway node gpu3gate1,
- Create a batch file, e.g., my-gpu-batch, with the following contents:
#!/bin/bash
# Tell the system the resources you need. Adjust the numbers according to your need.
# Specify the partition to use and the GPUs needed with the -p and --gres options, e.g.,
# '--gres=gpu:4' for four RTX4090 GPUs
# '-p q-hgpu-batch --gres=gpu:2' for two H100 or H800 GPUs
# '-p q-hgpu-batch --gres=gpu:h100:2' for two H100 GPUs
# '-p q-hgpu-batch --gres=gpu:h800:4' for four H800 GPUs
#SBATCH --nodes=1 --gres=gpu:4 --mail-type=ALL
# Specify a time limit if needed, e.g., 4 days
#SBATCH -t 4-00:00:00

# If you use Anaconda, initialize it
. $HOME/anaconda3/etc/profile.d/conda.sh
conda activate my_env

# cd to your desired directory and execute your program, e.g.
cd _to_your_directory_you_need
_run_your_program_
- Submit your batch job to the system with the following command on the gateway node:
sbatch my-gpu-batch
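sbatch prints the job ID on submission, and by default the job's console output is written to slurm-<jobid>.out in the directory where sbatch was run. Some standard SLURM commands for following up on a submitted job (12345 below is a placeholder job ID):
# check whether the job is pending or running
squeue -u $USER
# follow the job's output
tail -f slurm-12345.out
# cancel the job if needed
scancel 12345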
Usage Examples with DeepSeek R1 models
The following examples show how to install and run an OpenAI API server with DeepSeek R1 locally in the GPU farm. Distilled models of DeepSeek R1 have been downloaded to /share/deepseek-ai for convenience.
Installing SGLang
SGLang is a serving framework for large language models and vision language models. An OpenAI-compatible API server is included. The following steps assume that Anaconda is installed.
- On gpu3gate1.cs.hku.hk, request a GPU session
srun --gres=gpu:1 --mail-type=ALL --pty bash
- On the GPU node, create a new conda environment:
conda create -n deepseek python=3.10
- Activate the environment:
conda activate deepseek
- Install SGLang (ref: https://docs.sglang.ai/start/install.html)
pip install "sglang[all]>=0.4.4.post1" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
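- Optionally, verify the installation before logging out. A quick sanity check, assuming the package exposes __version__ as recent SGLang releases do:
python3 -c "import sglang; print(sglang.__version__)"
python3 -c "import torch; print(torch.cuda.is_available())"
The second command should print True on a GPU node.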
- Log out of the session. The conda environment will be used for running the API server (see below).
Running DeepSeek R1 models with one RTX4090 GPU
Smaller distilled models of DeepSeek R1, e.g., DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B, can run with one RTX4090 GPU.
- On gpu3gate1.cs.hku.hk, request a GPU session
srun --gres=gpu:1 --mail-type=ALL --pty bash
Note the hostname of the GPU server assigned, e.g., gpu-4090-201, either from the command prompt or by using the 'hostname' command.
- Activate the conda environment that has SGLang installed (created in the previous section):
conda activate deepseek
- Start the server, using DeepSeek-R1-Distill-Qwen-7B for example:
python3 -m sglang.launch_server --served-model-name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
--model-path /share/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --trust-remote-code
After the SGLang server has started up, a message will be shown that it is running on http://127.0.0.1:30000.
- Open a new terminal on your local computer. Log in to the same GPU server, using the hostname you noted in step 1:
ssh <your_username>@gpu-4090-201.cs.hku.hk
- On this new SSH session, query the model name:
curl http://127.0.0.1:30000/v1/models
The id of the model should be the same as the --served-model-name parameter in the previous step.
- Ask the server a question, e.g.,
curl http://localhost:30000/v1/completions -H "Content-Type: application/json" \
-d '{ "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", "prompt": "Who are you?", "max_tokens": 1024, "temperature": 0.6 }'
Running DeepSeek R1 models with one H800 GPU
DeepSeek-R1-Distill-Qwen-32B cannot fit in an RTX4090, but can run on a single H800.
- On gpu3gate1.cs.hku.hk, request an H800 GPU session
srun -p q-h800 --gres=gpu:h800:1 --mail-type=ALL --pty bash
Note the hostname of the GPU server assigned, e.g., gpucluster-g1, either from the command prompt or by using the 'hostname' command.
- Activate the conda environment that has SGLang installed:
conda activate deepseek
- Start the server:
python3 -m sglang.launch_server --served-model-name deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
--model-path /share/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code
After the SGLang server has started up, a message will be shown that it is running on http://127.0.0.1:30000.
- Open a new terminal on your local computer. Log in to the same GPU server, using the hostname you noted in step 1:
ssh <your_username>@gpucluster-g1.cs.hku.hk
- On this new SSH session, query the model name:
curl http://127.0.0.1:30000/v1/models
The id of the model should be the same as the --served-model-name parameter in the previous step.
- Ask the server a question, e.g.,
curl http://localhost:30000/v1/completions -H "Content-Type: application/json" \
-d '{ "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", "prompt": "List some interesting facts in Mathematics about the number 2025", "max_tokens": 1024, "temperature": 0.6 }'
Running Large DeepSeek R1 models with multiple H800 GPUs
DeepSeek-R1-Distill-Llama-70B needs two H800 GPUs to run.
- On gpu3gate1.cs.hku.hk, request a session with two H800 GPUs
srun -p q-h800 --gres=gpu:h800:2 --mail-type=ALL --pty bash
Note the hostname of the GPU server assigned, e.g., gpucluster-g1, either from the command prompt or by using the 'hostname' command.
- Activate the conda environment that has SGLang installed:
conda activate deepseek
- Start the server with 2 GPUs (--tp 2):
python3 -m sglang.launch_server --served-model-name deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
--model-path /share/deepseek-ai/DeepSeek-R1-Distill-Llama-70B --trust-remote-code --tp 2
After the SGLang server has started up, a message will be shown that it is running on http://127.0.0.1:30000.
- Open a new terminal on your local computer. Log in to the same GPU server, using the hostname you noted in step 1:
ssh <your_username>@gpucluster-g1.cs.hku.hk
- On this new SSH session, query the model name:
curl http://127.0.0.1:30000/v1/models
The id of the model should be the same as the --served-model-name parameter in the previous step.
- Ask the server a question, e.g.,
curl http://localhost:30000/v1/completions -H "Content-Type: application/json" \
-d '{ "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B", "prompt": "Write a python program to display Hello World.", "max_tokens": 1024, "temperature": 0.6 }'