Step-by-step instructions
Here are some step-by-step instructions on various topics and use cases related to machine learning on the HPC cluster of the ZIH. All instructions require a ZIH login and a successful connection to the HPC cluster.
Hello World CNN
This example demonstrates the workflow for recognizing handwritten digits using a CNN. The example can be easily adapted to your own use cases. It is based on the MNIST dataset, which is commonly used for benchmarking and teaching. The example is written in a Jupyter Notebook.
- MNIST Learner Notebook (.ipynb, 12.31 kB)
1. Download the Jupyter Notebook file
2. Establish a VPN connection to TU Dresden and log in to JupyterHub
3. Upload the notebook to JupyterHub, open it, and execute it step by step
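For orientation, the model trained in the notebook is roughly of the following form. This is only a minimal sketch using tf.keras; the notebook itself may use a different framework, and the layer sizes and training settings shown here are illustrative assumptions.

import tensorflow as tf

# Load MNIST (28x28 grayscale digits) and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0

# Small CNN: two convolution/pooling stages followed by a dense classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))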
Create your own Python environment with Anaconda
- Notebook Semantic Segmentation (.ipynb, 2.71 kB)
The following example creates a custom Anaconda environment and installs the fast.ai library. Afterwards, a semantic segmentation notebook can be executed via JupyterHub.
1. Establish a VPN connection to TU Dresden and log in to the ZIH login node using a shell.
2. Start an interactive session with the Alpha Centauri cluster:
srun -p alpha-interactive -N 1 -n 1 --mem-per-cpu 11000 -c 2 --time=02:00:00 --pty bash
3. Load the Python package manager Anaconda
module load Anaconda3
4. Create a new Anaconda environment
conda create --name myEnv python=3.7
5. Activate the environment
conda activate myEnv
6. Install the fast.ai library
conda install -c fastai -c pytorch fastai cudatoolkit=11.0.221 ipykernel ipywidgets
7. Register the environment in JupyterHub
python -m ipykernel install --user --name myEnv --display-name="myEnv"
8. Log in to JupyterHub
9. Open the notebook, select the "myEnv" kernel, and go through it step by step
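Once the "myEnv" kernel is selected in JupyterHub, the notebook runs fast.ai code along the following lines. This is a minimal sketch based on the fast.ai segmentation quickstart and the small CAMVID_TINY sample dataset; the notebook's actual dataset and model settings may differ.

from fastai.vision.all import *

# Download the small CamVid sample dataset shipped with fast.ai.
path = untar_data(URLs.CAMVID_TINY)
codes = np.loadtxt(path/'codes.txt', dtype=str)

# Build dataloaders that pair each image with its segmentation mask.
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8,
    fnames=get_image_files(path/'images'),
    label_func=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
    codes=codes)

# Train a U-Net with a ResNet-34 backbone for one fine-tuning epoch.
learn = unet_learner(dls, resnet34)
learn.fine_tune(1)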
Create a workspace
By creating a workspace, you request storage on one of the various storage systems of the ZIH, which differ in capacity, streaming bandwidth, IOPS rate, and so on. No single system offers all of these at once, so different systems are suited to different use cases. An overview of the available systems can be found here. The following guide shows how to request, monitor, and delete workspaces. A comprehensive guide can be found here.
Show available storage systems:
ws_find -l
Show currently used workspaces:
ws_list
Create a new workspace on the "beegfs" storage system with the name "myWorkspace" and a duration of 30 days:
ws_allocate -F beegfs myWorkspace 30
Delete the workspace "myWorkspace":
ws_release -F beegfs myWorkspace
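Training jobs should write their data, checkpoints, and logs into such a workspace rather than into the home directory. The following is a minimal Python sketch of that pattern; the path below is a hypothetical placeholder, since ws_allocate prints the real workspace path when the workspace is created.

import os

# Hypothetical workspace path; ws_allocate prints the actual path on creation.
ws_path = "/beegfs/ws/1/myuser-myWorkspace"

# Keep checkpoints in the workspace instead of the home directory.
checkpoint_dir = os.path.join(ws_path, "checkpoints")
os.makedirs(checkpoint_dir, exist_ok=True)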
Run a Singularity container
- Notebook Question/Answering BERT (.ipynb, 23 kB)
Containerization encapsulates or packages software code and all its dependencies to run uniformly and consistently on any infrastructure. On ZIH systems, Singularity is used as the standard container solution. Singularity allows users to have full control over their environment. The following example demonstrates how to import a Docker container from the Nvidia NGC Catalog and then run an example within it. The example is a so-called Question/Answering model based on the BERT architecture, pre-trained with the SQuAD dataset. More information on Singularity containers on ZIH systems can be found here.
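To illustrate what such a model does, the sketch below answers a question from a given context using a SQuAD-fine-tuned model via the Hugging Face transformers pipeline. The library and model name here are assumptions chosen for illustration only; the notebook itself may use a different implementation, so treat this as a conceptual preview rather than code from the notebook.

from transformers import pipeline

# Hypothetical model choice: a compact BERT variant fine-tuned on SQuAD.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = ("Singularity is the container solution used on ZIH systems. "
           "It lets users run Docker images, for example from the Nvidia NGC catalog, "
           "with full control over their software environment.")

# The model extracts the answer span from the context and returns a confidence score.
result = qa(question="Which container solution is used on ZIH systems?", context=context)
print(result["answer"], result["score"])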
1. Establish a VPN connection to TU Dresden and log in to the ZIH login node using a shell.
2. Start an interactive session with the Alpha Centauri cluster:
srun -p alpha-interactive -N 1 -n 1 --gres=gpu:1 --mem-per-cpu 11000 -c 6 --time=08:00:00 --pty bash
3. Create a workspace for the container
ws_allocate -F scratch ContainerTest 30
4. Navigate to the workspace directory (ws_allocate prints the full path when the workspace is created)
cd /scratch/ws/1/[your scratch]/
5. Import the TensorFlow container from Nvidia NGC
singularity build tensorflow.sif docker://nvcr.io/nvidia/tensorflow:22.01-tf1-py3
6. Start the container with a shell (--nv starts the container with Nvidia GPU support)
singularity shell --nv tensorflow.sif
7. Start a JupyterLab server. Note the port and the token it prints: the port is needed for the port forwarding in step 8, and the token for the browser login in step 10.
jupyter lab
8. To access the Jupyter server, set up port forwarding from your local PC to the ZIH compute node. Run the following on your local machine; <zih_node> is the compute node allocated in step 2, and <remote_port> is the port reported by JupyterLab in step 7:
ssh -fNL <local_port>:<zih_node>:<remote_port> <zih_user>@taurus.hrsk.tu-dresden.de
9. Open a browser and access the Jupyter server using the following address:
localhost:<local_port>
10. Enter the token from step 7
11. Download the Jupyter Notebook file and upload it to Jupyter using the upload button
12. Execute the notebook step by step and be amazed!