feat(FirecREST): Adds the slurm-full vCluster, which includes FirecREST as a vService

Context

We want to prove we can support FirecREST on GCP as part of our vCluster deployment.

Impact

Adds a new vCluster definition: slurm-full. This vCluster includes all currently available vServices, and it will be updated with all future vServices. Most significantly, this MR adds the FirecREST vService for the first time.

Test(s)

To deploy the slurm-full vCluster, issue the following command:

terraform apply -var 'vclusters=["slurm-full"]'

Keep in mind, however, that there are several preparatory steps you need to perform before the vServices included in the vCluster definition can be deployed.

Right now, the deployment process involves several manual steps. We'll work on making this process more user friendly, but in the meantime you'll have to move across a couple of repos to reach the point at which you can test the FirecREST API. You can find a detailed description of the deployment process in the vs-firecrest README.md, but I leave a summary here as a reference.

These instructions assume that your local copies of vc-shared-services, vclusters, vs-slurm-simple, and vs-firecrest exist in the same root directory:

  1. Deploy vc-shared-services
cd ./vc-shared-services
terraform workspace select -or-create=true "${USER}"
terraform apply
  1. Build and upload the packages and images required by vs-slurm-simple
cd ../vs-slurm-simple/docker
scp <gaspar_id>@jed.hpc.epfl.ch:"/work/scitas-ge/litchink/swisstwins/slurm-25.05.0/*.tar" .
REGISTRY=$(terraform -chdir=../../vc-shared-services output -json | jq -r '.package_registry_info.value.DOCKER.url')
REPO_NAME=$(terraform -chdir=../../vc-shared-services output -json | jq -r '.package_registry_info.value.YUM.name')
REPO_REGION=$(terraform -chdir=../../vc-shared-services output -json | jq -r '.package_registry_info.value.YUM.location')
gcloud auth configure-docker ${REPO_REGION}-docker.pkg.dev
for i in slurmctld slurmdbd slurmrestd; do
  docker image load --input swisstwins-${i}-25.05.0.tar
  docker tag registry.hpc.epfl.ch/scitas/swisstwins-${i}:25.05.0 ${REGISTRY}/swisstwins-${i}:25.05.0
  docker push ${REGISTRY}/swisstwins-${i}:25.05.0
done
tar xf opensuse-slurm-repo-25.05.0.tar
for i in opensuse-slurm-rpm-repo-25.05.0/x86_64/*.rpm; do
  gcloud artifacts yum upload ${REPO_NAME} --location ${REPO_REGION} --source "${i}"
done
  1. Build and upload the image required by vs-firecrest
cd ../../vs-firecrest/images/
scp <gaspar_id>@jed.hpc.epfl.ch:"/work/scitas-ge/ggiraldo/swisstwins/firecrest-2.2.8/*.tar" .
docker image load --input swisstwins-firecrest.tar
docker tag europe-west6-docker.pkg.dev/swisstwins/german-docker/swisstwins-firecrest:2.2.8 ${REGISTRY}/swisstwins-firecrest:2.2.8
docker push ${REGISTRY}/swisstwins-firecrest:2.2.8
  1. Forward your local ports to the Keycloak and Nomad services
gcloud container clusters get-credentials "$USER" --location="europe-west6-b"
kubectl -n nomad port-forward services/nomad-ilb 4646:4646 &
kubectl -n directory port-forward services/keycloak 8080:80 &
  1. Deploy this repo with vclusters=["slurm-full"]
cd ../../vclusters/
terraform workspace select -or-create=true "${USER}"
terraform apply -var 'vclusters=["slurm-full"]'
  1. Add user service-account-firecrest-client to LDAP using the adduser.sh script
cd ../vc-shared-services
scripts/ldap/adduser.sh -u service-account-firecrest-client \
   -g <vcluster_name> \
   -f firecrest \
   -l service \
   -m service-account-firecrest-client@epfl.ch \
   -s <path_to_ssh_public_key>

where path_to_ssh_public_key points to the public key associated with the private key provided in locals.firecrest_service_private_key, and vcluster_name is the Kubernetes namespace associated with the vCluster (typically $USER-slurm-full).
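If you still need to create the key pair referenced by locals.firecrest_service_private_key, here is a minimal sketch; the output path and comment are placeholders, not repo conventions:

```shell
# Hypothetical example: generate a dedicated ed25519 key pair for the
# FirecREST service account. The directory and file name are placeholders.
key_dir=$(mktemp -d)
ssh-keygen -t ed25519 -N '' -C 'service-account-firecrest-client' \
  -f "${key_dir}/firecrest_service_key"
# The public half is what you would pass to adduser.sh via -s:
ls "${key_dir}/firecrest_service_key.pub"
```

The private half would then go into locals.firecrest_service_private_key.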

  1. Set up a Slurm account and user for your vCluster. First, SSH into your login node. You can use gcloud for that purpose:
# Find the name and zone of your vcluster's login node
gcloud compute instances list | grep login
# Use this info to ssh into the node
gcloud compute ssh <login_node> --zone=<login_node_zone>

Once in the node, run these commands:

sudo sacctmgr create account account=<vcluster_name>
sudo sacctmgr create user name=service-account-firecrest-client account=<vcluster_name>

After this, exit the login node by pressing Ctrl+D.

  1. Forward a local port to the FirecREST service
kubectl -n firecrest port-forward services/firecrest 8000:8000 &
  1. Request a JWT from keycloak using the firecrest client credentials
cd ../vclusters/
secret=$(terraform output -json vcluster_slurm_full | jq -r '.client_secret')
token=$(curl --location 'http://localhost:8080/realms/<vcluster_name>/protocol/openid-connect/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'origin: http://localhost:8000' \
--data-urlencode 'grant_type=client_credentials' \
--data-urlencode 'client_id=firecrest-client' \
--data-urlencode "client_secret=${secret}" \
--data-urlencode 'scope=openid' | jq -r '.access_token')
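As a sanity check (not part of the official flow), you can decode the token's payload locally to confirm the client and realm before calling the API. A sketch, assuming jq and a base64 that supports -d:

```shell
# Hypothetical helper: print a JWT's payload as JSON.
decode_jwt_payload() {
  # JWT segments are base64url-encoded without padding;
  # convert to standard base64 and re-pad before decoding.
  p=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  while [ $(( ${#p} % 4 )) -ne 0 ]; do p="${p}="; done
  printf '%s' "$p" | base64 -d | jq .
}
decode_jwt_payload "${token}"
```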
  1. Use the JWT to authenticate with the FirecREST API and query its endpoints
curl -X 'GET'   'http://localhost:8000/status/systems' \
    -H 'accept: application/json' \
    -H "Authorization: Bearer ${token}" | jq

Known issues

  • FirecREST uses S3 storage to support large-file uploads. This feature is currently unavailable, because Google Cloud Storage is only partially compatible with the AWS S3 libraries used by FirecREST.
  • Our vClusters don't have shared storage yet. Because of this, access to the login node's file system does not give FirecREST the ability to upload data to worker nodes.

Links

Edited by German Felipe Giraldo Villa
