Ansible playbooks to form a Galaxy/Slurm cloud

This repo contains Ansible playbooks for deploying a compute cluster. The primary purpose is to act as an HPC resource for executing Galaxy and Nextflow workflows. The playbooks can be used to spin up nodes on an OpenStack cluster and deploy a Slurm environment to them. You can also deploy Pulsar for executing Galaxy tools and Nextflow for running Nextflow workflows. Both Pulsar and Nextflow use the Slurm cluster for execution. The use of Pulsar allows Galaxy workflows to be executed without the need for a shared filesystem between the Galaxy and Slurm environments.

This project should not be confused with Ansible Galaxy, Ansible's community-driven repository of Ansible Roles.

Creating a complete cluster takes approximately 15 minutes, depending on the number of nodes.

Singularity is deployed to the Slurm cluster, allowing jobs to be run in Singularity containers. Currently Pulsar is not configured to use Singularity containers and instead deploys tool dependencies using Conda, but we aim to switch all execution to Singularity in the near future.

Deployment and configuration of Galaxy will be described elsewhere in the near future.

This ansible-galaxy-cloud project contains a site.yaml file and roles for the formation (and removal) of a Nextflow/Slurm/MUNGE/Pulsar cluster suitable for Galaxy: a head node and one or more worker nodes that share an NFS-mounted volume attached to the head node (mounted as /data).

As the cloud instances are unlikely to be accessible from outside the provider network, this playbook is expected to be executed from a suitably equipped bastion node on the cloud provider (OpenStack).

Built-in applications

The following applications can be deployed to the cluster, each managed by its own app_ role: -

  • Nextflow
  • Slurm
  • Pulsar

Application components (Nextflow, Slurm, Pulsar) are deployed when their corresponding install variable is set (see group_vars/all/main.yaml). The default in this repository is to enable and install all of them. If you don't want to install a particular component (say Nextflow), set its install variable to no, i.e. install_nextflow: no.
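For example, a parameters file (described in the prerequisites below) that skips the Nextflow installation would contain: -

install_nextflow: no

The other components follow the same pattern; check group_vars/all/main.yaml for their exact install variable names.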

Deploying

You need to satisfy a few prerequisites before you can deploy the cluster and applications, as detailed below.

Prerequisite - a bastion node

You can use the bastion project to instantiate a suitable bastion instance from a workstation outside your cluster. From that instance you should be able to create the cluster using the Python virtual environment that the bastion playbook pre-configures there.

Create your bastion and ssh to it. You'll find a clone of this repository in its ~/git directory and you can continue reading this document from there.

The bastion playbook will also have added a parameters file in the project root containing your OpenStack credentials. Use this file to add any additional parameter values.

Prerequisite - provider environment

You will need to set provider-specific environment variables before you can run this playbook. If you're using OpenStack you should source the keystone file provided by your stack provider. This sets up the essential credentials to create and access cloud resources.

If you've used the ansible-bastion playbook it will have written a suitable set of authentication parameters for you in the root of the initial clone of this project, so you need not source anything.

Prerequisite - template environment

Inspect the setenv-template.sh file in the root of the project to see if there are any variables you need to define. Instructions for providing these variables can be found in the template file.

The easiest way to override the built-in values is to provide your own YAML-based parameters file (called parameters). The project's parameters file is excluded from the repository using .gitignore. To define your own shared volume size you could provide the following in a parameters file: -

volume_size_g: 3000

...and add the file to your Ansible command-line using -e "@parameters"

The playbooks rely on a number of roles in the project. Where appropriate, each role exposes its key variables in a corresponding defaults/main.yaml file but the main (common) variables have been placed in group_vars/all/main.yaml.

You are encouraged to review all the variables so that you can decide whether you need to provide your own values for any of them.

Prerequisite - playbook variables (OpenStack)

At the very least you should provide your own values for the following (an illustrative parameters file follows the list): -

  • instance_base_name. A tag prepended to the cloud objects created (instances and volumes).
  • instance_network. A network name to use.
  • head_address. The IP address (from a pool you own) to assign to the head node (optional).
  • worker_count. The number of worker instances that will be put in the cluster.
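
As an illustration, an OpenStack parameters file might look like the following, where every value is a placeholder that you must replace with a name, address and count valid for your own project: -

instance_base_name: mycluster
instance_network: my-project-network
head_address: 192.168.100.10
worker_count: 4

head_address is optional and can be omitted if you do not want to assign a specific IP address to the head node.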

Prerequisite - playbook variables (AWS)

At the very least you should provide your own values for the following (an illustrative parameters file follows the list): -

  • instance_base_name. A tag prepended to the cloud objects created (instances and volumes).
  • aws_vpc_subnet_id. The ID of the VPC subnet to deploy instances into.
  • head_type. The instance type to use for the head node.
  • worker_type. The instance type to use for the worker nodes.
  • volume_device. The device name to use for the shared volume.
  • worker_count. The number of worker instances that will be put in the cluster.
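
Again, purely as an illustration (the subnet ID, instance types and device name below are placeholders, not recommendations): -

instance_base_name: mycluster
aws_vpc_subnet_id: subnet-0123456789abcdef0
head_type: m5.large
worker_type: m5.xlarge
volume_device: /dev/xvdb
worker_count: 4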

Installing public SSH keys

Any .pub files found in the project's root directory will be treated as public SSH key files and added to the centos account on the head node, allowing the corresponding users access to it.

Running the playbook

With environment variables set and a parameters file written, run the following on a suitably equipped bastion on your cloud provider: -

$ pip install --upgrade pip
$ pip install -r requirements.txt
$ ansible-galaxy install -r requirements.yml

$ ansible-playbook site.yaml -e "@parameters"

You can avoid formatting the shared volume (for instance if you have an existing non-ext4 volume) by adding volume_initialise: no to your parameters file. By default the volume is expected to be an ext4 volume.
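
For example, to re-use an existing volume without re-formatting it, your parameters file would include: -

volume_initialise: no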

CAUTION: The instance creation process creates instances and volumes whose names begin with the value of the instance_base_name variable. Use a base name that is unique to your cluster or, on deletion, you may find you've lost more than you expected!

You can run separate 'sanity checks' with the site-check playbook: -

$ ansible-playbook site-check.yaml -e "@parameters"

The check ensures that Slurm's sinfo command, when run on the head node, does not report any offline nodes and that the total number of nodes matches your worker_count value.

And, to destroy the cluster: -

$ ansible-playbook unsite.yaml -e "@parameters"

To avoid deleting the shared volume, add volume_delete: no to your parameters file. By default it is deleted when the cluster is deleted.
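
Similarly, to keep the shared volume (and its data) when the cluster is destroyed, include the following in your parameters file: -

volume_delete: no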

