add tutorial to connect to flux between clusters #192

@@ -0,0 +1,18 @@

.. _command-tutorials:

Command Tutorials
=================

Welcome to the Command Tutorials! These tutorials should help you map specific Flux commands
to your use case, and then see detailed usage.

- ``flux proxy`` (:ref:`ssh-across-clusters`): "Send commands to a Flux instance across clusters using ssh"

This section is currently 🚧️ under construction 🚧️, so please come back later to see more command tutorials!

.. toctree::
   :maxdepth: 2
   :caption: Command Tutorials

   ssh-across-clusters

@@ -0,0 +1,135 @@

.. _ssh-across-clusters:

===================
SSH across clusters
===================

Let's say you want to create a Flux instance in an allocation on one cluster (let's say our first cluster is "noodle") 🍜️
and then connect to it via ssh from another cluster (let's say our second cluster is called "quartz"). This is possible with the right
setup of your ``~/.ssh/config``.

----------------------
Create a Flux Instance
----------------------

First, let's create the allocation on the first cluster. We typically want to ask for an allocation
and run ``flux start`` via our job manager. Here we start on a login node (note the plain ``$`` prompt):

.. code-block:: sh

    # Slurm-specific: ask for a four-node, exclusive allocation
    $ salloc -N4 --exclusive
    # launch a Flux instance with one broker per node
    $ srun -N4 -n4 --pty --mpibind=off flux start
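
As an aside, if you just want to experiment without a resource manager, ``flux start`` can
also launch a small test instance on a single machine. This is a quick local sketch, not part
of the cluster setup above:

.. code-block:: sh

    # start a test instance with 4 brokers on this one node (for experimentation only)
    $ flux start --test-size=4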

And then we get our allocation! You might adapt this command to your own resource manager; the example
above uses Slurm's ``salloc`` and ``srun``. After you run ``flux start``, you are inside of a Flux instance on your allocation!
Let's run a simple job on our allocation. This first example asks to see the hostnames of your nodes:

.. code-block:: sh

    noodle:~$ flux mini run -N 4 hostname
    noodle220
    noodle221
    noodle222
    noodle223
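
As a quick check that you are really inside the instance, you can also ask Flux for the
instance size, i.e., the number of brokers (the output below is what we'd expect for our
four-node allocation):

.. code-block:: sh

    noodle:~$ flux getattr size
    4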

You can sanity check the resources you have within the instance by running:

.. code-block:: sh

    noodle:~$ flux resource list
         STATE NNODES NCORES NGPUS NODELIST
          free      4    160     0 noodle[220,221,222,223]
     allocated      0      0     0
          down      0      0     0

And you can echo ``$FLUX_URI`` to see the path of the socket that you will also need later:

.. code-block:: sh

    noodle:~$ echo $FLUX_URI
    local:///var/tmp/flux-MLmxy2/local-0

We have now defined a goal for success: reproducing this listing by running the command
from a node on a different cluster.

-----------------------
Connect to the Instance
-----------------------

Next, let's ssh into another cluster. Take the hostname where your instance is running,
and create a `proxy jump <https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Proxies_and_Jump_Hosts>`_ in your ``~/.ssh/config``:

.. code-block:: ssh

    # the login node of the first cluster
    Host noodle
        HostName noodle

    # the allocation node, reached by jumping through the login node
    Host noodle220
        HostName noodle220
        ProxyJump noodle

.. note::

    This ``~/.ssh/config`` needs to be written on the cluster system that you are going to connect from.
    In many cases, a shared filesystem maps your home directory across clusters, so you may see the same file in
    multiple places.
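
Before involving Flux, it can help to confirm that the jump itself works with plain ssh
(the hostnames here are our example names):

.. code-block:: sh

    quartz:~$ ssh noodle220 hostname
    noodle220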

You'll first need to tell Flux to use ssh for the proxy command:

.. code-block:: sh

    quartz:~$ export FLUX_SSH=ssh
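
``FLUX_SSH`` names the ssh executable Flux should launch when connecting to a remote URI,
so you can also point it at a full path if your ssh lives somewhere unusual (the path below
is just an example):

.. code-block:: sh

    quartz:~$ export FLUX_SSH=/usr/bin/ssh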

Next, from this same location, try using ``flux proxy`` to connect to your Flux instance! Take the URI
that you found before, ``local:///var/tmp/flux-MLmxy2/local-0``, and swap the ``local://`` scheme for ``ssh://``
plus the hostname ``noodle220``:

.. code-block:: sh

    quartz:~$ flux proxy ssh://noodle220/var/tmp/flux-MLmxy2/local-0

If you have trouble, use the force!

.. code-block:: sh

    quartz:~$ flux proxy --force ssh://noodle220/var/tmp/flux-MLmxy2/local-0

You should then be able to run the same resource list:

.. code-block:: sh

    quartz:~$ flux resource list
         STATE NNODES NCORES NGPUS NODELIST
          free      4    160     0 noodle[220,221,222,223]
     allocated      0      0     0
          down      0      0     0
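
Note that ``flux proxy`` drops you into a new shell that is connected to the remote instance.
You can also hand it a single command to run under that connection and then exit, for example
(reusing the URI from above):

.. code-block:: sh

    quartz:~$ flux proxy ssh://noodle220/var/tmp/flux-MLmxy2/local-0 flux resource list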

Next, try submitting a job! You should be able to see that it runs on the first cluster,
even though you submitted it from the second:

.. code-block:: sh

    quartz:~$ flux mini run hostname
    noodle220

If you are still connected to the first cluster, you should also be able to query the jobs from there.
E.g., here we submit a sleep job from the second, connected cluster:

.. code-block:: sh

    quartz:~$ flux mini submit sleep 60
    f22hdyb35

And then see it from a node on either cluster!

.. code-block:: sh

    $ flux jobs
        JOBID USER     NAME       ST NTASKS NNODES  TIME INFO
    f22hdyb35 fluxuser sleep       R      1      1 1.842s
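
If you want to wait on the job or stream its output, you can also attach to it by job id
(the id here is from our example submission; our ``sleep`` produces no output, so attaching
simply waits for it to finish):

.. code-block:: sh

    quartz:~$ flux job attach f22hdyb35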

And that's it! With this strategy, it should be easy to interact with Flux instances across
any two resources where ssh is supported. If you have any questions, please `let us know <https://github.com/flux-framework/flux-docs/issues>`_.

Member: Instead of having something like this in every tutorial, perhaps we just need a header/footer comment or something within the "tutorials" page?

Member: Or, if people get to this page via search engines, perhaps it's good to have it at the bottom.

Member (Author): But then if they are in a specific tutorial, they wouldn't see it, right? It's important (I think) for it to be at the bottom of the page, so the reader might glance through the tutorial, feel like they are missing something or have a question, and immediately see it.

@@ -13,3 +13,4 @@ find a tutorial of interest.

   lab/index
   integrations/index
   commands/index

Reviewer: Idea: to differentiate between the login node and the host you allocated/are running on, use `noodle:~$` vs `noodle220:~$` at the prompts here and below.

Author: I thought we were on some general noodle node (that is part of the allocation)?

Author: Okay, I added a comment at the top that they are on a login node, and then when they hit "noodle" they have their allocation.

Reviewer: My reasoning for the prompt change is because of your ssh config below. `noodle` appears to be the login node, while `noodle220` is the node that you were allocated to run your job. So it may not be clear which node you're actually on with just `noodle:~$`. For example, with my personal prompt, you'll notice I was on opal186, the login node, and then `salloc` dropped me into a shell on opal63. So if I were to set up the ssh config, I would think I should set it up for opal186 and opal63.

Author: I changed it so the login node is just empty (no name), and I explicitly state we are on the login node. Then I state we are on the allocation and just use `noodle:~$` to say we are on the allocation (the specific node largely doesn't matter). I thought it looked nicer without the number, so I left it out.