
Commit d167514

tutorials: add flux proxy command tutorial
Problem: There's no flux proxy command tutorial. Add one.
1 parent 93dab21 commit d167514

File tree: 2 files changed, +231 −0 lines changed
tutorials/commands/flux-proxy-command.rst

Lines changed: 229 additions & 0 deletions

@@ -0,0 +1,229 @@
.. _flux-proxy-command:

=====================================
Send Commands to Other Flux Instances
=====================================

It is very common to want to connect to a Flux instance other than the default one. For example, you may want to see the current status of your jobs on a different machine. Or perhaps you've launched a Flux :ref:`subinstance <subinstance>` and want to submit a new job to it.

This tutorial will introduce the :core:man1:`flux-proxy` command, which can be used to locally connect yourself to another Flux instance and then run commands on that instance.
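In brief, ``flux proxy`` is used in two ways, both of which we will walk through below: with just a jobid, to spawn an interactive shell connected to that instance, or with a trailing command, to run a single command against it. Schematically (``JOBID`` is a placeholder):

.. code-block:: console

   $ flux proxy JOBID             # spawn a shell connected to instance JOBID
   $ flux proxy JOBID flux jobs   # or run one command against it and return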
---------------------------
Starting a Flux Subinstance
---------------------------

To illustrate the ``flux proxy`` command, let's create a Flux subinstance to work with. If you are not familiar with subinstances, they are complete Flux instances run under a subset of resources. They have their own scheduler and other Flux services, completely independent of the parent instance.

A common way to create a subinstance is through ``flux mini alloc``. Let's launch a 4 node subinstance on a cluster called "corona".

.. code-block:: console

   corona-login-node:~$ flux mini alloc -N4
   corona171:~$ flux resource list
        STATE NNODES   NCORES    NGPUS NODELIST
         free      4      192       32 corona[171,173-175]
    allocated      0        0        0
         down      0        0        0

Notice that we were previously on the node "corona-login-node" and now we are on "corona171". The ``flux mini alloc`` command has dropped us into a shell within our own Flux subinstance. As you can see, we requested 4 nodes with ``-N4``, and ``flux resource list`` shows that our subinstance contains 4 nodes' worth of resources.

Let's now submit several ``sleep`` jobs to our subinstance. To make some later output easier to read, let's label these sleep jobs "Level1" with the ``--job-name`` option.

.. code-block:: console

   corona171:~$ flux mini submit -n1 --job-name=Level1 sleep inf
   ƒUSiZvRm
   corona171:~$ flux mini submit -n1 --job-name=Level1 sleep inf
   ƒUiCzRCj
   corona171:~$ flux jobs
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
       ƒUiCzRCj achu     Level1      R      1      1   11.75s corona174
       ƒUSiZvRm achu     Level1      R      1      1   12.36s corona175

Nothing too special here. We've submitted two jobs to our subinstance, and the ``flux jobs`` command shows both of them running under the name "Level1".

Let's open up another window on the corona login node and type ``flux jobs``.

.. code-block:: console

   corona-login-node:~$ flux jobs
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
    ƒhBaWbFD1nF achu     flux        R      4      4   2.492m corona[171,173-175]

The first thing you'll notice is that our two "Level1" jobs are missing. The reason is that we are now on the parent Flux instance, so ``flux jobs`` only shows jobs that were executed under the parent instance. In this case, that is just our subinstance.

.. note::

   Depending on your terminal settings, the subinstance may be colored blue in ``flux jobs``, indicating it is a subinstance.

Where are our two "Level1" jobs? We can show them in ``flux jobs`` via the ``--recursive`` option.

.. code-block:: console

   corona-login-node:~$ flux jobs --recursive
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
    ƒhBaWbFD1nF achu     flux        R      4      4   3.329m corona[171,173-175]

   ƒhBaWbFD1nF:
       ƒUiCzRCj achu     Level1      R      1      1   2.233m corona174
       ƒUSiZvRm achu     Level1      R      1      1   2.243m corona175

As you can see, ``flux jobs`` has recursively listed the jobs in subinstances, showing our two "Level1" jobs.

Now in this particular example, we happen to have a shell that is connected to our subinstance. However, that may not always be the case (we will show this in an example below). How can we interact with the subinstance if we don't have a shell open? For example, how could we submit additional jobs to it?

--------------------------
Connect to the Subinstance
--------------------------

The easiest way to operate on a subinstance is to use the ``flux proxy`` command. It will connect you to another Flux instance, allowing you to send commands to it as though you were locally connected.

Let's launch a shell with ``flux proxy`` that connects us to the subinstance. All we have to do is give ``flux proxy`` the jobid of the subinstance. If you don't know the jobid, you can find it via ``flux jobs``.

.. code-block:: console

   corona-login-node:~$ flux proxy ƒhBaWbFD1nF
   corona-login-node:~$ flux jobs
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
       ƒUiCzRCj achu     Level1      R      1      1   3.525m corona174
       ƒUSiZvRm achu     Level1      R      1      1   3.535m corona175
You now have a local connection to that Flux subinstance and can run commands against it. This time the ``flux jobs`` output lists the two sleep jobs that we previously submitted. Notice that the prompt indicates we are still on the corona login node. You can exit from the subinstance by typing ``exit``.
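For example (the timing in the output below is illustrative), after exiting, ``flux jobs`` once again reports against the parent instance:

.. code-block:: console

   corona-login-node:~$ exit
   corona-login-node:~$ flux jobs   # back on the parent instance
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
    ƒhBaWbFD1nF achu     flux        R      4      4   4.012m corona[171,173-175]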
You can also specify a command for the other Flux instance on the ``flux proxy`` command line. Let's try submitting another job to the subinstance. Again, to make the job listings easier to read, I'll name this job "Level1A".

.. code-block:: console

   corona-login-node:~$ flux proxy ƒhBaWbFD1nF flux mini submit -n1 --job-name=Level1A sleep inf
   ƒ4UE5h9qZ

   corona-login-node:~$ flux proxy ƒhBaWbFD1nF flux jobs
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
      ƒ3hRvPnXZ achu     Level1A     R      1      1   8.894s corona173
       ƒUiCzRCj achu     Level1      R      1      1   5.016m corona174
       ƒUSiZvRm achu     Level1      R      1      1   5.026m corona175

As you can see, we've successfully submitted another job to the subinstance by running ``flux mini submit`` via ``flux proxy``.
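Any Flux command can be proxied this way. For instance (a sketch with a placeholder jobid, not actually run here since we want our jobs to keep running), you could cancel one of the subinstance's jobs with ``flux job cancel``:

.. code-block:: console

   corona-login-node:~$ flux proxy ƒhBaWbFD1nF flux job cancel JOBID   # JOBID is a placeholder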
You can also use ssh to proxy to an instance by specifying the instance's native :ref:`URI <URI>` instead of the jobid. This may be useful if you need to tunnel to the instance via ssh (see :ref:`SSH Across Clusters <ssh-across-clusters>` for more information).

.. code-block:: console

   corona-login-node:~$ flux uri ƒhBaWbFD1nF
   ssh://corona171/var/tmp/achu/flux-bsZTZV/local-0

   corona-login-node:~$ flux proxy ssh://corona171/var/tmp/achu/flux-bsZTZV/local-0 flux jobs
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
      ƒ3hRvPnXZ achu     Level1A     R      1      1    1.56m corona173
       ƒUiCzRCj achu     Level1      R      1      1   6.428m corona174
       ƒUSiZvRm achu     Level1      R      1      1   6.438m corona175

-------------------------------------
Connect to Subinstance in Subinstance
-------------------------------------
What if you have subinstances inside a subinstance? You could run a ``flux proxy`` inside of another ``flux proxy``, but you can also use ``flux proxy``'s slash shorthand.

Let's create an additional subinstance inside our current one via the ``flux mini batch`` command. If you are unfamiliar with this command, it is similar to ``flux mini alloc`` except that you will not be dropped into a shell.

We will feed this script into the batch command.

.. code-block:: sh

   #!/bin/sh
   # filename: subinstance-jobs.sh

   jobid1=$(flux mini submit -n1 --job-name=Level2 sleep inf)
   jobid2=$(flux mini submit -n1 --job-name=Level2 sleep inf)
   flux job status ${jobid1} ${jobid2}

As you can see, all this script does is launch two ``sleep`` jobs and then wait for them to complete via ``flux job status``. To make some later output easier to read, we've named these jobs "Level2".
Let's launch this script in the subinstance using ``flux proxy``.

.. code-block:: console

   corona-login-node:~$ flux proxy ƒhBaWbFD1nF flux mini batch -n4 ./subinstance-jobs.sh
   ƒ4xzRvx87

   corona-login-node:~$ flux proxy ƒhBaWbFD1nF flux jobs
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
      ƒ4xzRvx87 achu     subinstan+  R      4      1   35.62s corona171
      ƒ3hRvPnXZ achu     Level1A     R      1      1   3.376m corona173
       ƒUiCzRCj achu     Level1      R      1      1   8.243m corona174
       ƒUSiZvRm achu     Level1      R      1      1   8.254m corona175

As you can see, the new sleep jobs are not listed in the proxied ``flux jobs`` output. They are within the new Flux subinstance we just created, which appears above as jobid ``ƒ4xzRvx87``. I can prove this to you by using ``flux jobs --recursive``.

.. code-block:: console

   corona-login-node:~$ flux jobs --recursive
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
    ƒhBaWbFD1nF achu     flux        R      4      4   10.21m corona[171,173-175]

   ƒhBaWbFD1nF:
      ƒ4xzRvx87 achu     subinstan+  R      4      1   1.466m corona171
      ƒ3hRvPnXZ achu     Level1A     R      1      1   4.248m corona173
       ƒUiCzRCj achu     Level1      R      1      1   9.116m corona174
       ƒUSiZvRm achu     Level1      R      1      1   9.126m corona175

   ƒhBaWbFD1nF/ƒ4xzRvx87:
        ƒdAoUkw achu     Level2      R      1      1   1.426m corona171
        ƒYfsfAF achu     Level2      R      1      1   1.429m corona171

With this output you can see all the jobs we've submitted: our original subinstance ("ƒhBaWbFD1nF"), the two "Level1" sleep jobs and the "Level1A" job in that subinstance, the new subinstance within a subinstance ("ƒ4xzRvx87"), and our new "Level2" sleep jobs.
So how can we connect to this new subinstance within a subinstance?

One way is to run a ``flux proxy`` inside of another ``flux proxy``.

.. code-block:: console

   corona-login-node:~$ flux proxy ƒhBaWbFD1nF flux proxy ƒ4xzRvx87 flux jobs
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
        ƒdAoUkw achu     Level2      R      1      1    1.97m corona171
        ƒYfsfAF achu     Level2      R      1      1   1.973m corona171

Or we can use the special shorthand, which separates the jobids with a slash.

.. code-block:: console

   corona-login-node:~$ flux proxy ƒhBaWbFD1nF/ƒ4xzRvx87 flux jobs
          JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
        ƒdAoUkw achu     Level2      R      1      1   2.336m corona171
        ƒYfsfAF achu     Level2      R      1      1   2.339m corona171

Either form connects us to the new inner subinstance and lets us interact with it.
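For example (a sketch; the returned jobid here is hypothetical), we can submit a job directly to the inner subinstance with the slash shorthand:

.. code-block:: console

   corona-login-node:~$ flux proxy ƒhBaWbFD1nF/ƒ4xzRvx87 flux mini submit -n1 --job-name=Level2A sleep inf
   ƒ2kQpXkM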
-------------------------
Proxy to Flux Under Slurm
-------------------------

A special Slurm proxy resolver is also available if you launch Flux under Slurm. Let's launch a Flux instance under Slurm on another cluster.

.. code-block:: console

   $ srun -N4 --pty flux start

From another window, let's get the jobid of this Slurm job. I'll get it via ``squeue``:

.. code-block:: console

   $ squeue
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    321104    pbatch     flux     achu  R       0:28      4 opal[63-66]

We can proxy to the Flux instance via the ``slurm`` prefix, indicating that this is a Slurm jobid.

As an example, I'll submit a job to the Flux instance via ``flux mini submit``, then we can see the job with ``flux jobs``.

.. code-block:: console

   $ flux proxy slurm:321104 flux mini submit sleep 60
   fqyZGwh1

   $ flux proxy slurm:321104 flux jobs
      JOBID USER     NAME       ST NTASKS NNODES  RUNTIME NODELIST
   fqyZGwh1 achu     sleep       R      1      1   5.605s opal66

If you have any questions, please `let us know <https://github.com/flux-framework/flux-docs/issues>`_.

tutorials/commands/index.rst

Lines changed: 2 additions & 0 deletions

@@ -7,6 +7,7 @@ Welcome to the Command Tutorials! These tutorials should help you to map specifi
 with your use case, and then see detailed usage.

 - ``flux submit/flux run`` (:ref:`flux-submit`): "Submit a job in a Flux instance"
+- ``flux proxy`` (:ref:`flux-proxy-command`): "Send commands to other Flux instances"
 - ``flux proxy`` (:ref:`ssh-across-clusters`): "Send commands to a Flux instance across clusters using ssh"

 This section is currently 🚧️ under construction 🚧️, so please come back later to see more command tutorials!
@@ -17,4 +18,5 @@ This section is currently 🚧️ under construction 🚧️, so please come bac
    :caption: Command Tutorials

    flux-submit
+   flux-proxy-command
    ssh-across-clusters