Commit caeb736

Author: Al Chu11
tutorials: Add a flux job cancel tutorial
1 parent 7412532 commit caeb736

4 files changed: 228 additions, 0 deletions

Two binary files (0 bytes each) not shown.

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
1+
.. _flux-job-cancel:
2+
.. _flux-job-cancelall:
3+
.. _flux-pkill:
4+
5+
========================
6+
How to Cancel a Flux Job
7+
========================
8+
9+
Inevitably, submitted jobs will have to be canceled for one reason or another. This tutorial
10+
will show you how.
11+
12+
----------------------------
13+
How to Cancel a Job by Jobid
14+
----------------------------
15+
16+
The basic way to cancel a job is through ``flux job cancel``. All you have to do is specify
17+
the jobid on the command line. Here is a simple example after submitting a job.
18+
19+
.. code-block:: console
20+
21+
$ flux mini submit sleep 100
22+
ƒh35Dh5qRyq
23+
24+
$ flux jobs ƒh35Dh5qRyq
25+
JOBID USER NAME ST NTASKS NNODES TIME INFO
26+
ƒh35Dh5qRyq achu sleep R 1 1 13.33s corona174
27+
28+
$ flux job cancel ƒh35Dh5qRyq
29+
30+
<snip wait a little bit>
31+
32+
$ flux jobs ƒh35Dh5qRyq
33+
JOBID USER NAME ST NTASKS NNODES TIME INFO
34+
ƒh35Dh5qRyq achu sleep CA 1 1 20.18s corona174
35+
36+
In the above example we submitted a job via ``flux mini submit`` that simply
37+
runs ``sleep``. Passing the resulting jobid to ``flux jobs`` shows that it is
38+
running (state is ``R``).
39+
40+
We cancel the job simply by passing the jobid to ``flux job cancel``. After waiting
41+
a little bit, we see that the job is now canceled in ``flux jobs`` (state is ``CA``).
42+
43+
While we only passed one jobid to ``flux job cancel`` in this example, multiple jobids can be
44+
passed on the command line to cancel many jobs.
45+
46+
Note that in this particular example we happened to know the jobid of our job. If you do
47+
not know the jobid of your job, you can always use ``flux jobs`` to see a list of all
48+
your currently active jobs.
49+
50+
------------------------
51+
Cancelling All Your Jobs
52+
------------------------
53+
54+
The ``flux job cancelall`` command allows you to cancel jobs without specifying jobids.
55+
By default it cancels all of your active jobs, but several options allow you to target a subset of the jobs.
56+
57+
To start off, let's create 100 jobs that will sleep indefinitely. We will use the special ``--cc`` (carbon copy)
58+
option to ``flux mini submit`` that will submit 100 duplicate copies of the ``sleep`` job.
59+
60+
.. code-block:: console
61+
62+
$ flux mini submit --cc=1-100 sleep inf
63+
<snip - many job ids printed out>
64+
65+
$ flux jobs
66+
JOBID USER NAME ST NTASKS NNODES TIME INFO
67+
ƒjTWS5m3 achu sleep S 1 - -
68+
ƒjTWS5m4 achu sleep S 1 - -
69+
ƒjTWS5m5 achu sleep S 1 - -
70+
ƒjTWS5m6 achu sleep S 1 - -
71+
<snip - there are many jobs waiting to be run>
72+
ƒjTWS5m2 achu sleep R 1 1 8.858s corona212
73+
ƒjTWS5m1 achu sleep R 1 1 8.860s corona212
74+
ƒjTUx6Um achu sleep R 1 1 8.870s corona212
75+
ƒjTUx6Uk achu sleep R 1 1 8.870s corona212
76+
ƒjTUx6Uj achu sleep R 1 1 8.870s corona212
77+
ƒjTUx6Ui achu sleep R 1 1 8.871s corona212
78+
<snip - there are many jobs running>
79+
80+
As you can see, we have a lot of jobs waiting to run (state ``S``) and a lot of running jobs (state ``R``).
81+
82+
Let's first run ``flux job cancelall`` without any options.
83+
84+
.. code-block:: console
85+
86+
$ flux job cancelall
87+
flux-job: Command matched 100 jobs (-f to confirm)
88+
89+
As you can see, ``flux job cancelall`` found all 100 jobs to cancel, but it hasn't canceled them yet. In order to go through
90+
with the cancellation you must specify the ``-f`` (or ``--force``) option.
91+
92+
.. code-block:: console
93+
94+
$ flux job cancelall -f
95+
flux-job: Canceled 100 jobs (0 errors)
96+
97+
$ flux jobs
98+
JOBID USER NAME ST NTASKS NNODES TIME INFO
99+
100+
As you can see, all the jobs are now canceled after passing the ``-f`` option to ``flux job cancelall``. ``flux jobs``
101+
confirms there are no longer any of our jobs running or waiting to run.
102+
103+
``flux job cancelall`` has several options to filter the jobs to cancel. Perhaps the most commonly used
104+
option is ``-S`` (or ``--states``), which specifies the state(s) of the jobs to cancel. The most
105+
common states to target are ``pending`` and ``running``. Let's resubmit our 100 jobs and see the result
106+
of trying to cancel ``pending`` vs ``running`` jobs.
107+
108+
.. code-block:: console
109+
110+
$ flux mini submit --cc=1-100 sleep inf
111+
<snip - many job ids printed out>
112+
113+
$ flux job cancelall --states=pending
114+
flux-job: Command matched 52 jobs (-f to confirm)
115+
116+
$ flux job cancelall --states=running
117+
flux-job: Command matched 48 jobs (-f to confirm)
118+
119+
As you can see, ``flux job cancelall --states=pending`` would target the 52 pending jobs for cancellation and
120+
``flux job cancelall --states=running`` would target the current 48 running jobs for cancellation.
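
The split between ``pending`` and ``running`` can be pictured with a small Python sketch. It does not call Flux; it only models the filter. The grouping of underlying job states into ``pending`` and ``running`` is an assumption here (check the ``flux-jobs`` manpage for your version), and the job list and the ``match_jobs`` helper are hypothetical.

```python
# Hypothetical model of a --states filter; this does not talk to Flux.
# ASSUMPTION: "pending" and "running" are groupings of underlying job
# states as listed below. Verify against the manpage for your version.
STATE_GROUPS = {
    "pending": {"DEPEND", "PRIORITY", "SCHED"},
    "running": {"RUN", "CLEANUP"},
}


def match_jobs(jobs, state_filter):
    """Return the jobids whose state belongs to the requested group."""
    wanted = STATE_GROUPS[state_filter]
    return [jobid for jobid, state in jobs if state in wanted]


# Made-up jobs: two waiting to be scheduled, one running.
jobs = [("job1", "SCHED"), ("job2", "SCHED"), ("job3", "RUN")]

print(match_jobs(jobs, "pending"))
print(match_jobs(jobs, "running"))
```

Every active job falls into exactly one of the two groups in this model, which is why the two counts in the example above (52 and 48) add up to the 100 submitted jobs.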
121+
122+
--------------------------
123+
Cancelling with Flux Pkill
124+
--------------------------
125+
126+
One final way to cancel a job is via ``flux pkill``. There are a number of search and filtering options available in
127+
``flux pkill`` which can be seen in the :core:man1:`flux-pkill` manpage.
128+
129+
However, there are two common ways ``flux pkill`` is used. The first is to cancel a range of jobids. The jobid range can be specified
130+
via the format ``jobid1..jobidN``.
131+
132+
It is best shown with an example.
133+
134+
.. code-block:: console
135+
136+
$ flux mini submit --cc=1-5 sleep inf
137+
ƒ3vEobuhH
138+
ƒ3vEobuhJ
139+
ƒ3vEobuhK
140+
ƒ3vEq5tyd
141+
ƒ3vEq5tye
142+
143+
$ flux jobs
144+
JOBID USER NAME ST NTASKS NNODES TIME INFO
145+
ƒ3vEq5tye achu sleep R 1 1 14.23s corona212
146+
ƒ3vEq5tyd achu sleep R 1 1 14.23s corona212
147+
ƒ3vEobuhK achu sleep R 1 1 14.23s corona212
148+
ƒ3vEobuhJ achu sleep R 1 1 14.23s corona212
149+
ƒ3vEobuhH achu sleep R 1 1 14.23s corona212
150+
151+
Similar to before, we've submitted some sleep jobs. We see all five of the sleep jobs are
152+
running (state ``R``) in the ``flux jobs`` output.
153+
154+
We can tell ``flux pkill`` to cancel the set of five jobs by specifying the first and last jobid of this range.
155+
156+
.. code-block:: console
157+
158+
$ flux pkill ƒ3vEobuhH..ƒ3vEq5tye
159+
flux-pkill: INFO: Canceled 5 jobs
160+
161+
$ flux jobs
162+
JOBID USER NAME ST NTASKS NNODES TIME INFO
163+
164+
As you can see, ``flux pkill`` canceled the five jobs in the range.
165+
166+
The other common way ``flux pkill`` is used is to cancel jobs with matching job names. For example, you may
167+
submit several different types of jobs and give them different types of names to describe their function. ``flux pkill``
168+
can be used to match on the job names and cancel only the ones that match.
169+
170+
Let's submit several jobs and give them specific names using the ``--job-name`` option.
171+
172+
.. code-block:: console
173+
174+
$ flux mini submit --job-name=foo sleep inf
175+
ƒ6KjHNcxP
176+
177+
$ flux mini submit --job-name=foobar sleep inf
178+
ƒ6Limcmju
179+
180+
$ flux mini submit --job-name=boo sleep inf
181+
ƒ6NCaXCmV
182+
183+
$ flux mini submit --job-name=baz sleep inf
184+
ƒ6PjZG6jq
185+
186+
$ flux jobs
187+
JOBID USER NAME ST NTASKS NNODES TIME INFO
188+
ƒ6PjZG6jq achu baz R 1 1 38.06s corona212
189+
ƒ6NCaXCmV achu boo R 1 1 41.54s corona212
190+
ƒ6Limcmju achu foobar R 1 1 44.9s corona212
191+
ƒ6KjHNcxP achu foo R 1 1 47.15s corona212
192+
193+
194+
We've submitted four jobs, giving them the job names "foo", "foobar", "boo", and "baz".
195+
196+
Let's cancel the job "boo" via ``flux pkill``.
197+
198+
.. code-block:: console
199+
200+
$ flux pkill boo
201+
flux-pkill: INFO: Canceled 1 job
202+
203+
$ flux jobs
204+
JOBID USER NAME ST NTASKS NNODES TIME INFO
205+
ƒ6PjZG6jq achu baz R 1 1 2.856m corona212
206+
ƒ6Limcmju achu foobar R 1 1 2.97m corona212
207+
ƒ6KjHNcxP achu foo R 1 1 3.008m corona212
208+
209+
As you can see, ``flux pkill`` canceled just one job, the one assigned the name "boo".
210+
211+
``flux pkill`` will actually search for all jobs matching the supplied name, so what would happen if we asked ``flux pkill``
212+
to cancel jobs matching the name "foo"?
213+
214+
.. code-block:: console
215+
216+
$ flux pkill foo
217+
flux-pkill: INFO: Canceled 2 jobs
218+
219+
$ flux jobs
220+
JOBID USER NAME ST NTASKS NNODES TIME INFO
221+
ƒ6PjZG6jq achu baz R 1 1 4.626m corona212
222+
223+
As you can see, it canceled not one job but two: the job "foo" and the job "foobar".
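
The result above is what you would expect if the name is treated as a pattern that may match anywhere in the job name. The Python sketch below illustrates that behavior with the three job names that were still active. It assumes regular-expression search semantics, which is consistent with the output shown but should be confirmed in the :core:man1:`flux-pkill` manpage; the ``matching`` helper is hypothetical and does not call Flux.

```python
import re

# Job names still active at this point in the tutorial ("boo" is gone).
job_names = ["foo", "foobar", "baz"]


def matching(pattern, names):
    """Return the names in which the pattern is found anywhere.

    ASSUMPTION: the pattern behaves like a regular expression searched
    within each job name. This models the output above, not flux itself.
    """
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


print(matching("foo", job_names))    # unanchored: matches "foo" and "foobar"
print(matching("^foo$", job_names))  # anchored: matches only "foo"
```

Under that assumption, anchoring the pattern (for example ``^foo$``) would be one way to target only the job named exactly "foo".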
224+
225+
And that's it! If you have any questions, please
226+
`let us know <https://github.com/flux-framework/flux-docs/issues>`_.

tutorials/commands/index.rst

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,7 @@ Welcome to the Command Tutorials! These tutorials should help you to map specifi
77
with your use case, and then see detailed usage.
88

99
- ``flux mini submit/flux mini run`` (:ref:`flux-mini-submit`): "Submit a job in a Flux instance"
10+
- ``flux job cancel/flux job cancelall/flux pkill`` (:ref:`flux-job-cancel`): "Cancel a job you submitted"
1011
- ``flux proxy`` (:ref:`ssh-across-clusters`): "Send commands to a Flux instance across clusters using ssh"
1112

1213
This section is currently 🚧️ under construction 🚧️, so please come back later to see more command tutorials!
@@ -17,4 +18,5 @@ This section is currently 🚧️ under construction 🚧️, so please come bac
1718
:caption: Command Tutorials
1819

1920
flux-mini-submit
21+
flux-job-cancel
2022
ssh-across-clusters
