.. _flux-cancel:
.. _flux-pkill:

========================
How to Cancel a Flux Job
========================

Inevitably, some submitted jobs will need to be canceled for one reason or another. This tutorial
shows you how.

----------------------------
How to Cancel a Job by Jobid
----------------------------

The basic way to cancel a job is with ``flux cancel``: simply specify the jobid on
the command line. Here is an example that submits a job and then cancels it.

.. code-block:: console

    $ flux submit sleep 100
    ƒh35Dh5qRyq

    $ flux jobs ƒh35Dh5qRyq
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
     ƒh35Dh5qRyq achu     sleep       R      1      1   13.33s corona174

    $ flux cancel ƒh35Dh5qRyq

    <snip - wait a little bit>

    $ flux jobs ƒh35Dh5qRyq
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
     ƒh35Dh5qRyq achu     sleep      CA      1      1   20.18s corona174

In the above example, we submitted a job via ``flux submit`` that simply
runs ``sleep``. Passing the resulting jobid to ``flux jobs`` shows that it is
running (state ``R``).

We cancel the job by passing its jobid to ``flux cancel``. After waiting
a little bit, ``flux jobs`` shows that the job is now canceled (state ``CA``).

While we passed only one jobid to ``flux cancel`` in this example, multiple jobids can be
passed on the command line to cancel many jobs at once.
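
If the jobids come from another command, such as ``flux submit --cc``, which prints one jobid per line, a small shell function can collect them and hand them all to a single ``flux cancel``. The sketch below is a hypothetical helper, ``cancel_from_stdin``, which is not part of Flux. It assumes bash, and it passes any extra arguments (for example ``-m "reason"``) straight through to ``flux cancel``.

.. code-block:: bash

    # Hypothetical helper (not part of Flux): read jobids, one per line, on
    # stdin and cancel all of them with a single flux cancel invocation.
    # Extra arguments, e.g. -m "reason", are passed through to flux cancel.
    cancel_from_stdin() {
        local -a ids=()
        local id
        # The "|| [ -n ... ]" test also keeps a final line with no newline.
        while IFS= read -r id || [ -n "$id" ]; do
            # Skip blank lines so they are not passed along as empty jobids.
            if [ -n "$id" ]; then
                ids+=("$id")
            fi
        done
        # Nothing was read, so there is nothing to cancel.
        if [ "${#ids[@]}" -eq 0 ]; then
            return 0
        fi
        flux cancel "$@" "${ids[@]}"
    }

For example, ``flux submit --cc=1-3 sleep inf | cancel_from_stdin -m "submitted by mistake"`` would submit three jobs and then cancel all three in one call.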

Note that in this example we happened to know the jobid of our job. If you do
not know the jobid of your job, you can always use ``flux jobs`` to list all of
your currently active jobs.

.. note::

   Optionally, you can attach a message to the cancellation with the ``-m`` option. For example:

   .. code-block:: console

       $ flux cancel -m "I ran the wrong command" f3vnSzaaB

   The message is useful later for knowing why a job was canceled. You can see it with
   ``flux jobs`` and its ``endreason`` format. For example:

   .. code-block:: console

       $ flux jobs --format=endreason f3vnSzaaB
              JOBID USER     NAME       ST T_INACTIVE  INACTIVE-REASON
          f3vnSzaaB achu     sleep      CA May11 08:52 Canceled: I ran the wrong command

---------------------------
Canceling Many of Your Jobs
---------------------------

When you need to cancel many or all of your jobs, you can use either the ``--all`` option to ``flux cancel``
or the ``flux pkill`` command. Let's work through several examples of the ``--all`` option first.

``flux cancel --all`` allows you to cancel jobs without specifying jobids. By default it cancels all of your active
jobs, but several options allow you to target a subset of them.

To start off, let's create 100 jobs that sleep forever. We will use the ``--cc`` (carbon copy) option to
``flux submit``, which submits 100 duplicate copies of the ``sleep`` job.

.. code-block:: console

    $ flux submit --cc=1-100 sleep inf
    <snip - many job ids printed out>

    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
        ƒjTWS5m3 achu     sleep       S      1      -        -
        ƒjTWS5m4 achu     sleep       S      1      -        -
        ƒjTWS5m5 achu     sleep       S      1      -        -
        ƒjTWS5m6 achu     sleep       S      1      -        -
    <snip - there are many jobs waiting to be run>
        ƒjTWS5m2 achu     sleep       R      1      1   8.858s corona212
        ƒjTWS5m1 achu     sleep       R      1      1   8.860s corona212
        ƒjTUx6Um achu     sleep       R      1      1   8.870s corona212
        ƒjTUx6Uk achu     sleep       R      1      1   8.870s corona212
        ƒjTUx6Uj achu     sleep       R      1      1   8.870s corona212
        ƒjTUx6Ui achu     sleep       R      1      1   8.871s corona212
    <snip - there are many jobs running>

As you can see, we have many jobs waiting to run (state ``S``) and many running jobs (state ``R``).

Let's first run ``flux cancel --all`` without any options.

.. code-block:: console

    $ flux cancel --all
    flux-cancel: Canceled 100 jobs (0 errors)

    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO

As you can see, all the jobs are now canceled. ``flux jobs``
confirms there are no longer any of our jobs running or waiting to run.

There are several options to filter which jobs are canceled when using the ``--all`` option. Perhaps the most commonly
used is ``-S`` or ``--states``, which specifies the state(s) of the jobs to cancel. The most
common states to target are ``pending`` and ``running``. Let's resubmit our 100 jobs and compare the result
of canceling ``pending`` jobs vs. ``running`` jobs.

.. code-block:: console

    $ flux submit --cc=1-100 sleep inf
    <snip - many job ids printed out>

    $ flux cancel --all --states=pending
    flux-cancel: Canceled 52 jobs (0 errors)

    $ flux cancel --all --states=running
    flux-cancel: Canceled 48 jobs (0 errors)

As you can see, ``flux cancel --all --states=pending`` targeted the 52 pending jobs for cancellation and
``flux cancel --all --states=running`` targeted the 48 running jobs.

--------------------------
Canceling with Flux Pkill
--------------------------

The final way to cancel jobs covered in this tutorial is ``flux pkill``. ``flux pkill`` has a number of search and
filtering options, which are described in the :core:man1:`flux-pkill` manpage.

There are two common ways ``flux pkill`` is used. The first is to cancel a range of jobids, specified
in the format ``jobid1..jobidN``.

This is best shown with an example.

.. code-block:: console

    $ flux submit --cc=1-5 sleep inf
    ƒ3vEobuhH
    ƒ3vEobuhJ
    ƒ3vEobuhK
    ƒ3vEq5tyd
    ƒ3vEq5tye

    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
       ƒ3vEq5tye achu     sleep       R      1      1   14.23s corona212
       ƒ3vEq5tyd achu     sleep       R      1      1   14.23s corona212
       ƒ3vEobuhK achu     sleep       R      1      1   14.23s corona212
       ƒ3vEobuhJ achu     sleep       R      1      1   14.23s corona212
       ƒ3vEobuhH achu     sleep       R      1      1   14.23s corona212

Similar to before, we've submitted some ``sleep`` jobs. All five of them are
running (state ``R``) in the ``flux jobs`` output.

We can tell ``flux pkill`` to cancel all five jobs by specifying the first and last jobids of this range.

.. code-block:: console

    $ flux pkill ƒ3vEobuhH..ƒ3vEq5tye
    flux-pkill: INFO: Canceled 5 jobs

    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO

As you can see, ``flux pkill`` canceled the five jobs in the range.
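
Because a range is given by its first and last jobids, it can help to capture them at submission time. Here is a hypothetical helper, ``jobid_range`` (not part of Flux), that reads the jobids printed by ``flux submit --cc`` and prints them as a ``first..last`` range. It assumes the jobids arrive one per line in submission order, as in the output above. Keep in mind that the range presumably also covers any other jobs of yours whose jobids fall between its endpoints.

.. code-block:: bash

    # Hypothetical helper (not part of Flux): read jobids on stdin, one per
    # line in submission order, and print a FIRST..LAST range for flux pkill.
    # Returns 1 if no jobids were read.
    jobid_range() {
        local first="" last="" id
        # The "|| [ -n ... ]" test also keeps a final line with no newline.
        while IFS= read -r id || [ -n "$id" ]; do
            # Ignore blank lines.
            if [ -z "$id" ]; then
                continue
            fi
            # Remember the first jobid; keep overwriting the last one.
            if [ -z "$first" ]; then
                first=$id
            fi
            last=$id
        done
        if [ -z "$first" ]; then
            return 1
        fi
        printf '%s..%s\n' "$first" "$last"
    }

For example, ``range=$(flux submit --cc=1-5 sleep inf | jobid_range)`` saves the range, and a later ``flux pkill "$range"`` cancels those jobs.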

The other common use of ``flux pkill`` is to cancel jobs with matching job names. For example, you may
submit several different types of jobs and give them names that describe their function. ``flux pkill``
can match on the job names and cancel only the jobs that match.

Let's submit several jobs and give them specific names using the ``--job-name`` option.

.. code-block:: console

    $ flux submit --job-name=foo sleep inf
    ƒ6KjHNcxP

    $ flux submit --job-name=foobar sleep inf
    ƒ6Limcmju

    $ flux submit --job-name=boo sleep inf
    ƒ6NCaXCmV

    $ flux submit --job-name=baz sleep inf
    ƒ6PjZG6jq

    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
       ƒ6PjZG6jq achu     baz         R      1      1   38.06s corona212
       ƒ6NCaXCmV achu     boo         R      1      1   41.54s corona212
       ƒ6Limcmju achu     foobar      R      1      1    44.9s corona212
       ƒ6KjHNcxP achu     foo         R      1      1   47.15s corona212

We've submitted four jobs, giving them the job names "foo", "foobar", "boo", and "baz".

Let's cancel the job named "boo" with ``flux pkill``.

.. code-block:: console

    $ flux pkill boo
    flux-pkill: INFO: Canceled 1 job

    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
       ƒ6PjZG6jq achu     baz         R      1      1   2.856m corona212
       ƒ6Limcmju achu     foobar      R      1      1    2.97m corona212
       ƒ6KjHNcxP achu     foo         R      1      1   3.008m corona212

As you can see, ``flux pkill`` canceled just one job, the one named "boo".

``flux pkill`` actually searches for all jobs matching the supplied name. So what happens if we ask
``flux pkill`` to cancel jobs matching the name "foo"?

.. code-block:: console

    $ flux pkill foo
    flux-pkill: INFO: Canceled 2 jobs

    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
       ƒ6PjZG6jq achu     baz         R      1      1   4.626m corona212

As you can see, it canceled two jobs, not one: the job "foo" and the job "foobar", because "foobar" also matches "foo".
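
To cancel only jobs whose name is exactly "foo", one option is to look up the matching jobids yourself and pass them to ``flux cancel``. The hypothetical helper below, ``jobids_named`` (not part of Flux), asks ``flux jobs`` for bare ``jobid name`` lines using its ``--no-header`` and ``--format`` options. It then keeps only exact name matches with ``awk``. It assumes job names contain no whitespace and, like ``flux jobs`` by default, only looks at your active jobs.

.. code-block:: bash

    # Hypothetical helper (not part of Flux): print the jobids of your active
    # jobs whose name is exactly NAME ($1), one per line.
    jobids_named() {
        # Ask flux jobs for one "jobid name" pair per line without the header,
        # then let awk keep only lines whose second field equals NAME.
        flux jobs --no-header --format='{id.f58} {name}' |
            awk -v want="$1" '$2 == want { print $1 }'
    }

In the scenario above, ``jobids_named foo | xargs -r flux cancel`` would have canceled only the job named "foo" and left "foobar" running. The ``-r`` option is a GNU ``xargs`` option that skips running the command when there is no input.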

And that's it! If you have any questions, please
`let us know <https://github.com/flux-framework/flux-docs/issues>`_.