README.md: +7 -9 (7 additions & 9 deletions)
@@ -1,12 +1,12 @@
-# dask-profiling
+# dask-pyspy

 Profile dask [distributed](https://github.com/dask/distributed) clusters with [py-spy](https://github.com/benfred/py-spy).

 ```python
 import dask
 import distributed

-from dask_profiling import pyspy
+from dask_pyspy import pyspy

 client = distributed.Client()

@@ -25,23 +25,21 @@ Using `pyspy` or `pyspy_on_scheduler` attaches a profiler to the Python process,

 By default, py-spy profiles are recorded in [speedscope](https://www.speedscope.app/) format.

-`dask-profiling` (and, transitively, `py-spy`) must be installed in the environment where the scheduler is running.
+`dask-pyspy` (and, transitively, `py-spy`) must be installed in the environment where the scheduler is running.

-`dask-profiling` tries hard to work out-of-the-box, but if your cluster is running inside Docker, or on macOS, you'll need to configure things so it's allowed to run. See the [privileges for py-spy](#privileges-for-py-spy) section.
+`dask-pyspy` tries hard to work out-of-the-box, but if your cluster is running inside Docker, or on macOS, you'll need to configure things so it's allowed to run. See the [privileges for py-spy](#privileges-for-py-spy) section.
 The `pyspy` and `pyspy_on_scheduler` functions are context managers. Entering them starts py-spy on the workers / scheduler. Exiting them stops py-spy, sends the profile data back to the client, and writes it to disk.

 ```python
-client = distributed.Client()
-
 with pyspy_on_scheduler("scheduler-profile.json"):
     # Profile the scheduler.
     # Writes to the `scheduler-profile.json` file locally.
@@ -127,13 +125,13 @@ del persisted

 You may need to run the dask process as root for py-spy to be able to profile it (especially on macOS). See https://github.com/benfred/py-spy#when-do-you-need-to-run-as-sudo.

-In a Docker container, `dask-profiling` will "just work" for Docker/moby versions >= 21.xx. As of right now (Nov 2022), Docker 21.xx doesn't exist yet, so read on.
+In a Docker container, `dask-pyspy` will "just work" for Docker/moby versions >= 21.xx. As of right now (Nov 2022), Docker 21.xx doesn't exist yet, so read on.

 [moby/moby#42083](https://github.com/moby/moby/pull/42083/files) allowlisted by default the `process_vm_readv` system call that py-spy uses, which used to be blocked unless you set `--cap-add SYS_PTRACE`. Allowing this specific system call in unprivileged containers has been safe to do for a while (since linux kernel versions > 4.8), but just wasn't enabled in Docker. So your options right now are:
 * (low/no security impact) Download the newer [`seccomp.json`](https://github.com/moby/moby/blob/d39b075302c27f77b2de413697a5aacb034d8286/profiles/seccomp/default.json) file from moby/master and pass it to Docker via `--seccomp=default.json`.
 * (more convenient) Pass `--cap-add SYS_PTRACE` to Docker. This enables more than you need, but it's one less step.

-On Ubuntu-based containers, ptrace system calls are [further blocked](https://www.kernel.org/doc/Documentation/admin-guide/LSM/Yama.rst): processes are prohibited from ptracing each other even within the same UID. To work around this, `dask-profiling` automatically uses [`prctl(2)`](https://man7.org/linux/man-pages/man2/prctl.2.html) to mark the scheduler process as ptrace-able by itself and any child processes, then launches py-spy as a child process.
+On Ubuntu-based containers, ptrace system calls are [further blocked](https://www.kernel.org/doc/Documentation/admin-guide/LSM/Yama.rst): processes are prohibited from ptracing each other even within the same UID. To work around this, `dask-pyspy` automatically uses [`prctl(2)`](https://man7.org/linux/man-pages/man2/prctl.2.html) to mark the scheduler process as ptrace-able by itself and any child processes, then launches py-spy as a child process.
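
Editor's note on the last changed paragraph: the `prctl(2)` step it describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not `dask-pyspy`'s actual implementation. The function name `allow_ptrace_by_self_and_children` is hypothetical, libc is called through `ctypes`, and the call only succeeds on a Linux kernel with the Yama LSM enabled. In every other case the function returns `False` instead of raising.

```python
import ctypes
import os
import sys

# Value of PR_SET_PTRACER from <linux/prctl.h>; it spells "Yama" in ASCII.
PR_SET_PTRACER = 0x59616D61


def allow_ptrace_by_self_and_children() -> bool:
    """Declare the current process as its own allowed ptracer under Yama.

    Yama extends the exception to descendants of the declared ptracer,
    so a profiler launched afterwards as a child process may attach.
    Returns True if the prctl call succeeded, False otherwise.
    (Hypothetical helper name, for illustration only.)
    """
    if not sys.platform.startswith("linux"):
        return False  # prctl(2) is Linux-only
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        prctl = libc.prctl
    except (OSError, AttributeError):
        return False  # libc or the prctl symbol is unavailable
    prctl.restype = ctypes.c_int
    prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    # Fails (returns -1) when Yama is not enabled in the running kernel.
    return prctl(PR_SET_PTRACER, os.getpid(), 0, 0, 0) == 0


if __name__ == "__main__":
    print("ptrace exception set:", allow_ptrace_by_self_and_children())
```

A profiler started from this process after a successful call would be one of its descendants, so it falls under the declared exception. The seccomp and capability settings discussed in the Docker paragraphs are a separate restriction and still apply.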