gh-128942: make arraymodule.c free-thread safe (lock-free) #130771
ping @colesbury |
Nice! Disclaimer: I'm not an expert on the FT list implementation, so take some of my comments with a grain of salt.
Seeing good single-threaded performance is nice, but what about multi-threaded scaling? The number of locks that are still here scares me a little. It would be nice if this scaled well for concurrent use as well, especially for operations that don't require concurrent writes (e.g., comparisons and copies).
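One way to probe the scaling question is a small timing harness that runs read-only operations (a copy plus a comparison) on a shared array from several threads. This is only a sketch: `run_readers`, the thread counts, and the array size are illustrative choices rather than anything from the PR, and on a GIL-enabled build the threads will simply serialize.

```python
import threading
import time
from array import array


def run_readers(shared, n_threads, iterations=200):
    """Time n_threads threads that each copy and compare a shared array."""
    results = [False] * n_threads

    def reader(idx):
        ok = True
        for _ in range(iterations):
            copy = shared[:]              # read-only copy of the shared array
            ok = ok and (copy == shared)  # read-only comparison
        results[idx] = ok

    threads = [threading.Thread(target=reader, args=(i,)) for i in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed, all(results)


if __name__ == "__main__":
    data = array("i", range(10_000))
    for n in (1, 2, 4, 8):
        elapsed, ok = run_readers(data, n)
        print(f"{n} thread(s): {elapsed:.3f}s, consistent={ok}")
```

If read-only operations scale, the elapsed time should stay roughly flat as the thread count grows on a free-threaded build with enough cores. Heavy lock contention would show up as time growing with the thread count.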
Note: this is not ready to go; there is a memory issue that needs resolving. |
@ZeroIntensity you can remove the do-not-merge, it's not an |
The main thing here for acceptance is a benchmark run which I am not able to start (I only did local pyperformance check against main), so someone with access will have to initiate that to compare with main. |
I haven't gotten a chance to look through arraymodule.c yet. I'll review that later this week.
The overall approach here seems good. A few comments below.
The actual
Are there any other places where this needs to take place? Its the test and trying to run it with
Which is not Left the bad |
I'd like
Yes, |
Have a look at the possible
IMHO I think you are overly worried about that phrase "undefined behavior" which in practice really just means "undefined (stale) value", otherwise probably no modern system would boot. Also I have never had a problem writing an In any case changed back to non-atomic get/set. Also removed tsan-specific test. |
Small detail, you want to leave or remove |
Mostly minor comments below, but I think the alignment of items is important.
And:
for |
Use something like: https://gist.github.com/colesbury/96f27e2ddf6b151adeeb4c28ed7554d8 The alignment specifier has to be on |
Just a minor nit: "MS_WINDOWS" or "_MSC_VER"? The latter is used in the codebase for MSVC-specific directives, and you can run gcc under Windows. |
Excessive QSBR memory usage: I ran across this while profiling memory usage here. The results are similar for both array and list objects which use QSBR to free memory, so this is a QSBR thing. Memory usage numbers for script (provided below) using both array and list (
Script:
import threading
from array import array
from queue import Queue

def thrdfunc(queue):
    while True:
        l = queue.get()
        l.append(0)  # force resize in non-parent thread which will free using _PyMem_ProcessDelayed()

queue = Queue(maxsize=2)
threading.Thread(target=thrdfunc, args=(queue,)).start()

while True:
    l = array('i', [0] * int(3840*2160*3/4))  # using int instead of byte for reasons
    # l = [None] * int(3840*2160*3/8)  # sys.getsizeof(l) ~= 3840*2160*3 bytes
    queue.put(l)

Since delayed memory free checks (and subsequent frees if applicable) occur in one of two situations:
This works great for many small objects, but with larger buffers these can accumulate quickly. I tried a few things but the
diff --git a/Python/pystate.c b/Python/pystate.c
index ee35f0fa945..d9d731a15bc 100644
--- a/Python/pystate.c
+++ b/Python/pystate.c
@@ -2169,6 +2169,9 @@ _PyThreadState_Attach(PyThreadState *tstate)
 #if defined(Py_DEBUG)
     errno = err;
 #endif
+#ifdef Py_GIL_DISABLED
+    _PyMem_ProcessDelayed(tstate);
+#endif
 }

 static void
Not saying it is THE solution, but at the very least it shows memory usage can be reduced with no hit to performance (timed that with Another option would be to add another check in Thoughts? |
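For scale, the size of each buffer created by the script can be worked out directly. This sketch assumes a 4-byte C int (the usual case, though `array('i')` only guarantees at least 2 bytes); `frame_bytes`, `n_items`, and `payload_bytes` are names introduced here for illustration.

```python
from array import array

# One 3840x2160 frame at 3 bytes per pixel.
frame_bytes = 3840 * 2160 * 3            # 24,883,200 bytes (about 23.7 MiB)

# The script stores the frame as C ints, dividing by an assumed 4-byte item size.
n_items = int(frame_bytes / 4)           # 6,220,800 items

buf = array('i', [0]) * n_items
payload_bytes = len(buf) * buf.itemsize  # equals frame_bytes when itemsize == 4

print(frame_bytes, n_items, buf.itemsize, payload_bytes)
```

Under that assumption, every old buffer still waiting on a delayed free holds roughly 24 MB, which is why a handful of pending frees is already noticeable in a memory profile.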
I added lock-free single element reads and writes by mostly copying the list object's homework. TL;DR: pyperformance scimark seems to be back to about what it was without the free-thread safe stuff (pending confirmation of course). Tried a few other things but the list strategy seems good enough (except for the negative index thing I mentioned in #130744, if that is an issue). Timings: the relevant ones are "OLD" (the non free-thread safe arraymodule), "SLOW" (the previous slower PR), and the last two, "LFREERW".
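A quick way to exercise single-element reads and writes from several threads is a stress loop like the one below. It is a sketch, not the PR's test: `stress`, `PATTERNS`, and the round counts are hypothetical. It only checks that a reader never observes a value no writer stored. It says nothing about performance.

```python
import threading
from array import array

# The writer only ever stores these two values; any other value read back
# would indicate a torn or otherwise corrupted element.
PATTERNS = (0, -1)


def stress(rounds=200, size=64, n_readers=3):
    """Run one writer and several readers over a shared array; return bad reads."""
    shared = array("i", [0] * size)
    bad = []

    def writer():
        for r in range(rounds):
            value = PATTERNS[r % 2]
            for i in range(size):
                shared[i] = value        # single-element write

    def reader():
        for _ in range(rounds):
            for i in range(size):
                v = shared[i]            # single-element read
                if v not in PATTERNS:
                    bad.append(v)

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader) for _ in range(n_readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bad


if __name__ == "__main__":
    print("unexpected values:", stress())
```

The array is never resized here, so this covers only the element get/set path; a resize running concurrently with reads would need a separate check.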
array module is not free-thread safe. #128942