SSH OpenCL format: synchronize with CPU format #5747
Is this issue specific to this format at all? Maybe a change is needed in the shared OpenCL host code?
I prefer to avoid invasive changes. On the other hand, why would anyone on Earth need to set LWS=1024 for a CPU?
The other way to go is:

From e46341d54a42e325f6a16b810403a8c23826c7b0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Claudio=20Andr=C3=A9?= <[email protected]>
Date: Mon, 31 Mar 2025 09:01:47 -0300
Subject: [PATCH] OpenCL autotune: limit LWS up to 256 on CPU
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I've seen many segmentation faults in the SSH OpenCL format when it
reaches 1024.

Signed-off-by: Claudio André <[email protected]>
---
 src/opencl_autotune.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/opencl_autotune.c b/src/opencl_autotune.c
index 47f1b421e..5553e36ba 100644
--- a/src/opencl_autotune.c
+++ b/src/opencl_autotune.c
@@ -59,6 +59,9 @@ size_t autotune_get_task_max_work_group_size(int use_local_memory,
 	else
 		max_available = get_device_max_lws(gpu_id);
 
+	if (cpu(device_info[gpu_id]) && (max_available > 256))
+		max_available = 256;
+
 	if (max_available > get_kernel_max_lws(gpu_id, crypt_kernel))
 		return get_kernel_max_lws(gpu_id, crypt_kernel);
 
--
2.43.0

I don't see any reason why, for example, one should use LWS > 128 on a CPU. But let's listen to magnum's wise words.
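For anyone following the patch, the clamping order can be sketched in isolation. This is a minimal standalone sketch, not the code in src/opencl_autotune.c: `clamp_max_lws`, `min_size` and `CPU_LWS_CAP` are hypothetical names, the parameters stand in for `cpu(device_info[gpu_id])`, `get_device_max_lws()` and `get_kernel_max_lws()`, and the `use_local_memory` branch of the real function is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap for CPU devices; 256 is the value used in the patch. */
#define CPU_LWS_CAP 256

/* Return the smaller of two sizes. */
static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Largest LWS the autotuner would be allowed to try (assumed model):
 * start from the device limit, cap it on CPU devices, then never
 * exceed what the kernel itself reports as its maximum.
 */
static size_t clamp_max_lws(int is_cpu, size_t device_max, size_t kernel_max)
{
	size_t max_available = device_max;

	/* A zero limit would make no sense for either source. */
	assert(device_max > 0 && kernel_max > 0);

	if (is_cpu)
		max_available = min_size(max_available, CPU_LWS_CAP);

	return min_size(max_available, kernel_max);
}
```

With these assumptions, a CPU reporting 1024 ends up at 256, a GPU reporting 1024 stays at 1024, and a kernel limit of 128 wins over both.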
This is really weird placement of braces. I doubt this does what you intended.
Oh, the parentheses are indeed wrong, but the patch still conveys the idea.
I believe it varies a lot by implementation: Some CPU runtimes (perhaps only macOS) are even stupidly pegged to LWS=1 unless, only maybe unless, a kernel really requires higher. Hopefully they will cope then, or at least pretend to. But all Apple runtimes are lemon runtimes.

I'm not sure how LWS would/could correlate to CPU threads or cores but they should in some way, right? Intuitively (and I could be completely wrong) I would guess something like LWS == number of cores/threads should be reasonable. I'm trying to visualise some relation to CPU formats'

Edit: I just recalled (iirc) that the first Intel CPU runtime I used came with a recommendation to use LWS=8, regardless of job, hardware and so on. I have absolutely no idea why.

Edit2: BTW, CUDA's notion of "blocks" (which is just GWS/LWS) sounds pretty much like our
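To make the GWS/LWS relation mentioned above concrete, here is a small sketch, assuming OpenCL 1.x semantics where the global work size has to be a multiple of the local work size. `round_up_gws` and `work_group_count` are hypothetical helper names, not functions from the John the Ripper tree.

```c
#include <assert.h>
#include <stddef.h>

/* Round a global work size up to the next multiple of the local work size. */
static size_t round_up_gws(size_t gws, size_t lws)
{
	assert(lws > 0);
	return (gws + lws - 1) / lws * lws;
}

/* Number of work-groups ("blocks" in CUDA terms) for an already aligned GWS. */
static size_t work_group_count(size_t gws, size_t lws)
{
	assert(lws > 0 && gws % lws == 0);
	return gws / lws;
}
```

For example, 1000 work-items with LWS=256 are padded to GWS=1024, which gives 4 work-groups; with LWS=8 the same 1000 items need no padding and give 125 groups.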
- relax ASN.1 checks;
- simplify support for EC keys.

See #5745.

Signed-off-by: Claudio André <[email protected]>
It's WIP because I need #5745 merged and we need to test using a GPU. I haven't found any problems with my hardware.
[EDITED]
Notes:
#ifdef CPU_FORMAT
and self test still passes (5 new vectors have been added);
The difference between the formats is 5 vectors (2 x type 2 and 2 x type 6 + 1 DES) (none implemented for OpenCL):
Only types 2 and 6 are excluded
----
!self_test_running
probably can't handle self-testing properly. So, I'm not sure if we should port them to OpenCL;