SSH OpenCL format: synchronize with CPU format #5747

ghost · 2025-03-30T23:17:54Z

It's WIP because I need #5745 merged and we need to test using a GPU. I haven't found any problems with my hardware.

[EDITED]

Notes:

I removed two #ifdef CPU_FORMAT blocks and the self-test still passes (5 new vectors have been added);

The two formats differ by 5 test vectors (2 of type 2, 2 of type 6, and 1 DES), none of which are implemented for OpenCL:

$ run/john --format=ssh-opencl --list=format-tests | wc -l
15
$ run/john --format=ssh --list=format-tests | wc -l
19 # (20 after #5745)

On 2025-04-09: only types 2 and 6 are excluded.
----

The most recent changes that depend on !self_test_running probably can't handle self-testing properly, so I'm not sure whether we should port them to OpenCL.
In fact, I tried to migrate those CPU format changes, but the OpenCL format then failed the self-test, so I reverted them.
Current status:

Device 1: cpu-haswell-AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx
Using default input encoding: UTF-8
Loaded 15 password hashes with 15 different salts (ssh-opencl, SSH private key [RSA/DSA/EC 3DES/AES OpenCL])
Loaded hashes with cost 1 (KDF/cipher [0=MD5/AES 1=MD5/3DES 2=Bcrypt/AES]) varying from 0 to 1
Loaded hashes with cost 2 (iteration count) varying from 1 to 2
Note: Passwords longer than 10 [worst case UTF-8] to 32 [ASCII] rejected
LWS=8 GWS=32768 (4096 blocks) 
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Warning: Only 15 candidates buffered, minimum 32768 needed for performance.
password123      (?)     
hashcat          (?)     
password         (?)     
hashcat          (?)     
hashcat          (?)     
strongpassword   (?)     
hashcat          (?)     
television       (?)     
password         (?)     
johnjohn         (?)     
C0Ld.FUS10N      (?)     
Olympics         (?)     
extuitive        (?)     
television       (?)     
albert           (?)     
15g 0:00:00:00 DONE (2025-04-09 13:24) 375.0g/s 375.0p/s 5625c/s 5625C/s hashcat..johnjohn
Use the "--show" option to display all of the cracked passwords reliably
Session completed.

solardiz · 2025-03-31T00:23:23Z

SSH OpenCL format: limit LWS up to 512 on CPU

I've seen many segmentation faults when it reaches 1024.

Is this issue specific to this format at all? Maybe a change is needed in the shared OpenCL host code?

ghost · 2025-03-31T00:59:40Z

> SSH OpenCL format: limit LWS up to 512 on CPU
> I've seen many segmentation faults when it reaches 1024.
>
> Is this issue specific to this format at all? Maybe a change is needed in the shared OpenCL host code?

I prefer to avoid invasive changes. On the other hand, why would anyone on Earth need to set LWS=1024 for a CPU?

ghost · 2025-03-31T12:18:50Z

The other way to go is:

From e46341d54a42e325f6a16b810403a8c23826c7b0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Claudio=20Andr=C3=A9?= <[email protected]>
Date: Mon, 31 Mar 2025 09:01:47 -0300
Subject: [PATCH] OpenCL autotune: limit LWS up to 256 on CPU
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I've seen many segmentation faults in the SSH OpenCL format when it
reaches 1024.

Signed-off-by: Claudio André <[email protected]>
---
 src/opencl_autotune.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/opencl_autotune.c b/src/opencl_autotune.c
index 47f1b421e..5553e36ba 100644
--- a/src/opencl_autotune.c
+++ b/src/opencl_autotune.c
@@ -59,6 +59,9 @@ size_t autotune_get_task_max_work_group_size(int use_local_memory,
 	else
 		max_available = get_device_max_lws(gpu_id);
 
+	if (cpu(device_info[gpu_id]) && (max_available > 256))
+		max_available = 256;
+
 	if (max_available > get_kernel_max_lws(gpu_id, crypt_kernel))
 		return get_kernel_max_lws(gpu_id, crypt_kernel);
 
-- 
2.43.0

I don't see any reason why, for example, one should use LWS > 128 on a CPU. But let's listen to magnum's wise words.
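
For illustration, the cap proposed in the patch above can be sketched in isolation. This is only a minimal sketch, not the actual opencl_autotune.c code: `clamp_lws` and its parameters are hypothetical names, and the device limit, kernel limit, and CPU flag are passed in as plain values instead of being queried from OpenCL.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of John the Ripper): pick the largest
 * LWS the autotuner may try, given the device limit, the kernel limit,
 * and a cap that applies to CPU devices only.
 */
static size_t clamp_lws(size_t device_max, size_t kernel_max,
                        int is_cpu, size_t cpu_cap)
{
	size_t max_available = device_max;

	/* Mirrors the proposed change: only CPU devices are capped. */
	if (is_cpu && max_available > cpu_cap)
		max_available = cpu_cap;

	/* The kernel's own limit still wins when it is lower. */
	if (max_available > kernel_max)
		return kernel_max;

	return max_available;
}
```

With a cap of 256, a CPU device reporting 1024 would be limited to 256, while a GPU reporting 1024 would be left alone; in both cases a lower kernel limit still takes precedence.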

solardiz · 2025-03-31T18:39:44Z

+	if (cpu(device_info[gpu_id] && (max_available > 256)))

This is really weird placement of braces. I doubt this does what you intended.

ghost · 2025-03-31T20:17:20Z

> +	if (cpu(device_info[gpu_id] && (max_available > 256)))
>
> This is really weird placement of braces. I doubt this does what you intended.

Oh, the parentheses are indeed wrong, but the idea comes across.
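
To make the difference concrete, here is a small sketch comparing the two placements. Everything in it is an assumption made for illustration: the `SKETCH_DEV_*` flag values and the `sketch_is_cpu` function are stand-ins, and the real `cpu()` check and device flags in the OpenCL host code may be defined differently (this thread does not show them).

```c
#include <stddef.h>

/* Assumed flag values, for this sketch only. */
#define SKETCH_DEV_CPU (1u << 0)
#define SKETCH_DEV_GPU (1u << 1)

/* Stand-in for the cpu() check: true when the CPU bit is set. */
static int sketch_is_cpu(unsigned int flags)
{
	return (flags & SKETCH_DEV_CPU) == SKETCH_DEV_CPU;
}

/* Intended form: test the device type first, then the size. */
static size_t cap_intended(unsigned int flags, size_t max_available)
{
	if (sketch_is_cpu(flags) && (max_available > 256))
		max_available = 256;
	return max_available;
}

/*
 * Misplaced form: the whole && expression is evaluated first, so the
 * check receives 0 or 1 instead of the device flags.
 */
static size_t cap_misplaced(unsigned int flags, size_t max_available)
{
	if (sketch_is_cpu(flags && (max_available > 256)))
		max_available = 256;
	return max_available;
}
```

Under these assumed flag values, the misplaced form also caps a GPU that reports more than 256, because any non-zero flags value combined with a true size test collapses to 1, which happens to equal the assumed CPU bit. The intended form leaves the GPU untouched.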

magnumripper · 2025-04-04T20:03:11Z

> I don't see any reason why, for example, one should use LWS > 128 on a CPU. But let's listen to magnum's wise words.

I believe it varies a lot by implementation: Some CPU runtimes (perhaps only macOS) are even stupidly pegged to LWS=1 unless, only maybe unless, a kernel really requires higher. Hopefully they will cope then, or at least pretend to. But all Apple runtimes are lemon runtimes.

I'm not sure how LWS would/could correlate to CPU threads or cores but they should in some way, right? Intuitively (and I could be completely wrong) I would guess something like LWS == number of cores/threads should be reasonable. I'm trying to visualise some relation to CPU formats' count vs. OMP_NUM_THREADS and OMP_SCALE, but I have yet to experience an Aha! moment.

Edit: I just recalled (iirc) that the first Intel CPU runtime I used came with a recommendation to use LWS=8, regardless of job, hardware and so on. I have absolutely no idea why.

Edit2: BTW, CUDA's notion of "blocks" (which is just GWS/LWS) sounds pretty much like our OMP_SCALE thing, doesn't it? For whatever that's worth.
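
As a quick sanity check of the "blocks = GWS/LWS" relation, the arithmetic can be sketched as below. The helper names are hypothetical, and rounding GWS up to a multiple of LWS is an assumption of this sketch (OpenCL 1.x requires the global size to be evenly divisible by the local size when one is given); it is not taken from the autotune code.

```c
#include <stddef.h>

/* Round gws up to the next multiple of lws (lws must be non-zero). */
static size_t round_up_gws(size_t gws, size_t lws)
{
	return ((gws + lws - 1) / lws) * lws;
}

/* Number of work-groups ("blocks") for one kernel launch. */
static size_t num_blocks(size_t gws, size_t lws)
{
	return round_up_gws(gws, lws) / lws;
}
```

With the values from the log earlier in this thread (LWS=8, GWS=32768), `num_blocks(32768, 8)` gives 4096, which matches the "(4096 blocks)" printed there.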

SSH OpenCL format: synchronize with CPU format

bbf7ff1

- relax ASN.1 checks;
- simplify support for EC keys.

See #5745.

Signed-off-by: Claudio André <[email protected]>

ghost changed the title ~~(WIP) SSH OpenCL format: synchronize with CPU format~~ SSH OpenCL format: synchronize with CPU format Apr 9, 2025

ghost force-pushed the fix/opencl branch from 6b98a30 to bbf7ff1 Compare April 9, 2025 13:42

ghost closed this by deleting the head repository Apr 11, 2025

This pull request was closed.
