
SSH OpenCL format: synchronize with CPU format #5747


Closed
wants to merge 1 commit into from



@ghost ghost commented Mar 30, 2025

It's WIP because I need #5745 merged and we need to test using a GPU. I haven't found any problems with my hardware.


[EDITED]

Notes:

  • I removed two #ifdef CPU_FORMAT blocks and the self-test still passes (5 new vectors have been added);

The CPU format has 5 test vectors that the OpenCL format lacks (2 x type 2, 2 x type 6, and 1 DES; none of these are implemented for OpenCL):

$ run/john --format=ssh-opencl --list=format-tests | wc -l
15
$ run/john --format=ssh --list=format-tests | wc -l
19 # (20 after #5745)

On 2025-04-09
Only types 2 and 6 are excluded
----
  • The most recent changes that depend on !self_test_running probably can't handle self-testing properly. So, I'm not sure if we should port them to OpenCL;
  • In fact, I tried to migrate the changes made to the CPU format, but the OpenCL format then failed the self-test, so I reverted them;
  • Current status:
Device 1: cpu-haswell-AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx
Using default input encoding: UTF-8
Loaded 15 password hashes with 15 different salts (ssh-opencl, SSH private key [RSA/DSA/EC 3DES/AES OpenCL])
Loaded hashes with cost 1 (KDF/cipher [0=MD5/AES 1=MD5/3DES 2=Bcrypt/AES]) varying from 0 to 1
Loaded hashes with cost 2 (iteration count) varying from 1 to 2
Note: Passwords longer than 10 [worst case UTF-8] to 32 [ASCII] rejected
LWS=8 GWS=32768 (4096 blocks) 
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Warning: Only 15 candidates buffered, minimum 32768 needed for performance.
password123      (?)     
hashcat          (?)     
password         (?)     
hashcat          (?)     
hashcat          (?)     
strongpassword   (?)     
hashcat          (?)     
television       (?)     
password         (?)     
johnjohn         (?)     
C0Ld.FUS10N      (?)     
Olympics         (?)     
extuitive        (?)     
television       (?)     
albert           (?)     
15g 0:00:00:00 DONE (2025-04-09 13:24) 375.0g/s 375.0p/s 5625c/s 5625C/s hashcat..johnjohn
Use the "--show" option to display all of the cracked passwords reliably
Session completed.

@solardiz
Member

SSH OpenCL format: limit LWS up to 512 on CPU

I've seen many segmentation faults when it reaches 1024.

Is this issue specific to this format at all? Maybe a change is needed in the shared OpenCL host code?

@ghost
Author

ghost commented Mar 31, 2025

> SSH OpenCL format: limit LWS up to 512 on CPU
> I've seen many segmentation faults when it reaches 1024.
>
> Is this issue specific to this format at all? Maybe a change is needed in the shared OpenCL host code?

I prefer to avoid invasive changes. On the other hand, why would anyone on Earth need to set LWS=1024 for a CPU?

@ghost
Author

ghost commented Mar 31, 2025

The other way to go is:

From e46341d54a42e325f6a16b810403a8c23826c7b0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Claudio=20Andr=C3=A9?= <[email protected]>
Date: Mon, 31 Mar 2025 09:01:47 -0300
Subject: [PATCH] OpenCL autotune: limit LWS up to 256 on CPU
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I've seen many segmentation faults in the SSH OpenCL format when it
reaches 1024.

Signed-off-by: Claudio André <[email protected]>
---
 src/opencl_autotune.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/opencl_autotune.c b/src/opencl_autotune.c
index 47f1b421e..5553e36ba 100644
--- a/src/opencl_autotune.c
+++ b/src/opencl_autotune.c
@@ -59,6 +59,9 @@ size_t autotune_get_task_max_work_group_size(int use_local_memory,
 	else
 		max_available = get_device_max_lws(gpu_id);
 
+	if (cpu(device_info[gpu_id]) && (max_available > 256))
+		max_available = 256;
+
 	if (max_available > get_kernel_max_lws(gpu_id, crypt_kernel))
 		return get_kernel_max_lws(gpu_id, crypt_kernel);
 
-- 
2.43.0

I don't see any reason why, for example, one should use LWS > 128 on a CPU. But let's listen to magnum's wise words.

@solardiz
Member

+	if (cpu(device_info[gpu_id] && (max_available > 256)))

This is really weird placement of braces. I doubt this does what you intended.

@ghost
Author

ghost commented Mar 31, 2025

> +	if (cpu(device_info[gpu_id] && (max_available > 256)))
>
> This is really weird placement of braces. I doubt this does what you intended.

Oh, the parentheses are indeed misplaced. The idea itself still stands.
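
For readers following the parentheses point, here is a minimal sketch of why the two spellings of the condition behave differently. It does not use the project's real definitions: `DEV_FLAG_CPU`, `DEV_FLAG_GPU`, and `is_cpu()` are hypothetical stand-ins for the device flags and the `cpu()` check in the patch, and the sketch assumes the CPU flag occupies the lowest bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device-type flags, for illustration only. */
#define DEV_FLAG_CPU 1u
#define DEV_FLAG_GPU 2u

/* Hypothetical stand-in for the cpu() check used in the patch. */
static int is_cpu(unsigned int info)
{
	return (info & DEV_FLAG_CPU) == DEV_FLAG_CPU;
}

/* Intended form: clamp only when the device is a CPU. */
static size_t clamp_intended(unsigned int info, size_t max_available)
{
	if (is_cpu(info) && (max_available > 256))
		max_available = 256;
	return max_available;
}

/* Misplaced parentheses: the result of && (0 or 1) is what gets tested. */
static size_t clamp_misplaced(unsigned int info, size_t max_available)
{
	if (is_cpu(info && (max_available > 256)))
		max_available = 256;
	return max_available;
}

/* Self-check showing where the two forms disagree. */
static void check_clamps(void)
{
	assert(clamp_intended(DEV_FLAG_CPU, 1024) == 256);
	assert(clamp_intended(DEV_FLAG_GPU, 1024) == 1024);
	/* Under the assumptions above, a GPU gets clamped as well. */
	assert(clamp_misplaced(DEV_FLAG_GPU, 1024) == 256);
}
```

Under these assumptions the misplaced form passes 0 or 1 to the device check, so the outcome no longer depends on the device type at all; with a different flag layout it could instead never clamp.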

@magnumripper
Member

magnumripper commented Apr 4, 2025

> I don't see any reason why, for example, one should use LWS > 128 on a CPU. But let's listen to magnum's wise words.

I believe it varies a lot by implementation: Some CPU runtimes (perhaps only macOS) are even stupidly pegged to LWS=1 unless, only maybe unless, a kernel really requires higher. Hopefully they will cope then, or at least pretend to. But all Apple runtimes are lemon runtimes.

I'm not sure how LWS would/could correlate to CPU threads or cores but they should in some way, right? Intuitively (and I could be completely wrong) I would guess something like LWS == number of cores/threads should be reasonable. I'm trying to visualise some relation to CPU formats' count vs. OMP_NUM_THREADS and OMP_SCALE, but I have yet to experience an Aha! moment.

Edit: I just recalled (iirc) that the first Intel CPU runtime I used came with a recommendation to use LWS=8, regardless of job, hardware and so on. I have absolutely no idea why.

Edit2: BTW, Cuda's notion of "blocks" (which is just GWS/LWS) sounds pretty much like our OMP_SCALE thing, doesn't it? For whatever that's worth.
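
As a small illustration of the "blocks is just GWS/LWS" remark, the sketch below computes the number of work-groups from the two sizes. The helper name `work_group_count` is hypothetical and not part of the project's code; the sample values come from the status output earlier in this thread (LWS=8, GWS=32768, 4096 blocks).

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of work-groups ("blocks") for a given
 * global work size (GWS) and local work size (LWS).
 * Returns 0 for an LWS of 0 to avoid dividing by zero.
 */
static size_t work_group_count(size_t gws, size_t lws)
{
	if (lws == 0)
		return 0;
	return gws / lws;
}
```

With the values from the log, `work_group_count(32768, 8)` gives 4096, matching the "(4096 blocks)" shown there.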

- relax ASN.1 checks;
- simplify support for EC keys.

See #5745.

Signed-off-by: Claudio André <[email protected]>
@ghost ghost changed the title (WIP) SSH OpenCL format: synchronize with CPU format SSH OpenCL format: synchronize with CPU format Apr 9, 2025
@ghost ghost force-pushed the fix/opencl branch from 6b98a30 to bbf7ff1 Compare April 9, 2025 13:42
@ghost ghost closed this by deleting the head repository Apr 11, 2025
This pull request was closed.