Skip to content

std.heap.PageAllocator: improve implementation on Windows using NtAllocateVirtualMemory #22846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
2 of 4 tasks
andrewrk opened this issue Feb 10, 2025 · 8 comments
Open
2 of 4 tasks
Labels
contributor friendly This issue is limited in scope and/or knowledge of Zig internals. enhancement Solving this issue will likely involve adding new logic or components to the codebase. os-windows standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@andrewrk
Copy link
Member

andrewrk commented Feb 10, 2025

I hope I don't need to explain why this is not ideal:

// Fallback: reserve a range of memory large enough to find a
// sufficiently aligned address, then free the entire range and
// immediately allocate the desired subset. Another thread may have won
// the race to map the target range, in which case a retry is needed.
windows.VirtualFree(addr, 0, windows.MEM_RELEASE);
const overalloc_len = n + alignment_bytes - page_size;
const aligned_len = mem.alignForward(usize, n, page_size);
while (true) {
const reserved_addr = windows.VirtualAlloc(
null,
overalloc_len,
windows.MEM_RESERVE,
windows.PAGE_NOACCESS,
) catch return null;
const aligned_addr = mem.alignForward(usize, @intFromPtr(reserved_addr), alignment_bytes);
windows.VirtualFree(reserved_addr, 0, windows.MEM_RELEASE);
const ptr = windows.VirtualAlloc(
@ptrFromInt(aligned_addr),
aligned_len,
windows.MEM_COMMIT | windows.MEM_RESERVE,
windows.PAGE_READWRITE,
) catch continue;
return @ptrCast(ptr);
}

This issue is a subtask of #1840.

Issue close criteria:

  • delete VirtualAlloc from std lib
  • delete VirtualFree from std lib
  • use NtAllocateVirtualMemory instead
  • take advantage of MEM_RESERVE_PLACEHOLDER to guarantee no race

As a bonus, measure SmpAllocator performance on Windows before and after, as well as providing a comparison with kernel32 HeapAlloc and related functions.

Related:

@andrewrk andrewrk added contributor friendly This issue is limited in scope and/or knowledge of Zig internals. enhancement Solving this issue will likely involve adding new logic or components to the codebase. os-windows standard library This issue involves writing Zig code for the standard library. labels Feb 10, 2025
@andrewrk andrewrk added this to the 0.15.0 milestone Feb 10, 2025
@ziggoon
Copy link
Contributor

ziggoon commented Mar 2, 2025

Hi @andrewrk, I would be interested in exploring this. Cool if I take a stab?

@alexrp
Copy link
Member

alexrp commented Mar 2, 2025

You're welcome to take a stab at any issue tagged with contributor friendly This issue is limited in scope and/or knowledge of Zig internals. .

@ziggoon
Copy link
Contributor

ziggoon commented Mar 5, 2025

Apologies for closing the first PR, accidentally messed up with git. In regards to the initial tests I performed here were the results I was seeing:

| Size | Alignment | PageAlloc | GPA | ArenaAlloc | SmpAlloc | FixedBuf | Win32Heap |
|     64 |         8 |    2.51us | 3.74us |       0.00us |     0.03us |     0.00us |     0.03us |
|     64 |        16 |    2.27us | 3.42us |       0.00us |     0.02us |     0.00us |     0.03us |
|     64 |        64 |    2.25us | 3.38us |       0.00us |     0.02us |     0.00us |     0.03us |
|     64 |      4096 |    2.19us | 3.45us |       0.00us |     0.03us |     0.00us |     0.05us |
|     64 |     16384 |    2.25us | 3.43us |       0.00us |     0.03us |     0.00us |     0.13us |
|   1024 |         8 |    2.19us | 3.48us |       0.00us |     0.02us |     0.00us |     0.03us |
|   1024 |        16 |    2.21us | 3.38us |       0.00us |     0.02us |     0.00us |     0.03us |
|   1024 |        64 |    2.27us | 3.39us |       0.00us |     0.02us |     0.00us |     0.04us |
|   1024 |      4096 |    2.22us | 3.40us |       0.00us |     0.02us |     0.00us |     0.05us |
|   1024 |     16384 |    2.22us | 3.47us |       0.00us |     0.02us |     0.00us |     0.13us |
|   4096 |         8 |    2.21us | 3.57us |       0.00us |     0.03us |     0.00us |     0.05us |
|   4096 |        16 |    2.37us | 3.59us |       0.00us |     0.02us |     0.00us |     0.04us |
|   4096 |        64 |    2.27us | 3.48us |       0.00us |     0.02us |     0.00us |     0.05us |
|   4096 |      4096 |    2.30us | 3.53us |       0.00us |     0.02us |     0.00us |     0.06us |
|   4096 |     16384 |    2.25us | 3.39us |       0.00us |     0.02us |     0.00us |     0.14us |
|  16384 |         8 |    2.30us | 3.57us |       0.00us |     0.02us |     0.00us |     0.13us |
|  16384 |        16 |    2.37us | 3.53us |       0.00us |     0.02us |     0.00us |     0.13us |
|  16384 |        64 |    2.32us | 3.48us |       0.00us |     0.03us |     0.00us |     0.13us |
|  16384 |      4096 |    2.33us | 3.61us |       0.00us |     0.02us |     0.00us |     0.14us |
|  16384 |     16384 |    2.39us | 3.50us |       0.00us |     0.02us |     0.00us |     0.18us |
|  65536 |         8 |    2.47us | 2.59us |       0.00us |     2.53us |     0.00us |     2.85us |
|  65536 |        16 |    2.63us | 2.49us |       0.00us |     2.40us |     0.00us |     1.16us |
|  65536 |        64 |    2.61us | 2.56us |       0.00us |     2.32us |     0.00us |     1.14us |
|  65536 |      4096 |    2.44us | 2.45us |       0.00us |     2.34us |     0.00us |     1.21us |
|  65536 |     16384 |    2.45us | 2.44us |       0.00us |     2.42us |     0.00us |     1.43us |
| 1048576 |         8 |    4.16us | 4.07us |       0.00us |     4.43us |     0.00us |     6.02us |
| 1048576 |        16 |    4.02us | 4.11us |       0.00us |     4.49us |     0.00us |     5.99us |
| 1048576 |        64 |    4.09us | 4.19us |       0.00us |     4.50us |     0.00us |     6.24us |
| 1048576 |      4096 |    4.16us | 4.18us |       0.00us |     4.50us |     0.00us |     7.04us |
| 1048576 |     16384 |    4.05us | 4.16us |       0.00us |     4.57us |     0.00us |     7.35us |

@ziggoon
Copy link
Contributor

ziggoon commented Mar 9, 2025

Hi @alexrp and @andrewrk - I believe that most of the issue has been tackled now, but I would like to spark a conversation about the usage of next_mmap_addr_hint which would require the implementation & usage of NtAllocateVirtualMemoryEx. The Ex version is only compatible >= Windows 10 X.x (I still need to find a solid reference for the actual version, I just know its W10 for sure). I know how to detect OS version during runtime, but this will add additional overhead to the function.

Curious on how y'all would like to approach this issue - thanks!

@alexrp
Copy link
Member

alexrp commented Mar 9, 2025

The standard library technically requires Windows 10, so I think it's completely fine for that to be the default code path.

If someone cares enough to make the code run on older Windows versions, patches are welcome, but they should perform compile-time target OS version checks.

That said, it is indeed important which exact Windows 10 version the function was introduced in.

@ziggoon
Copy link
Contributor

ziggoon commented Mar 9, 2025

The standard library technically requires Windows 10, so I think it's completely fine for that to be the default code path.

If someone cares enough to make the code run on older Windows versions, patches are welcome, but they should perform compile-time target OS version checks.

That said, it is indeed important which exact Windows 10 version the function was introduced in.

Sounds good - I'll get NtAllocateVirtualMemoryEx implemented and update the PageAllocator implementation to use it instead. Regarding the windows version, it appears it was released during NTDDI_WIN10_RS4 or NTDDI_WIN10_RS5. There is no solid documentation, but the function started appearing in late 2018 (e.g Windows 10 Version 1809)

VirtualAlloc2 lists the minimum supported client as Windows 10, but no specific version.
https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc2

@alexrp
Copy link
Member

alexrp commented Mar 9, 2025

That might be a problem then. Our current minimum supported version is NTDDI_WIN10 as you can see here:

zig/lib/std/Target.zig

Lines 219 to 248 in 1eb729b

/// Based on NTDDI version constants from
/// https://docs.microsoft.com/en-us/cpp/porting/modifying-winver-and-win32-winnt
pub const WindowsVersion = enum(u32) {
nt4 = 0x04000000,
win2k = 0x05000000,
xp = 0x05010000,
ws2003 = 0x05020000,
vista = 0x06000000,
win7 = 0x06010000,
win8 = 0x06020000,
win8_1 = 0x06030000,
win10 = 0x0A000000, //aka win10_th1
win10_th2 = 0x0A000001,
win10_rs1 = 0x0A000002,
win10_rs2 = 0x0A000003,
win10_rs3 = 0x0A000004,
win10_rs4 = 0x0A000005,
win10_rs5 = 0x0A000006,
win10_19h1 = 0x0A000007,
win10_vb = 0x0A000008, //aka win10_19h2
win10_mn = 0x0A000009, //aka win10_20h1
win10_fe = 0x0A00000A, //aka win10_20h2
win10_co = 0x0A00000B, //aka win10_21h1
win10_ni = 0x0A00000C, //aka win10_21h2
win10_cu = 0x0A00000D, //aka win10_22h2
win11_zn = 0x0A00000E, //aka win11_21h2
win11_ga = 0x0A00000F, //aka win11_22h2
win11_ge = 0x0A000010, //aka win11_23h2
win11_dt = 0x0A000011, //aka win11_24h2
_,

zig/lib/std/Target.zig

Lines 570 to 575 in 1eb729b

.windows => .{
.windows = .{
.min = .win10,
.max = WindowsVersion.latest,
},
},

This doesn't necessarily mean we can't make use of newer functions, but it does mean we'd at least need compile-time OS version checks.

@alexrp
Copy link
Member

alexrp commented Mar 9, 2025

That said, we could perhaps consider bumping our minimum version: https://learn.microsoft.com/en-us/lifecycle/products/windows-10-home-and-pro

I'll defer to @andrewrk on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor friendly This issue is limited in scope and/or knowledge of Zig internals. enhancement Solving this issue will likely involve adding new logic or components to the codebase. os-windows standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

3 participants