std.posix.getenv: early-return comparison #23265
Instead of parsing the full key and value for each environment variable before checking the key for (case-insensitive) equality, we skip to the next environment variable as soon as it's no longer possible for the key to match. This makes getting environment variables about 2x faster across the board on Windows. Note: we still have to scan to the end of each environment variable, even the ones that are skipped (the only way to know where one ends is its NUL terminator), so this strategy doesn't provide the same speedup on Windows as it does on POSIX (ziglang#23265).
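To make the Windows note concrete, here is a minimal C sketch (not the Zig standard library's code) of an early-return lookup in a Windows-style environment block, where `NAME=VALUE` entries are separated by NUL and an empty entry ends the block. Assumptions: `block_getenv` is a hypothetical name, plain `char` stands in for the UTF-16 `WCHAR` data Windows actually uses, and case folding is ASCII-only. A skipped entry still has to be walked to its NUL terminator, whereas a POSIX `environ` array reaches each entry through its own pointer and needs no such walk.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Sketch: look up key in a block laid out as "NAME=VALUE\0NAME=VALUE\0\0".
 * Real Windows blocks hold UTF-16 WCHARs; char is used here for brevity.
 * Name comparison is case-insensitive (ASCII only in this sketch). */
const char *block_getenv(const char *block, const char *key) {
    assert(block != NULL && key != NULL);
    if (key[0] == '\0')
        return NULL;
    const char *entry = block;
    while (*entry != '\0') { /* an empty entry marks the end of the block */
        size_t i = 0;
        /* Early return: stop comparing at the first byte that can't match. */
        while (key[i] != '\0' && entry[i] != '\0' && entry[i] != '=' &&
               toupper((unsigned char)key[i]) == toupper((unsigned char)entry[i]))
            i++;
        if (key[i] == '\0' && entry[i] == '=')
            return entry + i + 1; /* the value starts right after '=' */
        /* A skipped entry must still be walked to its NUL terminator,
         * because that is the only way to find where the next entry starts. */
        entry += i;
        while (*entry != '\0')
            entry++;
        entry++; /* step past the NUL to the next entry */
    }
    return NULL;
}
```

For example, in the block `"Path=C:\\Windows\0FOO=bar\0"`, looking up `foo` returns `bar`, while `FOOD` returns `NULL` after both entries have been walked to their ends.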
This will cause a regression when looking up environment variable names that contain `=`. An easy fix would be adding something like `if (std.mem.indexOfScalar(u8, key, '=') != null) return null;` (see also the standalone test in #23272 for a relevant test case).
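A hedged C illustration of the regression and the suggested guard, assuming an early-return loop that keeps comparing bytes for as long as they match, so a `=` inside the key can line up with the `=` that ends a name in an entry. `lookup` and its `reject_eq` flag are hypothetical names for this sketch; `strchr(key, '=')` plays the role of `std.mem.indexOfScalar(u8, key, '=') != null`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical early-return lookup over a NULL-terminated array of
 * "NAME=VALUE" strings (shaped like POSIX environ). If reject_eq is
 * nonzero, the suggested guard is applied first. */
const char *lookup(char *const *env, const char *key, int reject_eq) {
    assert(env != NULL && key != NULL);
    /* Guard: a valid name never contains '=', so such a key can't match.
     * C counterpart of std.mem.indexOfScalar(u8, key, '=') != null. */
    if (reject_eq && strchr(key, '=') != NULL)
        return NULL;
    for (; *env != NULL; env++) {
        const char *entry = *env;
        size_t i = 0;
        /* Compare until the first mismatch; nothing stops the loop at the
         * entry's '=' if key has '=' at the same position. */
        while (key[i] != '\0' && entry[i] == key[i])
            i++;
        if (key[i] == '\0' && entry[i] == '=')
            return entry + i + 1;
    }
    return NULL;
}
```

With `FOO=ABC=123` in the environment, looking up `FOO=ABC` returns `123` without the guard and `NULL` with it, while `FOO` still returns `ABC=123`.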
Apologies, I believe that POSIX does not allow environment variable names with an embedded `=`: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html
Correct, but users can still request the lookup of invalid environment variable names. With the changes in this PR, calling `getenv` with such a name can return a non-null result. Instead, looking up any name containing `=` should return null, as musl does (see zig/lib/libc/musl/src/env/getenv.c, lines 7 to 8 in 9c9d393).
I see what you mean. The reference implementation would return `ABC=123` for `FOO=ABC`:
```zig
const std = @import("std");

pub fn main() void {
    std.log.info("FOO: {s}", .{ std.c.getenv("FOO") orelse "" });
    std.log.info("FOO=ABC: {s}", .{ std.c.getenv("FOO=ABC") orelse "" });
}
```

```
> FOO="ABC=123" zig run getenv-demo
info: FOO: ABC=123
info: FOO=ABC: ABC=123
```
If by reference implementation you mean the implementation on … (not efficient, but correct behavior). EDIT: Sorry, I think I misunderstood what you were saying. If so, ignore this.
Terribly sorry if I'm misunderstanding the code, but by "reference" I mean musl's `getenv` (zig/lib/libc/musl/src/env/getenv.c). Please see my understanding of what happens:

```c
// name == "FOO=ABC"
size_t l = __strchrnul(name, '=') /* == (name + 3) */ - name; // == 3
if (l && !name[l] && __environ) // 3 && !'=' && __environ
    for (char **e = __environ; *e; e++)
        // **e == &("FOO=ABC=123")
        if (!strncmp(name, *e, l) /* strncmp("FOO=ABC", "FOO=ABC=123", 3) == 0 */ && l[*e] /* "FOO=ABC=123"[3] */ == '=')
            return *e + l+1; // "FOO=ABC=123"[4:] == "ABC=123"
return 0;
```
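No worries, the code is hard to understand. You're misinterpreting the `!name[l]` check. Here's my rewrite of your comment for that line: `if (l && !name[l] && __environ) // 3 && name[3] == 0 && __environ`, so in the case of `FOO=ABC`, `name[3]` is `'='` rather than `0`, the condition is false, and `getenv` returns `0`.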
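For reference, musl's rule can be restated as a small, portable C function: the name length `l` is measured up to the first `=` (or the terminator), and the search only runs if that position is the terminator, so any name containing `=` (and the empty name) yields `NULL`. This sketch searches an explicit array instead of `__environ` and uses `strcspn` in place of musl's internal `__strchrnul`; `musl_style_getenv` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Restatement of musl's getenv logic over an explicit NULL-terminated
 * array of "NAME=VALUE" strings (musl itself walks __environ). */
const char *musl_style_getenv(char *const *env, const char *name) {
    assert(env != NULL && name != NULL);
    /* Length of name up to the first '=' or the terminator, which is what
     * __strchrnul(name, '=') - name computes. */
    size_t l = strcspn(name, "=");
    /* name[l] is '\0' only if name has no '='; l == 0 rejects "". */
    if (l == 0 || name[l] != '\0')
        return NULL;
    for (; *env != NULL; env++)
        if (strncmp(name, *env, l) == 0 && (*env)[l] == '=')
            return *env + l + 1;
    return NULL;
}
```

With `FOO=ABC=123` in the array, `musl_style_getenv(env, "FOO")` returns `ABC=123`, while `"FOO=ABC"` is rejected before the loop runs.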
Thank you for clarifying; I pushed the suggested change to match musl. This still, I think, puts a spotlight on the fact that behavior may differ depending on whether libc is linked.
I don't think they will behave differently, but it is a good idea to test for this. I'll make the standalone test added in #23272 also compile version(s) with libc linked and make sure it tests those as well.
At least on macOS I'm observing the behaviour reported in #23265 (comment), including with a plain C program:

```c
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    const char *v = getenv("FOO");
    printf("FOO: %s\n", v ? v : "");
    v = getenv("FOO=ABC");
    printf("FOO=ABC: %s\n", v ? v : "");
    return 0;
}
```
Hm, you seem to be right, and musl might be the odd one out here. MinGW and MSVC libc on Windows also return non-null for the `FOO=ABC` lookup. The musl behavior seems obviously more correct to me, though, since, as you quoted from the POSIX spec before, environment variable names shall not contain `=`.
FWIW, … EDIT: I was hoping to find the musl commit where this behavior was introduced, but it's been there since the earliest commit: https://git.musl-libc.org/cgit/musl/commit/src/env/getenv.c?id=0b44a0315b47dd8eced9f3b7f31580cf14bbfc01
Made a follow-up issue for this.
Nice work, thanks for tackling this!
Yeah, thanks!
Addresses the issue described in #22917.
Possibly interferes with @andrewrk's work on the https://github.com/ziglang/zig/tree/main branch.
The original implementation (https://github.com/ziglang/zig/blob/aa3db7cc15/lib/std/posix.zig#L2004), for each environment variable, iterates until the end of its name (until `=`), and only then compares the entire name to `key`.
Since some environment variable names can be quite long (e.g. `GHOSTTY_SHELL_INTEGRATION_NO_SUDO=1`), these lengths add up. Put simply, to find a `key` in `environ`, it has to iterate over the combined length of every variable name that comes before it.
The proposed implementation functionally does what `strncmp` would do: it stops iterating and moves on to the next variable at the first character that doesn't match `key`. See the benchmarks below.
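A minimal C sketch contrasting the two strategies on a POSIX-style array of `NAME=VALUE` strings. Both function names and the `examined` counter are hypothetical additions for illustration: the counter records how many name bytes each loop steps over, which is a rough cost proxy, not a benchmark. The early-return version also omits the `=` guard discussed in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Original strategy (sketch): find the end of each name (the '='), then
 * compare the whole name with key. *examined counts name bytes scanned. */
const char *getenv_full_scan(char *const *env, const char *key, size_t *examined) {
    assert(env != NULL && key != NULL && examined != NULL);
    size_t key_len = strlen(key);
    *examined = 0;
    for (; *env != NULL; env++) {
        const char *entry = *env;
        size_t end = 0;
        while (entry[end] != '\0' && entry[end] != '=')
            end++;
        *examined += end;
        if (entry[end] == '=' && end == key_len && memcmp(entry, key, key_len) == 0)
            return entry + end + 1;
    }
    return NULL;
}

/* Proposed strategy (sketch): compare byte by byte, strncmp-style, and move
 * to the next entry at the first mismatch. *examined counts matching bytes. */
const char *getenv_early_return(char *const *env, const char *key, size_t *examined) {
    assert(env != NULL && key != NULL && examined != NULL);
    *examined = 0;
    for (; *env != NULL; env++) {
        const char *entry = *env;
        size_t i = 0;
        while (key[i] != '\0' && entry[i] == key[i])
            i++;
        *examined += i;
        if (key[i] == '\0' && entry[i] == '=')
            return entry + i + 1;
    }
    return NULL;
}
```

For an environment of `GHOSTTY_SHELL_INTEGRATION_NO_SUDO=1`, `HOME=/home/user` and `PATH=/usr/bin`, both functions return `/usr/bin` for `PATH`, but the full scan steps over 41 name bytes (33 + 4 + 4) while the early-return version steps over only the 4 bytes of the matching name.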
Disclaimer: this was tested on macOS, with a variation of the #23264 fix applied.
To address the elephant in the room: this loop is duplicated, but I'm hesitant to refactor it to deduplicate because of the `// TODO see https://github.com/ziglang/zig/issues/4524` preamble and the ongoing work to fix it.