
Commit 539eaef

matklad authored and andrewrk committed
reduce AstGen.numberLiteral stack usage
At the moment, the LLVM IR we generate for this fn is

    define internal fastcc void @AstGen.numberLiteral ... {
    Entry:
      ...
      %16 = alloca %"fmt.parse_float.decimal.Decimal(f128)", align 8
      ...

That `Decimal` is huuuge! It stores

    pub const max_digits = 11564;
    digits: [max_digits]u8,

on the stack.

It comes from the `convertSlow` function, which LLVM happily inlined despite it being the cold path. Forbid inlining it, so that callers are not penalized with excessive stack usage.

Backstory: I was looking for needless memcpys in TigerBeetle and came up with this copyhound.zig tool for doing just that:

https://github.com/tigerbeetle/tigerbeetle/blob/ee67e2ab95ed7ccf909be377dc613869738d48b4/src/copyhound.zig

I got curious, ran it on Zig's own code base, and looked at some of the worst offenders:

    warning: crypto.kyber_d00.Kyber.SecretKey.decaps: 7776 bytes memcpy
    warning: crypto.ff.Modulus.powPublic: 8160 bytes memcpy
    warning: AstGen.numberLiteral: 11584 bytes memcpy
    warning: crypto.tls.Client.init__anon_133566: 13984 bytes memcpy
    warning: http.Client.connectUnproxied: 16896 bytes memcpy
    warning: crypto.tls.Client.init__anon_133566: 16904 bytes memcpy
    warning: objcopy.ElfFileHelper.tryCompressSection: 32768 bytes memcpy

Note from Andrew: I removed `noinline` from this commit since it should be enough to set it to be cold.
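The 11584-byte copy reported for `AstGen.numberLiteral` lines up with `max_digits = 11564` plus a small header. A quick sketch of that arithmetic follows. It assumes (this is not taken from the commit) that `Decimal` carries a `usize` digit count, an `i32` decimal point and a `bool` truncation flag ahead of `digits`, and that the target is 64-bit. `DecimalSketch` is a hypothetical stand-in, not the standard library type.

```zig
const std = @import("std");

// Mirrors the constant quoted in the commit message.
const max_digits = 11564;

// Hypothetical stand-in for fmt.parse_float.decimal.Decimal(f128).
// Only `digits` comes from the commit; the header fields are assumptions.
// `extern` pins the field order so the size is predictable.
const DecimalSketch = extern struct {
    num_digits: usize, // assumed field
    decimal_point: i32, // assumed field
    truncated: bool, // assumed field
    digits: [max_digits]u8,
};

pub fn main() void {
    // On a 64-bit target: 8 + 4 + 1 + 11564 = 11577 bytes,
    // rounded up to the 8-byte alignment of `usize`.
    std.debug.print("{d}\n", .{@sizeOf(DecimalSketch)});
}
```

Under these assumptions almost all of the frame is the digit array itself, which is why inlining `convertSlow` shows up as an 11 KiB alloca in the caller.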
1 parent e313584 commit 539eaef


1 file changed, 5 lines added, 0 lines removed


lib/std/fmt/parse_float/convert_slow.zig

Lines changed: 5 additions & 0 deletions
    @@ -32,7 +32,12 @@ pub fn getShift(n: usize) usize {
     ///
     /// The algorithms described here are based on "Processing Long Numbers Quickly",
     /// available here: <https://arxiv.org/pdf/2101.11408.pdf#section.11>.
    +///
    +/// Note that this function needs a lot of stack space and is marked
    +/// cold to hint against inlining into the caller.
     pub fn convertSlow(comptime T: type, s: []const u8) BiasedFp(T) {
    +    @setCold(true);
    +
         const MantissaT = mantissaType(T);
         const min_exponent = -(1 << (math.floatExponentBits(T) - 1)) + 1;
         const infinite_power = (1 << math.floatExponentBits(T)) - 1;
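A minimal sketch of the same pattern outside the standard library follows. It pairs a hot fast path that needs no large buffer with a slow path that owns a big stack buffer and is marked cold, as the diff does for `convertSlow`. `parse`, `parseSlow`, the digit-separator handling and the 16 KiB buffer are all hypothetical. The sketch assumes a Zig release that still has `@setCold` (0.11 to 0.13; later releases replace it with `@branchHint(.cold)`). Being cold is a hint to the optimizer rather than a guarantee like `noinline`.

```zig
const std = @import("std");

// Hypothetical slow path. Like convertSlow, it keeps a large scratch buffer
// in its own stack frame. Marking it cold hints that it should not be
// inlined, so callers do not have to reserve these 16 KiB themselves.
fn parseSlow(s: []const u8) ?u64 {
    @setCold(true);
    var scratch: [16 * 1024]u8 = undefined;
    var n: usize = 0;
    // Strip '_' digit separators into the scratch buffer.
    for (s) |c| {
        if (c == '_') continue;
        if (n == scratch.len) return null;
        scratch[n] = c;
        n += 1;
    }
    return std.fmt.parseInt(u64, scratch[0..n], 10) catch null;
}

// Hypothetical hot path: plain runs of up to 19 decimal digits always fit in
// a u64, so they are handled here without any large buffer.
fn parse(s: []const u8) ?u64 {
    if (s.len == 0) return null;
    if (s.len > 19) return parseSlow(s);
    var acc: u64 = 0;
    for (s) |c| {
        if (c < '0' or c > '9') return parseSlow(s);
        acc = acc * 10 + (c - '0');
    }
    return acc;
}

pub fn main() void {
    const v = parse("1_000") orelse 0;
    std.debug.print("{d}\n", .{v});
}
```

If the hint is honored, `parse` keeps a small frame, and the 16 KiB buffer is only allocated when the rare fallback actually runs.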
