
Open coding builtin_memset should not be faster #143741

Open
@JonChesterfield

Description


PR #143607 for libc replaces a call to `__builtin_memset` with an open-coded equivalent and reports substantial performance gains (along with a minor increase in register count). That shouldn't be possible.
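To make the comparison concrete, here is a minimal C sketch of the two styles. Variant B is a hypothetical illustration of what "open coding" a memset looks like in general; it is not the code from PR #143607. The function names are made up, and the 8-byte alignment precondition is an assumption chosen so that the GCC/Clang builtin `__builtin_assume_aligned` can hand alignment information to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Variant A: leave the operation to the compiler. Clang emits the
 * llvm.memset intrinsic for __builtin_memset, and the alignment assumption
 * can be recorded on it, so the backend decides how to expand it. */
static void zero_with_builtin(void *dst, size_t n) {
  assert(((uintptr_t)dst & 7u) == 0 && "dst must be 8-byte aligned");
  void *aligned = __builtin_assume_aligned(dst, 8);
  __builtin_memset(aligned, 0, n);
}

/* Variant B (hypothetical, NOT the code from PR #143607): the same
 * operation open coded as 8-byte stores followed by a byte tail.
 * Assumes dst points into storage whose effective type is uint64_t. */
static void zero_open_coded(void *dst, size_t n) {
  assert(((uintptr_t)dst & 7u) == 0 && "dst must be 8-byte aligned");
  uint64_t *words = (uint64_t *)dst;
  size_t i = 0;
  for (; i < n / 8; ++i) /* whole 8-byte words */
    words[i] = 0;
  unsigned char *tail = (unsigned char *)dst + i * 8;
  for (size_t j = 0; j < n % 8; ++j) /* remaining 0..7 bytes */
    tail[j] = 0;
}
```

One practical wrinkle: at higher optimization levels, Clang's loop idiom recognition may turn Variant B's loops back into a memset intrinsic, so an open-coded version only stays open coded if that transformation is suppressed.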

`memset` (like `memcpy`, `memmove`, etc.) has completely specified semantics, and the corresponding intrinsics carry alignment and similar metadata. The backend therefore has everything it needs to lower them optimally. In particular, temporarily enabling the inactive lanes in the exec mask and disabling them again afterwards is likely to outperform doing the operation with whatever subset of the warp happens to be active at the call site, at least when the pointers are uniform and similar conditions hold.
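The exec-mask argument can be illustrated with a toy cost model in C. This is not AMDGPU code. It assumes a 64-lane wavefront, a uniform destination pointer and length, one 4-byte store per participating lane per round, and lanes numbered by their rank among the enabled lanes; it ignores the cost of saving and restoring exec as well as memory coalescing. `active_lanes` and `cooperative_clear` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: counts the lanes enabled in a 64-bit exec-style mask
 * (one bit per lane of a hypothetical 64-wide wavefront). */
static unsigned active_lanes(uint64_t mask) {
  unsigned count = 0;
  for (; mask != 0; mask &= mask - 1) /* clear the lowest set bit */
    ++count;
  return count;
}

/* Clears n_words 32-bit words at dst using only the lanes enabled in mask,
 * assuming dst and n_words are uniform across lanes. In each round, the
 * lane of rank r among the enabled lanes stores word base + r. Returns the
 * number of rounds, i.e. wavefront-wide store instructions in this model. */
static size_t cooperative_clear(uint32_t *dst, size_t n_words, uint64_t mask) {
  unsigned lanes = active_lanes(mask);
  assert(lanes > 0 && "at least one lane must be enabled");
  size_t rounds = 0;
  for (size_t base = 0; base < n_words; base += lanes, ++rounds) {
    /* Sequential stand-in for one store executed by all enabled lanes. */
    for (unsigned r = 0; r < lanes && base + r < n_words; ++r)
      dst[base + r] = 0;
  }
  return rounds;
}
```

Under these assumptions, clearing 1 KiB (256 words) takes 86 rounds when only 3 lanes are active, versus 4 rounds once all 64 lanes are enabled, which is the intuition behind enabling the inactive lanes for the duration of the operation.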

Leaving this issue as a reminder to look into this, fix the backend lowering, then move libc back to using `__builtin_memset`.
