[SR-12777] SIMD generates bad code for string processing use-cases #55222

Open
@Lukasa

Description

Previous ID SR-12777
Radar rdar://problem/63076883
Original Reporter @Lukasa
Type Bug

Attachment: Download

Additional Detail from JIRA
Votes 1
Component/s Standard Library
Labels Bug, simd
Assignee None
Priority Medium

md5: c21b41dce5f7de3c531660d78015bffb

Issue Description:

I recently investigated using the SIMD types to implement a generic SIMD-vectorised string search. To get a feel for how well or poorly the SIMD types were doing, I compared them against two other implementations: a pure Swift bytewise string search, and a C implementation that uses the Intel intrinsics directly. The expectation is that the C implementation is the "best case", and that the SIMD one is judged by how close it gets to that C implementation from the baseline of the bytewise search. This did not attempt to solve the problem in full generality (dealing with alignment, etc.), as I just wanted to investigate the order of magnitude of the potential wins.
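For readers without the attachment, the bytewise baseline can be pictured roughly as follows. This is only a minimal sketch of the general technique: the function name `bytewiseFirstIndex(of:in:)` is hypothetical and is not taken from the attached benchmark package.

```swift
// Hypothetical bytewise baseline (not the code from the attached package):
// inspect one byte per loop iteration and return the offset of the first match.
func bytewiseFirstIndex(of needle: UInt8, in buffer: UnsafeRawBufferPointer) -> Int? {
    for (offset, byte) in buffer.enumerated() where byte == needle {
        return offset
    }
    return nil
}

// Usage: a 1 kB buffer whose only matching byte is the last one,
// mirroring the shape of the benchmark described in this report.
var haystack = [UInt8](repeating: 0, count: 1024)
haystack[1023] = 0x2A
let bytewiseResult = haystack.withUnsafeBytes { bytewiseFirstIndex(of: 0x2A, in: $0) }
```

With the needle in the last position, the loop has to visit all 1024 bytes, which is the worst case for this approach.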

My benchmark package is attached.

Here are my results, searching a 1kB buffer for a byte located at the last position:

Pure Swift, 1kB, last
Duration: nanoseconds(933)
Swift SIMD, 1kB, last
Duration: nanoseconds(15925)
Intel SIMD, 1kB, last
Duration: nanoseconds(155)

I was surprised by how poorly the SIMD implementation performed, so I extracted the common implementations into Godbolt. The idealised C implementation is here. The bytewise Swift implementation is here. The Swift SIMD implementation is here.

Part of the performance cost here is that there is no apparent way to transform a SIMDMask into a bitmask. This limits me to querying the mask by checking every SIMD lane manually, which is quite expensive. I tried to work around this with a comparison against the zero mask, but that comparison also appears to be done lanewise, which is another strange performance issue.
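The pattern described above can be sketched roughly like this. It is an illustration under assumptions, not the code from the Godbolt link: the helper name `firstMatchingLane(in:equalTo:)` is hypothetical, and it takes an already-loaded vector so that only the mask handling is shown.

```swift
// Hypothetical helper (not from the attached package): find the first lane
// of a 32-byte vector that equals `needle`.
func firstMatchingLane(in chunk: SIMD32<UInt8>, equalTo needle: UInt8) -> Int? {
    // Pointwise comparison; the result is a SIMDMask with one Bool per lane.
    let mask = chunk .== SIMD32<UInt8>(repeating: needle)

    // Early-out attempt: compare against the all-false ("zero") mask.
    let noMatches = SIMDMask<SIMD32<UInt8>.MaskStorage>(repeating: false)
    if mask == noMatches {
        return nil
    }

    // No bitmask accessor is used here, so each lane is queried individually.
    for lane in 0..<mask.scalarCount where mask[lane] {
        return lane
    }
    return nil
}

// Usage: only the last lane matches.
var sampleChunk = SIMD32<UInt8>(repeating: 0)
sampleChunk[31] = 0x2A
let matchingLane = firstMatchingLane(in: sampleChunk, equalTo: 0x2A)
```

By contrast, the Intel intrinsics offer a single operation that collapses a byte-comparison result into a 32-bit integer, which is the kind of primitive this report finds missing on the Swift side.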

Finally, I was a bit startled by exactly how much code was needed to initialise the SIMD32s from the UnsafeRawBufferPointer. While I didn't expect C's level of brevity, the amount of code involved here is so large that if I add even a few extra lines of code to the Swift function, the SIMD32 initialiser gets outlined for code size reasons.
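As an illustration of the initialisation step, one possible way to build a SIMD32 from an UnsafeRawBufferPointer is the sequence-based initialiser. This is an assumption about the general shape only: the helper name `loadChunk(from:at:)` is hypothetical, and the attached benchmark may construct its vectors differently.

```swift
// Hypothetical helper (not from the attached package): copy 32 consecutive
// bytes starting at `offset` into a SIMD32<UInt8>.
func loadChunk(from buffer: UnsafeRawBufferPointer, at offset: Int) -> SIMD32<UInt8> {
    // The sequence initialiser requires exactly 32 elements, so check bounds first.
    precondition(offset >= 0 && offset + 32 <= buffer.count, "chunk out of bounds")
    return SIMD32<UInt8>(buffer[offset..<(offset + 32)])
}

// Usage: load the second 32-byte chunk of a 64-byte buffer holding 0...63.
let sourceBytes = (0..<64).map { UInt8($0) }
let loadedChunk = sourceBytes.withUnsafeBytes { loadChunk(from: $0, at: 32) }
```

The source is a single line, but the concern raised in this report is about how much machine code the compiler emits for this kind of initialisation, not about how much Swift has to be written.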

The result of all of this is that SIMD appears to be unusable for string processing tasks at this time, unless I've missed something substantial about how it's supposed to be used.
