[SR-12777] SIMD generates bad code for string processing use-cases #55222


   |                  |                 |
   |------------------|-----------------|
   |Previous ID       | SR-12777      |
   |Radar             | rdar://problem/63076883         |
   |Original Reporter | @Lukasa      |
   |Type              | Bug    |


Attachment: [Download](https://user-images.githubusercontent.com/2727770/164963789-20a76f36-472d-408d-8fe0-8f942db46ee7.gz)

      
   <details>
  <summary>Additional Detail from JIRA</summary>

   |                  |                 |
   |------------------|-----------------|
   |Votes             | 1         |
   |Component/s       | Standard Library    |
   |Labels            | Bug, simd        |
   |Assignee          | None      |
   |Priority          | Medium      |

   

   md5: c21b41dce5f7de3c531660d78015bffb

  </details>





**Issue Description:**


I recently investigated using the SIMD types to implement a generic SIMD-vectorised string search. To get a feel for how well or poorly the SIMD types were doing, I compared them against two other implementations: a pure Swift bytewise string search, and a C implementation that uses the Intel intrinsics directly. The expectation was that the C implementation is a "best case", and that the SIMD one would be judged by how close it could get to the C implementation, starting from the baseline of the bytewise search. I did not attempt to solve the problem in full generality (handling alignment and so on), as I only wanted to find out what the order of magnitude of the wins might be.
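For readers without the attachment, the bytewise baseline can be pictured roughly as follows. This is a minimal sketch and not the code from the benchmark package; the function name `bytewiseIndex(of:in:)` and the test buffer are assumptions made for illustration.

```swift
// Hypothetical helper, not taken from the attached benchmark package.
// Scans the buffer one byte per iteration and returns the first matching offset.
func bytewiseIndex(of needle: UInt8, in haystack: UnsafeRawBufferPointer) -> Int? {
    for index in haystack.indices where haystack[index] == needle {
        return index
    }
    return nil
}

// A 1024-byte buffer whose only match is the final byte,
// mirroring the "1kB, last" scenario described in this report.
let haystack: [UInt8] = Array(repeating: 0, count: 1023) + [0x2A]
let found = haystack.withUnsafeBytes { bytewiseIndex(of: 0x2A, in: $0) }
let missing = haystack.withUnsafeBytes { bytewiseIndex(of: 0x7F, in: $0) }
```

The SIMD variants have to beat this loop to be worthwhile, which is why it serves as the baseline.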

My benchmark package is attached.

Here are my results, searching a 1kB buffer for a byte that is located at the last position:

```
Pure Swift, 1kB, last
Duration: nanoseconds(933)
Swift SIMD, 1kB, last
Duration: nanoseconds(15925)
Intel SIMD, 1kB, last
Duration: nanoseconds(155)
```

I was surprised by how poorly the SIMD implementation performed, so I extracted the common implementations into Godbolt. The idealised C implementation is [here](https://godbolt.org/z/4iZues). The bytewise Swift implementation is [here](https://swift.godbolt.org/z/XPkGBB). The Swift SIMD implementation is [here](https://swift.godbolt.org/z/_87kh2).

Part of the performance cost here is that there is no apparent way to transform a `SIMDMask` into a bitmask. The only way I can query the mask is to check every SIMD lane manually, which is quite expensive. I tried to work around this by comparing against the zero mask, but that comparison *also* appears to be done lanewise, which is another strange performance issue.
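To make the lanewise query concrete, here is a minimal sketch of collapsing a `SIMDMask` into an integer bitmask by visiting each lane. It is not the code behind the Godbolt links; the helper name `laneBitmask` and the sample vector are assumptions made for illustration, and the sketch demonstrates only the shape of the workaround, not its cost.

```swift
// Hypothetical helper: builds a bitmask with bit N set when lane N of the mask is true.
// The constraints on `Storage` are inferred from its use in `SIMDMask<Storage>`.
func laneBitmask<Storage>(_ mask: SIMDMask<Storage>) -> UInt64 {
    precondition(mask.scalarCount <= 64, "at most 64 lanes fit in a UInt64")
    var bits: UInt64 = 0
    // Every lane is inspected individually; this is the manual query described above.
    for lane in 0..<mask.scalarCount where mask[lane] {
        bits |= UInt64(1) << UInt64(lane)
    }
    return bits
}

// Sample data: a 32-lane vector with the needle in lanes 5 and 17.
var chunk = SIMD32<UInt8>(repeating: 0)
chunk[5] = 0x2A
chunk[17] = 0x2A

// Pointwise comparison produces a mask; the helper turns it into bits.
let matches = chunk .== SIMD32<UInt8>(repeating: 0x2A)
let bits = laneBitmask(matches)
let firstMatch: Int? = bits == 0 ? nil : bits.trailingZeroBitCount
```

Once the mask is an integer, the first matching lane falls out of `trailingZeroBitCount`, which is the step the C version gets cheaply from a hardware bitmask.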

Finally, I was a bit startled at exactly how much code was needed to initialise the SIMD32s from the UnsafeRawBufferPointer. While I didn't expect C's level of brevity, the amount of code involved here is so high that if I add even a few extra lines of code to the Swift function the SIMD32 initialiser gets outlined for code size reasons.
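For illustration, two ways of filling a `SIMD32<UInt8>` from an `UnsafeRawBufferPointer` are sketched below. Neither is confirmed to be what the Godbolt example does, and the helper names are hypothetical. The second relies on `loadUnaligned(fromByteOffset:as:)`, which assumes a Swift 5.7 or later toolchain and so postdates this report.

```swift
// Hypothetical helper: element-by-element initialisation from a Sequence of UInt8.
// The SIMD initialiser traps unless the sequence has exactly 32 elements.
func loadViaSequence(_ buffer: UnsafeRawBufferPointer, at offset: Int) -> SIMD32<UInt8> {
    SIMD32<UInt8>(buffer[offset..<(offset + 32)])
}

// Hypothetical helper: a single unaligned load of 32 bytes.
// Assumes Swift 5.7+, where loadUnaligned(fromByteOffset:as:) is available.
func loadViaUnaligned(_ buffer: UnsafeRawBufferPointer, at offset: Int) -> SIMD32<UInt8> {
    buffer.loadUnaligned(fromByteOffset: offset, as: SIMD32<UInt8>.self)
}

// 64 bytes holding the values 0...63, loaded from a deliberately odd offset.
let bytes = [UInt8](0..<64)
let viaSequence = bytes.withUnsafeBytes { loadViaSequence($0, at: 3) }
let viaUnaligned = bytes.withUnsafeBytes { loadViaUnaligned($0, at: 3) }
```

Both helpers produce the same vector; comparing their generated code would be one way to check whether the initialisation overhead described above is specific to one of these paths.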

The result of all of this is that SIMD appears to be unusable for string processing tasks at this time, unless I've missed something substantial about how it's supposed to be used.


   
