Description
Previous ID | SR-14516 |
Radar | rdar://problem/77008933 |
Original Reporter | @weissi |
Type | Bug |
Attachment: Download
Environment
Apple Swift version 5.4 (swiftlang-1205.0.22.2 clang-1205.0.19.29)
Additional Detail from JIRA
Votes | 0 |
Component/s | Compiler |
Labels | Bug, 5.4Regression, Optimizer |
Assignee | None |
Priority | Medium |
md5: 06a0e70a5391143833033eeeb67e35f7
Issue Description:
The following program (sorry, this used to be 1,700 lines before creduce
reduced it) allocates 10,000 times:
extension CircularBuffer {
    @inline(never)
    func F() {
        _ = first! // << this allocates as it calls generic specialization <repro.NIOAny> of repro.CircularBuffer.subscript.read : (repro.CircularBuffer<A>.Index) -> A
    }
}
protocol p {
    init()
}
struct CircularBuffer<d: p>{
    var e: ContiguousArray<d?> = [.init()]
    struct Index: Comparable {
        var f: Int = 0
        var i: Int {
            return f
        }
        static func == (a: Index, j: Index) -> Bool {
            return a.i > j.i
        }
        static func < (aa: Index, j: Index) -> Bool {
            return aa.i < j.i
        }
    }
    func index(after: Index) -> Index {
        return index(after, offsetBy: 1)
    }
    subscript(k: Index) -> d {
        get {
            return e[k.i]!
        }
    }
    var startIndex = Index()
    var endIndex = Index()
}
extension CircularBuffer: Collection {}
enum XY {
    case X
    case Y
}
struct NIOAny: p {
    let av: aw = .ay(())
    enum aw {
        case ax(XY)
        case ay(Any)
        init<az>(a: az) {
            self = .ay(a)
        }
    }
}
let b = CircularBuffer<NIOAny>()
for _ in 0..<10000 {
    b.F()
}
Here's how it goes:
- b.F calls b.first (which is provided automatically from Collection)
- b.first calls the subscript's read accessor, which has an unconditional malloc (for coroutines, I think) in there 🙁
_$s5repro14CircularBufferVyxAC5IndexVyx_GcirAA6NIOAnyV_Tg5: // generic specialization <repro.NIOAny> of repro.CircularBuffer.subscript.read : (repro.CircularBuffer<A>.Index) -> A
0x0000000100002110 push rbp ; CODE XREF=_$s5repro14CircularBufferV1FyyFAA6NIOAnyV_Tg5+34
0x0000000100002111 mov rbp, rsp
0x0000000100002114 push r15
0x0000000100002116 push r14
0x0000000100002118 push r12
0x000000010000211a push rbx
0x000000010000211b sub rsp, 0x30
0x000000010000211f mov r14, rdx
0x0000000100002122 mov r15, rsi
0x0000000100002125 mov r12, rdi
0x0000000100002128 mov edi, 0x21 ; argument "size" for method imp___stubs__malloc
0x000000010000212d call imp___stubs__malloc ; malloc
The read accessors really shouldn't allocate, as that defeats many other optimisations.
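For context, here is a minimal sketch of why `first` ends up in the subscript at all. It assumes the default `Collection.first` behaves like the hypothetical `firstViaSubscript` below (compare `startIndex` with `endIndex`, then subscript); that name is made up for illustration and is not part of the standard library. In the reduced program, `==` returns `a.i > j.i`, which is false for the two default indices, so the non-empty branch, and with it the subscript, is taken on every call.

```swift
// Hypothetical helper (not a stdlib API) mirroring what the default
// `Collection.first` is assumed to do.
extension Collection {
    var firstViaSubscript: Element? {
        let start = startIndex
        // A non-empty collection is read through its own subscript,
        // which is where the allocating read accessor gets called.
        return start != endIndex ? self[start] : nil
    }
}

let numbers = [10, 20, 30]
let empty: [Int] = []
print(numbers.firstViaSubscript as Any)
print(empty.firstViaSubscript as Any)
```

This only illustrates the call path; it says nothing about whether a given subscript allocates, which depends on the compiler version and optimisation level.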
Full program attached.
Repro:
$ swiftc -O ~/tmp/repro.swift && sudo ~/devel/swift-nio/dev/malloc-aggregation.d -c ./repro
dtrace: system integrity protection is on, some features will not be available
=====
This will collect stack shots of allocations and print it when you exit dtrace.
So go ahead, run your tests and then press Ctrl+C in this window to see the aggregated result
=====
[...]
libsystem_malloc.dylib`malloc
repro`specialized CircularBuffer.subscript.read+0x22
repro`specialized CircularBuffer.F()+0x27
repro`main+0x7c
libdyld.dylib`start+0x1
0x1
10000
See how we get 10,000 allocations (malloc) through the subscript.read by calling F 10,000 times?
This seems to be a Swift 5.4 regression. We found this issue in the NIO CI, where a few allocation counter tests regress, but only on 5.4.
Also affects Swift version 5.4-dev (LLVM 7a20f40c45aca5d, Swift 031b848b7092c06)
In case you're into creduce, this is the interestingness test I used (for Linux):
#!/bin/bash
set -eu
swiftc -O repro.swift
objdump -d repro | grep -A14 '14CircularBufferVyxAC5IndexVyx_GcirAA6NIOAnyV_Tg5>:' | grep -q malloc