Skip to content
This repository was archived by the owner on Mar 28, 2020. It is now read-only.

Commit 414670b

Browse files
committed
[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour
Summary: Quote from http://eel.is/c++draft/expr.add#4: ``` 4 When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P. (4.1) If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value. (4.2) Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n. (4.3) Otherwise, the behavior is undefined. ``` Therefore, as per the standard, applying non-zero offset to `nullptr` (or making non-`nullptr` a `nullptr`, by subtracting pointer's integral value from the pointer itself) is undefined behavior. (*if* `nullptr` is not defined, i.e. e.g. `-fno-delete-null-pointer-checks` was *not* specified.) To make things more fun, in C (6.5.6p8), applying *any* offset to null pointer is undefined, although Clang front-end pessimizes the code by not lowering that info, so this UB is "harmless". Since rL369789 (D66608 `[InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null`) LLVM middle-end uses those guarantees for transformations. If the source contains such UB's, said code may now be miscompiled. Such miscompilations were already observed: * https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190826/687838.html * google/filament#1566 Surprisingly, UBSan does not catch those issues ... until now. This diff teaches UBSan about these UB's. `getelementpointer inbounds` is a pretty frequent instruction, so this does have a measurable impact on performance; I've addressed most of the obvious missing folds (and thus decreased the performance impact by ~5%), and then re-performed some performance measurements using my [[ https://github.com/darktable-org/rawspeed | RawSpeed ]] benchmark: (all measurements done with LLVM ToT, the sanitizer never fired.) * no sanitization vs. existing check: average `+21.62%` slowdown * existing check vs. check after this patch: average `22.04%` slowdown * no sanitization vs. this patch: average `48.42%` slowdown Reviewers: vsk, filcab, rsmith, aaron.ballman, vitalybuka, rjmccall, #sanitizers Reviewed By: rsmith Subscribers: kristof.beyls, nickdesaulniers, nikic, ychen, dtzWill, xbolva00, dberris, arphaman, rupprecht, reames, regehr, llvm-commits, cfe-commits Tags: #clang, #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D67122 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@374293 91177308-0d34-0410-b5e6-96231b3b80d8
1 parent 1b1167f commit 414670b

12 files changed

+1196
-245
lines changed

docs/ReleaseNotes.rst

Lines changed: 44 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -60,6 +60,16 @@ Improvements to Clang's diagnostics
6060
Non-comprehensive list of changes in this release
6161
-------------------------------------------------
6262

63+
* In both C and C++ (C17 ``6.5.6p8``, C++ ``[expr.add]``), pointer arithmetic is
64+
only permitted within arrays. In particular, the behavior of a program is not
65+
defined if it adds a non-zero offset (or in C, any offset) to a null pointer,
66+
or if it forms a null pointer by subtracting an integer from a non-null
67+
pointer, and the LLVM optimizer now uses those guarantees for transformations.
68+
This may lead to unintended behavior in code that performs these operations.
69+
The Undefined Behavior Sanitizer ``-fsanitize=pointer-overflow`` check has
70+
been extended to detect these cases, so that code relying on them can be
71+
detected and fixed.
72+
6373
- For X86 target, -march=skylake-avx512, -march=icelake-client,
6474
-march=icelake-server, -march=cascadelake, -march=cooperlake will default to
6575
not using 512-bit zmm registers in vectorized code unless 512-bit intrinsics
@@ -238,7 +248,40 @@ Static Analyzer
238248
Undefined Behavior Sanitizer (UBSan)
239249
------------------------------------
240250

241-
- ...
251+
- * The ``pointer-overflow`` check was extended added to catch the cases where
252+
a non-zero offset is applied to a null pointer, or the result of
253+
applying the offset is a null pointer.
254+
255+
.. code-block:: c++
256+
257+
#include <cstdint> // for intptr_t
258+
259+
static char *getelementpointer_inbounds(char *base, unsigned long offset) {
260+
// Potentially UB.
261+
return base + offset;
262+
}
263+
264+
char *getelementpointer_unsafe(char *base, unsigned long offset) {
265+
// Always apply offset. UB if base is ``nullptr`` and ``offset`` is not
266+
// zero, or if ``base`` is non-``nullptr`` and ``offset`` is
267+
// ``-reinterpret_cast<intptr_t>(base)``.
268+
return getelementpointer_inbounds(base, offset);
269+
}
270+
271+
char *getelementpointer_safe(char *base, unsigned long offset) {
272+
// Cast pointer to integer, perform usual arithmetic addition,
273+
// and cast to pointer. This is legal.
274+
char *computed =
275+
reinterpret_cast<char *>(reinterpret_cast<intptr_t>(base) + offset);
276+
// If either the pointer becomes non-``nullptr``, or becomes
277+
// ``nullptr``, we must use ``computed`` result.
278+
if (((base == nullptr) && (computed != nullptr)) ||
279+
((base != nullptr) && (computed == nullptr)))
280+
return computed;
281+
// Else we can use ``getelementpointer_inbounds()``.
282+
return getelementpointer_inbounds(base, offset);
283+
}
284+
242285

243286
Core Analysis Improvements
244287
==========================

docs/UndefinedBehaviorSanitizer.rst

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -130,7 +130,8 @@ Available checks are:
130130
``__builtin_object_size``, and consequently may be able to detect more
131131
problems at higher optimization levels.
132132
- ``-fsanitize=pointer-overflow``: Performing pointer arithmetic which
133-
overflows.
133+
overflows, or where either the old or new pointer value is a null pointer
134+
(or in C, when they both are).
134135
- ``-fsanitize=return``: In C++, reaching the end of a
135136
value-returning function without returning a value.
136137
- ``-fsanitize=returns-nonnull-attribute``: Returning null pointer

lib/CodeGen/CGExprScalar.cpp

Lines changed: 99 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -4542,18 +4542,36 @@ struct GEPOffsetAndOverflow {
45424542
llvm::Value *OffsetOverflows;
45434543
};
45444544

4545-
/// Evaluate given GEPVal, which must be an inbounds GEP,
4545+
/// Evaluate given GEPVal, which is either an inbounds GEP, or a constant,
45464546
/// and compute the total offset it applies from it's base pointer BasePtr.
45474547
/// Returns offset in bytes and a boolean flag whether an overflow happened
45484548
/// during evaluation.
45494549
static GEPOffsetAndOverflow EmitGEPOffsetInBytes(Value *BasePtr, Value *GEPVal,
45504550
llvm::LLVMContext &VMContext,
45514551
CodeGenModule &CGM,
45524552
CGBuilderTy Builder) {
4553+
const auto &DL = CGM.getDataLayout();
4554+
4555+
// The total (signed) byte offset for the GEP.
4556+
llvm::Value *TotalOffset = nullptr;
4557+
4558+
// Was the GEP already reduced to a constant?
4559+
if (isa<llvm::Constant>(GEPVal)) {
4560+
// Compute the offset by casting both pointers to integers and subtracting:
4561+
// GEPVal = BasePtr + ptr(Offset) <--> Offset = int(GEPVal) - int(BasePtr)
4562+
Value *BasePtr_int =
4563+
Builder.CreatePtrToInt(BasePtr, DL.getIntPtrType(BasePtr->getType()));
4564+
Value *GEPVal_int =
4565+
Builder.CreatePtrToInt(GEPVal, DL.getIntPtrType(GEPVal->getType()));
4566+
TotalOffset = Builder.CreateSub(GEPVal_int, BasePtr_int);
4567+
return {TotalOffset, /*OffsetOverflows=*/Builder.getFalse()};
4568+
}
4569+
45534570
auto *GEP = cast<llvm::GEPOperator>(GEPVal);
4571+
assert(GEP->getPointerOperand() == BasePtr &&
4572+
"BasePtr must be the the base of the GEP.");
45544573
assert(GEP->isInBounds() && "Expected inbounds GEP");
45554574

4556-
const auto &DL = CGM.getDataLayout();
45574575
auto *IntPtrTy = DL.getIntPtrType(GEP->getPointerOperandType());
45584576

45594577
// Grab references to the signed add/mul overflow intrinsics for intptr_t.
@@ -4563,8 +4581,6 @@ static GEPOffsetAndOverflow EmitGEPOffsetInBytes(Value *BasePtr, Value *GEPVal,
45634581
auto *SMulIntrinsic =
45644582
CGM.getIntrinsic(llvm::Intrinsic::smul_with_overflow, IntPtrTy);
45654583

4566-
// The total (signed) byte offset for the GEP.
4567-
llvm::Value *TotalOffset = nullptr;
45684584
// The offset overflow flag - true if the total offset overflows.
45694585
llvm::Value *OffsetOverflows = Builder.getFalse();
45704586

@@ -4635,69 +4651,102 @@ CodeGenFunction::EmitCheckedInBoundsGEP(Value *Ptr, ArrayRef<Value *> IdxList,
46354651
if (!SanOpts.has(SanitizerKind::PointerOverflow))
46364652
return GEPVal;
46374653

4638-
// If the GEP has already been reduced to a constant, leave it be.
4639-
if (isa<llvm::Constant>(GEPVal))
4640-
return GEPVal;
4654+
llvm::Type *PtrTy = Ptr->getType();
4655+
4656+
// Perform nullptr-and-offset check unless the nullptr is defined.
4657+
bool PerformNullCheck = !NullPointerIsDefined(
4658+
Builder.GetInsertBlock()->getParent(), PtrTy->getPointerAddressSpace());
4659+
// Check for overflows unless the GEP got constant-folded,
4660+
// and only in the default address space
4661+
bool PerformOverflowCheck =
4662+
!isa<llvm::Constant>(GEPVal) && PtrTy->getPointerAddressSpace() == 0;
46414663

4642-
// Only check for overflows in the default address space.
4643-
if (GEPVal->getType()->getPointerAddressSpace())
4664+
if (!(PerformNullCheck || PerformOverflowCheck))
46444665
return GEPVal;
46454666

4667+
const auto &DL = CGM.getDataLayout();
4668+
46464669
SanitizerScope SanScope(this);
4670+
llvm::Type *IntPtrTy = DL.getIntPtrType(PtrTy);
46474671

46484672
GEPOffsetAndOverflow EvaluatedGEP =
46494673
EmitGEPOffsetInBytes(Ptr, GEPVal, getLLVMContext(), CGM, Builder);
46504674

4651-
auto *GEP = cast<llvm::GEPOperator>(GEPVal);
4652-
4653-
const auto &DL = CGM.getDataLayout();
4654-
auto *IntPtrTy = DL.getIntPtrType(GEP->getPointerOperandType());
4675+
assert((!isa<llvm::Constant>(EvaluatedGEP.TotalOffset) ||
4676+
EvaluatedGEP.OffsetOverflows == Builder.getFalse()) &&
4677+
"If the offset got constant-folded, we don't expect that there was an "
4678+
"overflow.");
46554679

46564680
auto *Zero = llvm::ConstantInt::getNullValue(IntPtrTy);
46574681

4658-
// Common case: if the total offset is zero, don't emit a check.
4659-
if (EvaluatedGEP.TotalOffset == Zero)
4682+
// Common case: if the total offset is zero, and we are using C++ semantics,
4683+
// where nullptr+0 is defined, don't emit a check.
4684+
if (EvaluatedGEP.TotalOffset == Zero && CGM.getLangOpts().CPlusPlus)
46604685
return GEPVal;
46614686

46624687
// Now that we've computed the total offset, add it to the base pointer (with
46634688
// wrapping semantics).
4664-
auto *IntPtr = Builder.CreatePtrToInt(GEP->getPointerOperand(), IntPtrTy);
4689+
auto *IntPtr = Builder.CreatePtrToInt(Ptr, IntPtrTy);
46654690
auto *ComputedGEP = Builder.CreateAdd(IntPtr, EvaluatedGEP.TotalOffset);
46664691

4667-
llvm::SmallVector<std::pair<llvm::Value *, SanitizerMask>, 1> Checks;
4668-
4669-
// The GEP is valid if:
4670-
// 1) The total offset doesn't overflow, and
4671-
// 2) The sign of the difference between the computed address and the base
4672-
// pointer matches the sign of the total offset.
4673-
llvm::Value *ValidGEP;
4674-
auto *NoOffsetOverflow = Builder.CreateNot(EvaluatedGEP.OffsetOverflows);
4675-
if (SignedIndices) {
4676-
// GEP is computed as `unsigned base + signed offset`, therefore:
4677-
// * If offset was positive, then the computed pointer can not be
4678-
// [unsigned] less than the base pointer, unless it overflowed.
4679-
// * If offset was negative, then the computed pointer can not be
4680-
// [unsigned] greater than the bas pointere, unless it overflowed.
4681-
auto *PosOrZeroValid = Builder.CreateICmpUGE(ComputedGEP, IntPtr);
4682-
auto *PosOrZeroOffset =
4683-
Builder.CreateICmpSGE(EvaluatedGEP.TotalOffset, Zero);
4684-
llvm::Value *NegValid = Builder.CreateICmpULT(ComputedGEP, IntPtr);
4685-
ValidGEP = Builder.CreateSelect(PosOrZeroOffset, PosOrZeroValid, NegValid);
4686-
} else if (!IsSubtraction) {
4687-
// GEP is computed as `unsigned base + unsigned offset`, therefore the
4688-
// computed pointer can not be [unsigned] less than base pointer,
4689-
// unless there was an overflow.
4690-
// Equivalent to `@llvm.uadd.with.overflow(%base, %offset)`.
4691-
ValidGEP = Builder.CreateICmpUGE(ComputedGEP, IntPtr);
4692-
} else {
4693-
// GEP is computed as `unsigned base - unsigned offset`, therefore the
4694-
// computed pointer can not be [unsigned] greater than base pointer,
4695-
// unless there was an overflow.
4696-
// Equivalent to `@llvm.usub.with.overflow(%base, sub(0, %offset))`.
4697-
ValidGEP = Builder.CreateICmpULE(ComputedGEP, IntPtr);
4698-
}
4699-
ValidGEP = Builder.CreateAnd(ValidGEP, NoOffsetOverflow);
4700-
Checks.emplace_back(ValidGEP, SanitizerKind::PointerOverflow);
4692+
llvm::SmallVector<std::pair<llvm::Value *, SanitizerMask>, 2> Checks;
4693+
4694+
if (PerformNullCheck) {
4695+
// In C++, if the base pointer evaluates to a null pointer value,
4696+
// the only valid pointer this inbounds GEP can produce is also
4697+
// a null pointer, so the offset must also evaluate to zero.
4698+
// Likewise, if we have non-zero base pointer, we can not get null pointer
4699+
// as a result, so the offset can not be -intptr_t(BasePtr).
4700+
// In other words, both pointers are either null, or both are non-null,
4701+
// or the behaviour is undefined.
4702+
//
4703+
// C, however, is more strict in this regard, and gives more
4704+
// optimization opportunities: in C, additionally, nullptr+0 is undefined.
4705+
// So both the input to the 'gep inbounds' AND the output must not be null.
4706+
auto *BaseIsNotNullptr = Builder.CreateIsNotNull(Ptr);
4707+
auto *ResultIsNotNullptr = Builder.CreateIsNotNull(ComputedGEP);
4708+
auto *Valid =
4709+
CGM.getLangOpts().CPlusPlus
4710+
? Builder.CreateICmpEQ(BaseIsNotNullptr, ResultIsNotNullptr)
4711+
: Builder.CreateAnd(BaseIsNotNullptr, ResultIsNotNullptr);
4712+
Checks.emplace_back(Valid, SanitizerKind::PointerOverflow);
4713+
}
4714+
4715+
if (PerformOverflowCheck) {
4716+
// The GEP is valid if:
4717+
// 1) The total offset doesn't overflow, and
4718+
// 2) The sign of the difference between the computed address and the base
4719+
// pointer matches the sign of the total offset.
4720+
llvm::Value *ValidGEP;
4721+
auto *NoOffsetOverflow = Builder.CreateNot(EvaluatedGEP.OffsetOverflows);
4722+
if (SignedIndices) {
4723+
// GEP is computed as `unsigned base + signed offset`, therefore:
4724+
// * If offset was positive, then the computed pointer can not be
4725+
// [unsigned] less than the base pointer, unless it overflowed.
4726+
// * If offset was negative, then the computed pointer can not be
4727+
// [unsigned] greater than the bas pointere, unless it overflowed.
4728+
auto *PosOrZeroValid = Builder.CreateICmpUGE(ComputedGEP, IntPtr);
4729+
auto *PosOrZeroOffset =
4730+
Builder.CreateICmpSGE(EvaluatedGEP.TotalOffset, Zero);
4731+
llvm::Value *NegValid = Builder.CreateICmpULT(ComputedGEP, IntPtr);
4732+
ValidGEP =
4733+
Builder.CreateSelect(PosOrZeroOffset, PosOrZeroValid, NegValid);
4734+
} else if (!IsSubtraction) {
4735+
// GEP is computed as `unsigned base + unsigned offset`, therefore the
4736+
// computed pointer can not be [unsigned] less than base pointer,
4737+
// unless there was an overflow.
4738+
// Equivalent to `@llvm.uadd.with.overflow(%base, %offset)`.
4739+
ValidGEP = Builder.CreateICmpUGE(ComputedGEP, IntPtr);
4740+
} else {
4741+
// GEP is computed as `unsigned base - unsigned offset`, therefore the
4742+
// computed pointer can not be [unsigned] greater than base pointer,
4743+
// unless there was an overflow.
4744+
// Equivalent to `@llvm.usub.with.overflow(%base, sub(0, %offset))`.
4745+
ValidGEP = Builder.CreateICmpULE(ComputedGEP, IntPtr);
4746+
}
4747+
ValidGEP = Builder.CreateAnd(ValidGEP, NoOffsetOverflow);
4748+
Checks.emplace_back(ValidGEP, SanitizerKind::PointerOverflow);
4749+
}
47014750

47024751
assert(!Checks.empty() && "Should have produced some checks.");
47034752

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
// RUN: %clang_cc1 -x c -fsanitize=pointer-overflow -fsanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s -implicit-check-not="call void @__ubsan_handle_pointer_overflow" --check-prefixes=CHECK,CHECK-NULL-IS-INVALID-PTR
2+
// RUN: %clang_cc1 -x c -fno-delete-null-pointer-checks -fsanitize=pointer-overflow -fsanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s -implicit-check-not="call void @__ubsan_handle_pointer_overflow" --check-prefixes=CHECK,CHECK-NULL-IS-VALID-PTR
3+
4+
// RUN: %clang_cc1 -x c++ -fsanitize=pointer-overflow -fsanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s -implicit-check-not="call void @__ubsan_handle_pointer_overflow" --check-prefixes=CHECK,CHECK-NULL-IS-INVALID-PTR
5+
// RUN: %clang_cc1 -x c++ -fno-delete-null-pointer-checks -fsanitize=pointer-overflow -fsanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s -implicit-check-not="call void @__ubsan_handle_pointer_overflow" --check-prefixes=CHECK,CHECK-NULL-IS-VALID-PTR
6+
7+
#ifdef __cplusplus
8+
extern "C" {
9+
#endif
10+
11+
// CHECK-LABEL: @baseline
12+
char *baseline(char *base, unsigned long offset) {
13+
// CHECK: call void @__ubsan_handle_pointer_overflow(
14+
return base + offset;
15+
}
16+
17+
// CHECK-LABEL: @blacklist_0
18+
__attribute__((no_sanitize("undefined"))) char *blacklist_0(char *base, unsigned long offset) {
19+
return base + offset;
20+
}
21+
22+
// CHECK-LABEL: @blacklist_1
23+
__attribute__((no_sanitize("pointer-overflow"))) char *blacklist_1(char *base, unsigned long offset) {
24+
return base + offset;
25+
}
26+
27+
// CHECK-LABEL: @ignore_non_default_address_space
28+
__attribute__((address_space(1))) char *ignore_non_default_address_space(__attribute__((address_space(1))) char *base, unsigned long offset) {
29+
return base + offset;
30+
}
31+
32+
#ifdef __cplusplus
33+
}
34+
#endif
Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
// RUN: %clang_cc1 -x c -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s
2+
// RUN: %clang_cc1 -x c -fsanitize=pointer-overflow -fno-sanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s
3+
4+
// RUN: %clang_cc1 -x c++ -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s
5+
// RUN: %clang_cc1 -x c++ -fsanitize=pointer-overflow -fno-sanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s
6+
7+
#include <stdint.h>
8+
9+
#ifdef __cplusplus
10+
extern "C" {
11+
#endif
12+
13+
struct S {
14+
int x, y;
15+
};
16+
17+
// CHECK-LABEL: define i64 @{{.*}}get_offset_of_y_naively{{.*}}(
18+
uintptr_t get_offset_of_y_naively() {
19+
// CHECK: [[ENTRY:.*]]:
20+
// CHECK-NEXT: ret i64 ptrtoint (i32* getelementptr (i32, i32* null, i32 1) to i64)
21+
// CHECK-NEXT: }
22+
return ((uintptr_t)(&(((struct S *)0)->y)));
23+
}
24+
25+
// CHECK-LABEL: define i64 @{{.*}}get_offset_of_y_via_builtin{{.*}}(
26+
uintptr_t get_offset_of_y_via_builtin() {
27+
// CHECK: [[ENTRY:.*]]:
28+
// CHECK-NEXT: ret i64 4
29+
// CHECK-NEXT: }
30+
return __builtin_offsetof(struct S, y);
31+
}
32+
33+
#ifdef __cplusplus
34+
}
35+
#endif

0 commit comments

Comments
 (0)