Skip to content

Simplified session #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
3c0139c
Major README updates
hohn May 9, 2023
b1db4ec
Add explicit workshop/session support
hohn May 9, 2023
c2d0120
Build database from session support file
hohn May 9, 2023
118abc6
clarify README
hohn May 9, 2023
30e5f43
tree depth change
hohn May 9, 2023
dafd5cc
change to session-based workshop
hohn May 10, 2023
079b7d3
Tangled (extracted) the embedded examples into separate files
hohn May 12, 2023
1540e66
Add the test cases for the session examples
hohn May 12, 2023
ded9d34
Include session examples rather than inlining. Clear up some session…
hohn May 16, 2023
5e1ad9e
Add WIP template for simplification
May 16, 2023
6695f07
Expanded the "Session/Workshop notes" section
hohn May 16, 2023
22126ed
Introduce some predicates at step 4a
hohn May 17, 2023
b444185
Further clarify the goals in "Session/Workshop notes"
hohn May 17, 2023
71153b2
WIP: Update example5. List predicates to be introduced
hohn May 17, 2023
2d30b85
Update example 6
hohn May 17, 2023
8cea58a
Update example7. Prepare to introduce more predicates
hohn May 17, 2023
e790a3c
Added interim step 7a
hohn May 17, 2023
da7781a
Added interim step 7a, detailing some predicate derivation
hohn May 18, 2023
bedd049
Include size comparison in example7b, from 8
hohn May 19, 2023
841a33b
Add session snapshot script
hohn May 20, 2023
c04eebe
Update step 8
hohn May 20, 2023
5a40e93
Clarified the alloc and buffer access check intention
hohn May 22, 2023
646c70f
step 8a: Find /some/ problematic accesses by reverting to simple var+…
hohn May 22, 2023
ea064e5
More on step 9, global value numbering
hohn May 23, 2023
903b752
export to plain markdown to fix github rendering
hohn May 23, 2023
e4e2995
Apply the solution/5 results format to all steps
hohn May 23, 2023
1c733ef
Introduce hashconsing in step 9a to get correct equality comparison
hohn May 23, 2023
c59d8bf
A short note on the structure of directories and their use
hohn May 23, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
.cache
*.html
*.testproj
*.lock.yaml
*.lock.yml
*.actual
cpp-runtime-values-db
exercises-tests/*/test.c
settings.json
.DS_Store
.DS_Store
86 changes: 81 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,15 +5,30 @@ This workshop is based on a significantly simplified and modified version of the

## Setup Instructions
- Install [Visual Studio Code](https://code.visualstudio.com/).

- Install the [CodeQL extension for Visual Studio Code](https://codeql.github.com/docs/codeql-for-visual-studio-code/setting-up-codeql-in-visual-studio-code/).

- Install the latest version of the [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases).

- Clone this repository:
```bash
git clone https://github.com/kraiouchkine/codeql-workshop-runtime-values-c
```
- Install the CodeQL pack dependencies using the command `CodeQL: Install Pack Dependencies` and select `exercises`, `solutions`, `exercises-tests`, and `solutions-tests` from the list of packs.
- If you have CodeQL on your PATH, build the database using `build-database.sh` and load the database with the VS Code CodeQL extension.

- Install the CodeQL pack dependencies using the command `CodeQL: Install Pack
Dependencies` and select `exercises`, `solutions`, `exercises-tests`, `session`,
`session-db` and `solutions-tests` from the list of packs.

- If you have CodeQL on your PATH, build the database using `build-database.sh`
and load the database with the VS Code CodeQL extension. It is at
`session-db/cpp-runtime-values-db`.
- Alternatively, you can download [this pre-built database](https://drive.google.com/file/d/1N8TYJ6f4E33e6wuyorWHZHVCHBZy8Bhb/view?usp=sharing).

- If you do **not** have CodeQL on your PATH, build the database using the unit
test sytem. Choose the `TESTING` tab in VS Code, run the
`session-db/DB/db.qlref` test. The test will fail, but it leaves a usable CodeQL
database in `session-db/DB/DB.testproj`.

- :exclamation:Important:exclamation:: Run `initialize-qltests.sh` to initialize the tests. Otherwise, you will not be able to run the QLTests in `exercises-tests`.

## Introduction
Expand Down Expand Up @@ -63,20 +78,37 @@ This workshop is not intended to be a complete analysis that is useful for real-

The goal of this workshop is rather to demonstrate the building blocks of analyzing run-time values and how to apply those building blocks to modelling a common class of vulnerability. A more comprehensive and production-appropriate example is the [OutOfBounds.qll library](https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll) from the [CodeQL Coding Standards repository](https://github.com/github/codeql-coding-standards).

## Exercises
## Session/Workshop notes
Unlike the the [exercises](#org3b74422) which use the *collection* of test
problems in `exercises-test`, a workshop follows `session/session.ql` and uses a
*single* database built from a single, larger segment of code.

<a id="org3b74422"></a>
## Exercises
These exercises use the collection of test problems in `exercises-test`.

### Exercise 1
In the first exercise we are going to start by modelling a dynamic allocation with `malloc` and an access to that allocated buffer with an array expression. The goal of this exercise is to then output the array access, buffer, array size, and buffer offset.

The [first test-case](solutions-tests/Exercise1/test.c) is a simple one, as both the allocation size and array offsets are constants.

For this exercise, connect the allocation(s), the array accesses, and the sizes in each.

Run the query and ensure that you have three results.

#### Hints
1. `Expr::getValue()::toInt()` can be used to get the integer value of a constant expression.
2. Use `DataFlow::localExprFlow()` to relate the allocated buffer to the array base.

### Exercise 2
This exercise uses the same C source code, duplicated for the test [here](solutions-tests/Exercise2/test.c).
This exercise uses the same C source code with an addition: a constant array size
propagated [via a variable](solutions-tests/Exercise2/test.c).

Hints:
1. start with plain `from...where...select` query.
2. use
`elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()`
2. convert your query to predicate or use classes as outlined below, if desired.

#### Task 1
With the basic elements of the analysis in place, refactor the query into two classes: `AllocationCall` and `ArrayAccess`. The `AllocationCall` class should model a call to `malloc` and the `ArrayAccess` class should model an array access expression (`ArrayExpr`).
Expand All @@ -90,6 +122,8 @@ Use local data-flow analysis to complete the `getSourceConstantExpr` predicate.
### Exercise 3
This exercise has slightly more C source code [here](solutions-tests/Exercise3/test.c).

Note: the `test_const_branch` has `buf[100]` with size == 100

Running the query from Exercise 2 against the database yields a significant number of missing or incorrect results. The reason is that although great at identifying compile-time constants and their use, data-flow analysis is not always the right tool for identifying the *range* of values an `Expr` might have, particularly when multiple potential constants might flow to an `Expr`.

The CodeQL standard library several mechanisms for addressing this problem; in the remainder of this workshop we will explore two of them: `SimpleRangeAnalysis` and, later, `GlobalValueNumbering`.
Expand All @@ -110,6 +144,12 @@ Implement the `isOffsetOutOfBoundsConstant` predicate to check if the array offs
You should now have five results in the test (six in the built database).

### Exercise 4
Note: We *could* evolve this code to handle `size`s inside conditionals using
guards on data flow. But this amounts to implementing a small interpreter.
The range analysis library already handles conditional branches; we don't
have to use guards on data flow -- don't implement your own interpreter if you can
use the library.

Again, a slight longer C [source snippet](solutions-tests/Exercise4/test.c).

A common issue with the `SimpleRangeAnalysis` library is handling of cases where the bounds are undeterminable at compile-time on one or more paths. For example, even though certain branches have clearly defined bounds, the range analysis library will define the `upperBound` and `lowerBound` of `val` as `INT_MIN` and `INT_MAX` respectively:
Expand Down Expand Up @@ -164,4 +204,40 @@ Do not compute the GVN of the entire array index expression; use the base of an

Exclude duplicate results by only reporting `isOffsetOutOfBoundsGVN` for `access`/`source` pairs that are not already reported by `isOffsetOutOfBoundsConstant`.

You should now see thirteen results.
You should now see thirteen results.

Some notes:

Global value numbering only knows that runtime values are equal; they are not
comparable (`<, >, <=` etc.), and the *actual* value is not known.
Reference: https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering/


In the query, look for and use *relative* values between allocation and use. To
do this, use GVN.
This is the case in

void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz)
{
char *buf = malloc(sz * x * y);
buf[sz * x * y - 1]; // COMPLIANT
buf[sz * x * y]; // NON_COMPLIANT
buf[sz * x * y + 1]; // NON_COMPLIANT
}

Range analyis won't bound `sz * x * y`, so switch to global value numbering.
<!-- Or use hashcons. -->
Global value numbering finds expressions that are known to have the same runtime
value, independent of structure. To get the Global Value Number in CodeQL:

...
globalValueNumber(e) = globalValueNumber(sizeExpr) and
e != sizeExpr
...

We can use global value numbering to identify common values as first step, but for
expressions like

buf[sz * x * y - 1]; // COMPLIANT

we have to "evaluate" the expressions -- or at least bound them.
7 changes: 4 additions & 3 deletions build-database.sh
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
SRCDIR=$(pwd)/solutions-tests/Exercise6
DB=$(pwd)/cpp-runtime-values-db
codeql database create --language=cpp -s "$SRCDIR" -j 8 -v $DB --command="clang -fsyntax-only -Wno-unused-value $SRCDIR/test.c"
#!/bin/bash
SRCDIR=$(pwd)/session-db
DB=$SRCDIR/cpp-runtime-values-db
codeql database create --language=cpp -s "$SRCDIR" -j 8 -v $DB --command="clang -fsyntax-only -Wno-unused-value $SRCDIR/DB/db.c"
3 changes: 2 additions & 1 deletion codeql-workshop-runtime-values-c.code-workspace
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
}
],
"settings": {
"sarif-viewer.connectToGithubCodeScanning": "off"
"sarif-viewer.connectToGithubCodeScanning": "off",
"codeQL.runningQueries.autoSave": true
}
}
16 changes: 16 additions & 0 deletions library/RuntimeValues.qll
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
import cpp

bindingset[expr]
int getExprOffsetValue(Expr expr, Expr base) {
result = expr.(AddExpr).getRightOperand().getValue().toInt() and
base = expr.(AddExpr).getLeftOperand()
or
result = -expr.(SubExpr).getRightOperand().getValue().toInt() and
base = expr.(SubExpr).getLeftOperand()
or
// currently only AddExpr and SubExpr are supported: else, fall-back to 0
not expr instanceof AddExpr and
not expr instanceof SubExpr and
base = expr and
result = 0
}
6 changes: 6 additions & 0 deletions library/qlpack.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
library: true
name: library
version: 0.0.1
dependencies:
codeql/cpp-all: 0.6.1
85 changes: 85 additions & 0 deletions session-db/DB/db.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
void *malloc(unsigned long);/* clang compatible */

unsigned long extern_get_size(void);

void test_const(void)
{
char *buf = malloc(100);
buf[0]; // COMPLIANT
buf[99]; // COMPLIANT
buf[100]; // NON_COMPLIANT
}

void test_const_var(void)
{
unsigned long size = 100;
char *buf = malloc(size);
buf[0]; // COMPLIANT
buf[99]; // COMPLIANT
buf[size - 1]; // COMPLIANT
buf[100]; // NON_COMPLIANT
buf[size]; // NON_COMPLIANT
}

void test_const_branch(int mode, int random_condition)
{
unsigned long size = (mode == 1 ? 100 : 200);

char *buf = malloc(size);

if (random_condition)
{
size = 300;
}

buf[0]; // COMPLIANT
buf[99]; // COMPLIANT
buf[size - 1]; // NON_COMPLIANT
buf[100]; // NON_COMPLIANT[DONT REPORT]
buf[size]; // NON_COMPLIANT

if (size < 199)
{
buf[size]; // COMPLIANT
buf[size + 1]; // COMPLIANT
buf[size + 2]; // NON_COMPLIANT
}
}

void test_const_branch2(int mode)
{
unsigned long alloc_size = 0;

if (mode == 1)
{
alloc_size = 200;
}
else
{
// unknown const size - don't report accesses
alloc_size = extern_get_size();
}

char *buf = malloc(alloc_size);

buf[0]; // COMPLIANT
buf[100]; // COMPLIANT
buf[200]; // NON_COMPLIANT
buf[alloc_size - 1]; // COMPLIANT
buf[alloc_size]; // NON_COMPLIANT

if (alloc_size < 199)
{
buf[alloc_size]; // COMPLIANT
buf[alloc_size + 1]; // COMPLIANT
buf[alloc_size + 2]; // NON_COMPLIANT
}
}

void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz)
{
char *buf = malloc(sz * x * y);
buf[sz * x * y - 1]; // COMPLIANT
buf[sz * x * y]; // NON_COMPLIANT
buf[sz * x * y + 1]; // NON_COMPLIANT
}
3 changes: 3 additions & 0 deletions session-db/DB/db.expected
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
| test.c:4:5:4:10 | access to array | 0 | test.c:3:17:3:22 | call to malloc | 100 |
| test.c:5:5:5:11 | access to array | 99 | test.c:3:17:3:22 | call to malloc | 100 |
| test.c:6:5:6:12 | access to array | 100 | test.c:3:17:3:22 | call to malloc | 100 |
1 change: 1 addition & 0 deletions session-db/DB/db.qlref
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
session.ql
8 changes: 8 additions & 0 deletions session-db/qlpack.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
library: false
name: session-db
version: 0.0.1
dependencies:
"session": "*"
extractor: cpp
tests: .
12 changes: 12 additions & 0 deletions session-tests/Example1/example1.expected
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 99 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:17:5:17:10 | access to array | 0 | test.c:17:9:17:9 | 0 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:18:5:18:11 | access to array | 99 | test.c:18:9:18:10 | 99 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:20:5:20:12 | access to array | 100 | test.c:20:9:20:11 | 100 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:35:5:35:10 | access to array | 0 | test.c:35:9:35:9 | 0 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:36:5:36:11 | access to array | 99 | test.c:36:9:36:10 | 99 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:38:5:38:12 | access to array | 100 | test.c:38:9:38:11 | 100 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:65:5:65:10 | access to array | 0 | test.c:65:9:65:9 | 0 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:66:5:66:12 | access to array | 100 | test.c:66:9:66:11 | 100 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:67:5:67:12 | access to array | 200 | test.c:67:9:67:11 | 200 | 100 | test.c:7:24:7:26 | 100 |
1 change: 1 addition & 0 deletions session-tests/Example1/example1.qlref
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
example1.ql
85 changes: 85 additions & 0 deletions session-tests/Example1/test.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
void *malloc(unsigned long);/* clang compatible */

unsigned long extern_get_size(void);

void test_const(void)
{
char *buf = malloc(100);
buf[0]; // COMPLIANT
buf[99]; // COMPLIANT
buf[100]; // NON_COMPLIANT
}

void test_const_var(void)
{
unsigned long size = 100;
char *buf = malloc(size);
buf[0]; // COMPLIANT
buf[99]; // COMPLIANT
buf[size - 1]; // COMPLIANT
buf[100]; // NON_COMPLIANT
buf[size]; // NON_COMPLIANT
}

void test_const_branch(int mode, int random_condition)
{
unsigned long size = (mode == 1 ? 100 : 200);

char *buf = malloc(size);

if (random_condition)
{
size = 300;
}

buf[0]; // COMPLIANT
buf[99]; // COMPLIANT
buf[size - 1]; // NON_COMPLIANT
buf[100]; // NON_COMPLIANT[DONT REPORT]
buf[size]; // NON_COMPLIANT

if (size < 199)
{
buf[size]; // COMPLIANT
buf[size + 1]; // COMPLIANT
buf[size + 2]; // NON_COMPLIANT
}
}

void test_const_branch2(int mode)
{
unsigned long alloc_size = 0;

if (mode == 1)
{
alloc_size = 200;
}
else
{
// unknown const size - don't report accesses
alloc_size = extern_get_size();
}

char *buf = malloc(alloc_size);

buf[0]; // COMPLIANT
buf[100]; // COMPLIANT
buf[200]; // NON_COMPLIANT
buf[alloc_size - 1]; // COMPLIANT
buf[alloc_size]; // NON_COMPLIANT

if (alloc_size < 199)
{
buf[alloc_size]; // COMPLIANT
buf[alloc_size + 1]; // COMPLIANT
buf[alloc_size + 2]; // NON_COMPLIANT
}
}

void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz)
{
char *buf = malloc(sz * x * y);
buf[sz * x * y - 1]; // COMPLIANT
buf[sz * x * y]; // NON_COMPLIANT
buf[sz * x * y + 1]; // NON_COMPLIANT
}
Loading