Commit 2238280

myteron, andrew-costello, and BartKaras1128 authored
pySCG adding CWE-182 (#847)
Adding CWE-182 doc as part of #531

---------

Signed-off-by: Helge Wehder <[email protected]>
Signed-off-by: myteron <[email protected]>
Co-authored-by: andrew-costello <[email protected]>
Co-authored-by: Bartlomiej Karas <[email protected]>
1 parent b7eda0c commit 2238280

3 files changed: +138 -0 lines changed

@@ -0,0 +1,94 @@
# CWE-182: Collapse of Data into Unsafe Value

Converting data between encodings, or filtering out untrusted characters and strings, can allow malicious content to slip through input sanitization.

Encoding changes, such as converting from `UTF-8` to pure `ASCII`, can turn non-functional payloads, such as `<script生>`, into functional `<script>` tags. Mixed encoding modes, see [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](../../CWE-707/CWE-180/), can also play a role. The recommendation by [Batchelder 2022](https://www.youtube.com/watch?v=sgHbC6udIqc) to use a single type of encoding and mode only works within a single project or supplier. The recommendation by [W3c.org 2015](https://www.w3.org/International/questions/qa-what-is-encoding) to always choose `UTF-8` provides no guarantee, because some Python installations on Windows already default to `Windows-1252`.
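
Which encodings a given interpreter falls back to can be inspected at runtime. The following is a minimal sketch using only the standard library; the helper name `describe_encodings` is hypothetical and not part of this guide's examples:

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings this interpreter uses by default.

    Hypothetical helper for illustration only.
    """
    return {
        # Locale encoding that text-mode open() typically falls back to
        # when no encoding= argument is passed (may be cp1252 on Windows).
        "preferred": locale.getpreferredencoding(False),
        # Default for str.encode() and bytes.decode() without arguments.
        "default": sys.getdefaultencoding(),
        # Encoding used to convert file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```

Passing `encoding=` explicitly wherever bytes are converted to text avoids depending on any of these defaults.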

`example01.py` is a heavily simplified simulation of two completely different systems that use different encodings. The variable `floppy` stands in for the data at rest and the data in transit. In a real-world scenario, the `write_message` and `read_message` functions would be delivered independently, each with its own encoding.

[*example01.py:*](example01.py)

```py
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Code Example"""

import re
import unicodedata


def write_message(input_string: str):
    """Normalize and validate untrusted string before storing

    Parameters:
        input_string(string): String to validate
    """
    message = unicodedata.normalize("NFC", input_string)

    # validate, exclude dangerous tags:
    for tag in re.findall("<[^>]*>", message):
        if tag in ["<script>", "<img", "<a href"]:
            raise ValueError("Invalid input tag")
    return message.encode("utf-8")


def read_message(message: bytes):
    """Simulating another part of the system displaying the content.

    Args:
        message (bytes): bytearray with some data
    """
    print(message.decode("ascii", "ignore"))


#####################
# attempting to exploit above code example
#####################

# attacker:
floppy = write_message("<script生>")

# victim:
read_message(floppy)
```

__Output of example01.py:__

```bash
<script>
```

`example01.py` decodes the `UTF-8` encoded data as 7-bit `ASCII` and silently drops every byte outside the 128 `ASCII` code points. The non-functional message `<script生>`, stored as the bytes `b"<script\xe7\x94\x9f>"`, collapses into a working `<script>` tag. Whether such an event takes place depends heavily on the client, the trust relationship, and the chain of events.
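
The collapse depends on the error handler passed to `bytes.decode`. As a rough illustration, the sketch below decodes the same payload with several standard handlers; the names `PAYLOAD` and `decode_ascii` are hypothetical and only serve this example:

```python
from typing import Optional

# The attacker's message as stored by the writer: b"<script\xe7\x94\x9f>"
PAYLOAD = "<script生>".encode("utf-8")


def decode_ascii(data: bytes, errors: str) -> Optional[str]:
    """Decode data as ASCII with the given error handler.

    Returns None when the decoder rejects the input.
    """
    try:
        return data.decode("ascii", errors)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for handler in ("ignore", "replace", "backslashreplace", "strict"):
        print(f"{handler:>16}: {decode_ascii(PAYLOAD, handler)!r}")
```

Only `ignore` removes the three non-ASCII bytes without a trace. `replace` leaves one U+FFFD replacement character per dropped byte, `backslashreplace` keeps the bytes visible as escapes, and `strict` refuses the input, so none of those three produces a working tag from this payload.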

A compliant solution will have to adhere to at least:

* [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](../../CWE-707/CWE-180/)
* [CWE-184: Incomplete List of Disallowed Input](../CWE-184/README.md)
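
The two requirements above could be combined roughly as follows. This is a sketch under stated assumptions, not a compliant solution shipped with this guide: the function name `read_message_validated` and the allow-list `ALLOWED` are hypothetical, and a real allow-list depends on what the consumer is expected to display.

```python
import re
import unicodedata

# Hypothetical allow-list: letters, digits, spaces, and basic punctuation.
ALLOWED = re.compile(r"[A-Za-z0-9 .,!?'-]*")


def read_message_validated(message: bytes) -> str:
    """Decode, reduce to ASCII, and only then validate the final form."""
    # 1. Decode strictly with the encoding the writer used.
    text = message.decode("utf-8", errors="strict")
    # 2. Canonicalize, including the lossy reduction to ASCII that this
    #    consumer is assumed to need for display.
    text = unicodedata.normalize("NFC", text)
    ascii_text = text.encode("ascii", errors="ignore").decode("ascii")
    # 3. Validate last, against an allow-list, on the collapsed value.
    if not ALLOWED.fullmatch(ascii_text):
        raise ValueError("Invalid input")
    return ascii_text


if __name__ == "__main__":
    print(read_message_validated("Hello, world!".encode("utf-8")))
    try:
        read_message_validated("<script生>".encode("utf-8"))
    except ValueError as error:
        print(f"rejected: {error}")
```

Because validation runs after the lossy step, the collapsed `<script>` string is what gets checked, and the allow-list rejects it without needing a list of dangerous tags.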

Reducing data into a subset is not limited to strings and characters; numeric values that are truncated or masked into a narrower type can collapse in the same way.
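
A numeric variant of the same problem can be sketched as follows. The scenario, a 16-bit storage field with a reserved ID of `0`, and the names `store_user_id` and `load_user_id` are assumptions made up for this illustration:

```python
import struct


def store_user_id(user_id: int) -> bytes:
    """Validate a user ID, then store it in a 16-bit field (flawed order)."""
    # Validation happens on the full-width value ...
    if user_id == 0:
        raise ValueError("user id 0 is reserved")
    # ... but the value is collapsed to its low 16 bits afterwards.
    return struct.pack("<H", user_id & 0xFFFF)


def load_user_id(data: bytes) -> int:
    """Read the 16-bit user ID back."""
    return struct.unpack("<H", data)[0]


if __name__ == "__main__":
    stored = store_user_id(65536)  # passes the check: 65536 != 0
    print(load_user_id(stored))    # the reserved ID comes back out
```

The value `65536` passes validation but collapses into the reserved ID `0` once it is masked, which mirrors what happens to `<script生>` in the string example.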

## Automated Detection

|Tool|Version|Checker|Description|
|:---|:---|:---|:---|
|Bandit|1.7.4 on Python 3.10.4|Not Available||
|Flake8|4.0.1 on Python 3.10.4|Not Available||

## Related Guidelines

|||
|:---|:---|
|[MITRE CWE](http://cwe.mitre.org/)|Pillar: CWE-693, Protection Mechanism Failure \[online\], available from <https://cwe.mitre.org/data/definitions/693.html> \[Accessed April 2025\]|
|[MITRE CWE](http://cwe.mitre.org/)|Base: CWE-182: Collapse of Data into Unsafe Value \[online\], available from <https://cwe.mitre.org/data/definitions/182.html> \[Accessed April 2025\]|
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|IDS11-J. Perform any string modifications before validation \[online\], available from <https://wiki.sei.cmu.edu/confluence/display/java/IDS11-J.+Perform+any+string+modifications+before+validation> \[Accessed April 2025\]|
|[OpenSSF Secure Coding in Python](https://github.com/ossf/wg-best-practices-os-developers/tree/main/docs/Secure-Coding-Guide-for-Python)|CWE-180: Incorrect Behavior Order: Validate Before Canonicalize \[online\], available from <https://github.com/ossf/wg-best-practices-os-developers/blob/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180> \[Accessed April 2025\]|
|[OpenSSF Secure Coding in Python](https://github.com/ossf/wg-best-practices-os-developers/tree/main/docs/Secure-Coding-Guide-for-Python)|CWE-184: Incomplete List of Disallowed Input \[online\], available from <https://github.com/ossf/wg-best-practices-os-developers/blob/main/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md> \[Accessed April 2025\]|

## Bibliography

|||
|:---|:---|
|\[Batchelder 2022\]|Ned Batchelder, Pragmatic Unicode, or, How do I stop the pain? \[online\], available from <https://www.youtube.com/watch?v=sgHbC6udIqc> \[Accessed 4 April 2025\]|
|\[W3c.org 2015\]|Character encodings for beginners \[online\], available from <https://www.w3.org/International/questions/qa-what-is-encoding> \[Accessed 4 April 2025\]|
@@ -0,0 +1,42 @@
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Code Example"""

import re
import unicodedata


def write_message(input_string: str):
    """Normalize and validate untrusted string before storing

    Parameters:
        input_string(string): String to validate
    """
    message = unicodedata.normalize("NFC", input_string)

    # validate, exclude dangerous tags:
    for tag in re.findall("<[^>]*>", message):
        if tag in ["<script>", "<img", "<a href"]:
            raise ValueError("Invalid input tag")
    return message.encode("utf-8")


def read_message(message: bytes):
    """Simulating another part of the system displaying the content.

    Args:
        message (bytes): bytearray with some data
    """
    print(message.decode("ascii", "ignore"))


#####################
# attempting to exploit above code example
#####################

# attacker:
floppy = write_message("<script生>")

# victim:
read_message(floppy)

docs/Secure-Coding-Guide-for-Python/readme.md

+2
@@ -74,6 +74,8 @@ It is __not production code__ and requires code-style or python best practices t
 |[CWE-617: Reachable Assertion](CWE-691/CWE-617/README.md)||
 
 |[CWE-693: Protection Mechanism Failure](https://cwe.mitre.org/data/definitions/693.html)|Prominent CVE|
+|:---------------------------------------------------------------------------------------------------------------|:----|
+|[CWE-182: Collapse of Data into Unsafe Value](CWE-693/CWE-182/README.md)||
 |[CWE-184: Incomplete List of Disallowed Input](CWE-693/CWE-184/README.md)||
 |[CWE-330: Use of Insufficiently Random Values](CWE-693/CWE-330/README.md)|[CVE-2020-7548](https://www.cvedetails.com/cve/CVE-2020-7548),<br/>CVSSv3.1: __9.8__,<br/>EPSS: __0.22__ (12.12.2024)|
 |[CWE-778: Insufficient Logging](CWE-693/CWE-778/README.md)||
